The CTP XML Anonymizer

Latest revision as of 17:58, 31 January 2017

This article describes how to configure the CTP XML anonymizer. The anonymizer provides automatic modification of elements in XML objects. The intended audience for this article is CTP system administrators.

The XML anonymizer is driven by a script file which is identified in the XmlAnonymizer pipeline stage's configuration file element. This article describes the script language, which is different from that of the DicomAnonymizer and DicomFilter pipeline stages. (Note: the ZipAnonymizer uses the XmlAnonymizer to modify the manifest.xml files of ZipObjects, and the script language is the same.)

The script language contains three types of statements. Each starts on the first character of a line and is identified by its starting character. A line that does not start with one of the three statement-start characters is appended to the preceding line.

  • Any line starting with a '#' character is a comment line.
  • A line starting with an identifier is an assignment statement. An identifier always starts with a '$' and is immediately followed by a name, for example, $UIDROOT.
  • A line starting with a '/' character is a path assignment statement. Paths are XPath-like expressions, always starting from the root element.
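
As an illustration, the following sketch shows one statement of each type, plus a path assignment continued onto a second line. It assumes that an indented line counts as a continuation, since it does not begin with '#', '$', or '/'. The attribute name is borrowed from the ZipAnonymizer test script later in this article.

# A comment line
$UIDROOT = "1.2.3.4"
/manifest/@uid =
    $hashuid($UIDROOT, this)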

If no bracketed qualifier is present in a path segment, the first element matching the segment name is selected. If all elements matching the segment name are to be selected, use the "[*]" wildcard qualifier, e.g. /root/element[*].

Here are some examples of paths:

  • /MIRCdocument/authorization/owner refers to the first owner child element of the first authorization child element of the MIRCdocument root element.
  • /MIRCdocument/@display refers to the display attribute of the MIRCdocument root element.
  • /*/* refers to any second-generation child element of an XML document, no matter what its root element is named.
  • /*//owner refers to any owner element in an XML document, no matter what its root element is named.
  • /message/segment[3] refers to the fourth (always count from zero) segment element of the message root element.
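
The qualifier rules above can be sketched as path assignment statements. The three statements below are alternatives, shown independently of one another; if they were run together in one script, some segment elements would be hashed more than once.

# No qualifier: only the first segment element is hashed
/message/segment = $hash(this)
# Wildcard qualifier: every segment element is hashed
/message/segment[*] = $hash(this)
# Numeric qualifier: the fourth segment element is hashed (counting from zero)
/message/segment[3] = $hash(this)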

Statements are executed in order. No flow control statements are provided. A path statement in which the left side path is not matched in the document is ignored. Thus, one script file can generally contain anonymization instructions for files of different schemas. If a path on the right side of a statement does not appear in the document, it generates an empty string.
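
For instance, a single script along the following lines could serve two schemas, because each path statement is ignored in documents that do not contain its left side path. The paths are taken from the examples later in this article.

$UIDROOT = "1.2.3.4"
# Matched only in documents whose root element is LidcReadMessage
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $hashuid($UIDROOT, this)
# Matched only in documents whose root element is manifest
/manifest/@study-uid = $hashuid($UIDROOT, this)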

Assignment statements are of two types:

  • $name = expression
  • /path = expression

In a $name assignment statement, an expression can be any combination of literals (quoted strings, e.g. "some text"), paths, and other names.

In a path assignment statement, an expression can be any combination of literals, paths, names, and function calls. Whitespace between expression terms is ignored, and there are no operators. On the right side, a reserved word, this, is available. It always contains the value of the path on the left side of the assignment.
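
A minimal sketch of how expression terms combine is shown below. It assumes that adjacent terms are simply concatenated, which is what the ZipAnonymizer test script later in this article suggests; the $PREFIX name is made up for the illustration.

$SITEID = "32"
$PREFIX = "SITE-" $SITEID
/manifest/@pt-id = $PREFIX "-" this

Under that assumption, a pt-id attribute containing 123 would become SITE-32-123.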

The value of any path is the value of the corresponding location in the XML object. Paths which identify locations which do not appear in the object return the empty string.

There are several functions. In each function, any argument is an expression, which can be any combination of literals, paths, names, and function calls.

  • $require( expression ) forces the creation of the element or attribute identified by the path on the left side of the assignment, including all necessary parent elements. The value assigned to the element or attribute is the value of the expression argument of the $require function.
  • $remove() causes the element or attribute identified by the path on the left side of the assignment to be removed from the document.
  • $uid( root ) creates a new UID, using the value of the root parameter as the UID root.
  • $lookup( string, keytype ) causes the value of the string parameter to be used as an index into the lookup table (using the value of the keytype parameter as the prefix) and the value stored in the table to be returned.
  • $hashuid( root, uid ) causes the value of the uid parameter to be remapped, using the value of the root parameter as the new UID's root. The UID remapping function uses the same hashuid function as in the DICOM anonymizer, allowing UIDs to be remapped while preserving the relationships between DICOM, XML, and Zip objects.
  • $hash( string, maxlen ) causes the value of the string parameter to be hashed, limiting the length of the output string to the number of characters specified in the maxlen parameter. If the maxlen parameter is not specified, the full length of the hashed string (typically about 40 numeric characters) is produced.
  • $hashname( string, maxlen, maxwds ) causes the value of the string parameter to be treated as a DICOM name string in the form Last^First^Middle. If present, the maxwds parameter limits the number of names used. After imposing such a limit, the result is hashed, and the maxlen parameter, if present, is then used to limit the number of characters output.
  • $hashptid( siteid, string, maxlen ) causes the values of the siteid and string parameters to be combined and hashed. As with $hash, the maxlen parameter limits the number of characters output.
  • $initials( string ) causes the value of the string parameter to be treated as a DICOM name string in the form Last^First^Middle. The first initials of the names are combined in the order FML.
  • $encrypt( string, key ) causes the value of the string parameter to be encrypted using the value of the key parameter as the encryption key.
  • $round( string, bin ) causes the value of the string parameter to be treated as a numeric value. The function rounds the string value to the center of the nearest bin. Bins have a width set by the value of the bin parameter, and the first bin is centered on zero. Thus, if the bin parameter has the value "10", the first bin is centered on zero, the second on 10, and so on.
  • $incrementdate( string, increment ) causes the value of the string parameter to be treated as a date string and incremented by the number of days specified in the increment parameter. The increment parameter can be positive or negative, with negative increments moving dates into the past.
  • $modifydate( string, year, month, day ) causes the value of the string parameter to be treated as a date and each of the fields replaced by the values of the year, month, and day parameters. If a parameter is specified as an asterisk, the corresponding value in the original date is preserved. For example, $modifydate( this, "*", "1", "1" ) causes the date to be reset to the first of January, leaving the year unmodified.
  • $time( sep ) causes the current time to be returned in the format HH:MM:SS, with the colons replaced with the value of the sep parameter. If the sep parameter is missing, no separator is used.
  • $date( sep ) causes the current date to be returned in the format YYYY-MM-DD, with the dashes replaced with the value of the sep parameter. If the sep parameter is missing, no separator is used.
  • $text() returns the full text of the document at the time the function is called.
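
As a short sketch of a few of these functions in use, the statements below rely only on the behavior described in the list above. The processed element is a hypothetical name chosen for the illustration; the other paths come from the ZipAnonymizer test script later in this article.

# Record the current date and time, keeping the usual separators
/manifest/processed = $require( $date("-") "T" $time(":") )
# Reset the date to the first of January, leaving the year unmodified
/manifest/@date = $modifydate(this, "*", "1", "1")
# Round the age to the center of the nearest 10-wide bin (0, 10, 20, and so on)
/manifest/age = $round(this, "10")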

Here is an example script for remapping UIDs of two different types in XML files whose root elements are called LidcReadMessage:

$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $hashuid($UIDROOT, this)
/LidcReadMessage//imageSOP_UID = $hashuid($UIDROOT, this)

Here is an example script for doing the same work as above but at the same time renaming the SeriesInstanceUid element to SeriesInstanceUID:

$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $hashuid($UIDROOT, this)
$temp = /LidcReadMessage/ResponseHeader/SeriesInstanceUid
/LidcReadMessage/ResponseHeader/SeriesInstanceUID = $require($temp)
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $remove()
/LidcReadMessage//imageSOP_UID = $hashuid($UIDROOT, this)

Here is an example script to create a unique identifier for a root element that might not have one:

$UIDROOT = "1.2.3.4"
/*/@uid = $require( $hashuid( $UIDROOT, $text() ) )

As a final example, the following is a more or less random script used when testing the ZipAnonymizer.

$KEY = "12345abcde"
$UIDROOT = "9.9"
$SITEID = "32"
$MAXLEN = "8"
$OLDUID = /manifest/@uid

#Remap the manifest attributes
/manifest/@uid = $hashuid($UIDROOT, this)
/manifest/@study-uid = $hashuid($UIDROOT, this)
/manifest/@pt-name = "[" $SITEID "]-" $initials(this)
/manifest/@pt-id = $hashptid($SITEID, this, $MAXLEN)
/manifest/@date = $incrementdate(this, "-137")

#Create some new elements
/manifest/today = $require($date() "@" $time())
/manifest/newuid = $require($uid($UIDROOT))
/manifest/encrypteduid = $require($encrypt($OLDUID, $KEY))

#In the test manifest, there were some child elements containing UIDs.
#There were many series-uid and instance-uid elements. 
#This is how to do them all in one line each:
/manifest//series-uid = $hashuid($UIDROOT, this)
/manifest//instance-uid = $hashuid($UIDROOT, this)

#You can access elements you have just created:
$AGE = "26"
/manifest/age = $require($AGE)
/manifest/roundedage = $require($round(/manifest/age, "5"))

#You can call functions with static strings as well as ones from the document:
/manifest/hashedtext = $require($hash("Now is the time for all good men..."))

#The next two lines should produce elements with the same value:
/manifest/hashedFML = $require($hashname("Last^First^Middle",,"2"))
/manifest/hashedFL = $require($hashname("Last^First"))

To assist in debugging XML anonymizer scripts, there is a special name assignment command:

  • $print = expression

This causes the value of the expression to be inserted in the log.
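
For example, a script might log a value before and after it is remapped. This is only a sketch: the label text is arbitrary, and because $print is a name assignment, its expression is limited to literals, paths, and other names.

$OLDUID = /manifest/@uid
$print = "Original UID: " $OLDUID
/manifest/@uid = $hashuid("9.9", this)
$print = "Remapped UID: " /manifest/@uid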