The CTP XML Anonymizer

From MircWiki
Revision as of 21:17, 8 November 2008 by Johnperry (talk | contribs)

This article describes how to configure the CTP XML anonymizer. The anonymizer provides automatic modification of elements in XML objects. The intended audience for this article is CTP system administrators.

The XML anonymizer is driven by a script file which is identified in the XmlAnonymizer pipeline stage's configuration file element. This article describes the script language, which is different from that of the DicomAnonymizer and DicomFilter pipeline stages. (Note: the ZipAnonymizer uses the XmlAnonymizer to modify the manifest.xml files of ZipObjects, and the script language is the same.)

The script language contains three types of statements. Each statement starts in the first column of a line and is identified by its first character. A line that does not start with one of the three statement-start characters is appended to the preceding line.

  • Any line starting with a '#' character is a comment line.
  • A line starting with an identifier is an assignment statement. An identifier always starts with a '$' and is immediately followed by a name, for example, $UIDROOT.
  • A line starting with a '/' character is a path assignment statement. Paths are XPath-like expressions, always starting from the root element.
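
The line rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not the CTP parser: the function name join_statements is hypothetical, and it assumes a continuation line is appended exactly as written, with no separator inserted.

```python
def join_statements(script_text):
    """Join continuation lines, then label each statement by its first character."""
    statements = []
    for line in script_text.splitlines():
        if line[:1] in ("#", "$", "/"):
            # A statement-start character in column one begins a new statement.
            statements.append(line)
        elif statements:
            # Any other line is appended to the preceding line (assumed: verbatim).
            statements[-1] += line
        # A continuation line with nothing before it is skipped in this sketch.
    kinds = {"#": "comment", "$": "name assignment", "/": "path assignment"}
    return [(kinds[s[0]], s) for s in statements]
```

In this model, an indented line that begins with a quoted string is folded into the path assignment before it, so a long expression can be split across lines as long as no continuation line begins with '#', '$' or '/'.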

If no bracketed qualifier is present in a path segment, the first element matching the segment name is selected. If all elements matching the segment name are to be selected, use the "[*]" wildcard qualifier, e.g. /root/element[*].

Here are some examples of paths:

  • /MIRCdocument/authorization/owner refers to the first owner child element of the first authorization child element of the MIRCdocument root element.
  • /MIRCdocument/@display refers to the display attribute of the MIRCdocument root element.
  • /*/* refers to any second-generation child element of an XML document, no matter what its root element is named.
  • /*//owner refers to any owner element in an XML document, no matter what its root element is named.
  • /message/segment[3] refers to the fourth (always count from zero) segment element of the message root element.
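
Because standard XPath counts positions from one, the zero-based index is easy to get wrong. The following Python sketch uses the standard library's ElementTree to show which element each form selects; the sample document is invented for illustration, and the code models the selection rules described above rather than the anonymizer's own path engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with four segment elements.
doc = ET.fromstring(
    "<message>"
    "<segment>a</segment><segment>b</segment>"
    "<segment>c</segment><segment>d</segment>"
    "</message>"
)

segments = doc.findall("segment")

# Script path /message/segment[3]: zero-based, so the fourth element.
fourth = segments[3]

# The same element in standard XPath, where positions start at 1.
xpath_fourth = doc.find("segment[4]")

# Script path /message/segment with no qualifier: the first match only.
first = doc.find("segment")

# Script path /message/segment[*]: every matching element.
all_texts = [s.text for s in segments]
```

Here fourth and xpath_fourth are the same element (text "d"), first is the element with text "a", and all_texts holds all four values.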

Statements are executed in order. No flow control statements are provided. A path statement in which the left side path is not matched in the document is ignored. Thus, one script file can generally contain anonymization instructions for files of different schemas. If a path on the right side of a statement does not appear in the document, it generates an empty string.

Assignment statements are of two types:

  • $name = expression
  • /path = expression

In a $name assignment statement, an expression can be any combination of literals (quoted strings, e.g. "some text"), paths, and other names.

For a path assignment, an expression can be any combination of literals, paths, names, or function calls. Whitespace between expression terms is ignored, and there are no operators: the values of adjacent terms are concatenated. There are several functions. In each function, any argument is an expression, which can be any combination of literals, paths, names, and function calls.

  • $require( expression ) forces the creation of the element or attribute identified by the path on the left side of the assignment, including all necessary parent elements. The value assigned to the element or attribute is the value of the expression argument of the $require function.
  • $remove() causes the element or attribute identified by the path on the left side of the assignment to be removed from the document.
  • $uid( root ) creates a new UID, using the value of the root parameter as the UID root.
  • $hashuid( root, uid ) causes the value of the uid parameter to be remapped, using the value of the root parameter as the new UID's root. The UID remapping function uses the same hashuid function as in the DICOM anonymizer, allowing UIDs to be remapped while preserving the relationships between DICOM, XML, and Zip objects.
  • $hash( string, maxlen ) causes the value of the string parameter to be hashed, limiting the length of the output string to the number of characters specified in the maxlen parameter. If the maxlen parameter is not specified, the full length of the hashed string (typically about 40 numeric characters) is produced.
  • $hashname( string, maxlen, maxwds ) causes the value of the string parameter to be treated as a DICOM name string in the form Last^First^Middle. If present, the maxwds parameter limits the number of names used. After imposing such a limit, the result is hashed, and the maxlen parameter, if present, is then used to limit the number of characters output.
  • $hashptid( siteid, string, maxlen ) causes the values of the siteid and string parameters to be combined and hashed. Again, the maxlen parameter limits the number of characters output.
  • $initials( string ) causes the value of the string parameter to be treated as a DICOM name string in the form Last^First^Middle. The initials of the names are combined in the order first, middle, last (FML).
  • $encrypt( string, key ) causes the value of the string parameter to be encrypted using the value of the key parameter as the encryption key.
  • $round( string, bin ) causes the value of the string parameter to be treated as a numeric value. The function rounds the string value to the center of the nearest bin. Bins have a width set by the value of the bin parameter, and the first bin is centered on zero. Thus, if the bin parameter has the value "10", the first bin is centered on zero, the second on 10, etc.
  • $incrementdate( string, increment ) causes the value of the string parameter to be treated as a date string and incremented by the number of days specified in the increment parameter. The increment parameter can be positive or negative, with negative increments moving dates into the past.
  • $time( sep ) causes the current time to be returned in the format HH:MM:SS, with the colons replaced with the value of the sep parameter. If the sep parameter is missing, no separator is used.
  • $date( sep ) causes the current date to be returned in the format YYYY-MM-DD, with the dashes replaced with the value of the sep parameter. If the sep parameter is missing, no separator is used.
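
To make the arithmetic behind some of these functions concrete, here is a rough Python model of $round, $initials, $date and $time as they are described above. It is a sketch under stated assumptions, not the CTP implementation: the Python function names are hypothetical, ties in $round are assumed to round upward, whole-number results are assumed to print without a decimal point, and initials are assumed to be upper-cased.

```python
import math
from datetime import datetime


def round_to_bin(value, bin_width):
    """Model of $round: snap a numeric string to the center of the nearest bin."""
    width = float(bin_width)
    # Bin centers sit at 0, width, 2*width, ... so divide, round, multiply back.
    # Assumption: a value exactly between two centers goes to the higher one.
    center = math.floor(float(value) / width + 0.5) * width
    # Assumption: whole numbers are written without a decimal point.
    return str(int(center)) if center == int(center) else str(center)


def initials(name):
    """Model of $initials: Last^First^Middle gives initials in the order F, M, L."""
    parts = [p.strip() for p in name.split("^")]
    last = parts[0] if len(parts) > 0 else ""
    first = parts[1] if len(parts) > 1 else ""
    middle = parts[2] if len(parts) > 2 else ""
    # Assumption: missing components are skipped and initials are upper-cased.
    return "".join(p[0].upper() for p in (first, middle, last) if p)


def date_string(now, sep=""):
    """Model of $date: YYYY-MM-DD with the dashes replaced by sep."""
    return f"{now.year:04d}{sep}{now.month:02d}{sep}{now.day:02d}"


def time_string(now, sep=""):
    """Model of $time: HH:MM:SS with the colons replaced by sep."""
    return f"{now.hour:02d}{sep}{now.minute:02d}{sep}{now.second:02d}"
```

Under these assumptions, round_to_bin("26", "5") gives "25", initials("Doe^Jane^Q") gives "JQD", and date_string(datetime(2008, 11, 8)) gives "20081108" because no separator was supplied.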

In a path assignment statement, a reserved word $this is available on the right side. It always contains the value of the path on the left side of the assignment.

Here is an example script for remapping UIDs of two different types in XML files whose root elements are called LidcReadMessage:

$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $hashuid($UIDROOT, $this)
/LidcReadMessage//imageSOP_UID = $hashuid($UIDROOT, $this)

Here is an example script for doing the same work as above but at the same time renaming the SeriesInstanceUid element to SeriesInstanceUID:

$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $hashuid($UIDROOT, $this)
$temp = /LidcReadMessage/ResponseHeader/SeriesInstanceUid
/LidcReadMessage/ResponseHeader/SeriesInstanceUID = $require($temp)
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $remove()
/LidcReadMessage//imageSOP_UID = $hashuid($UIDROOT, $this)

As a final example, the following is a more or less random script used when testing the ZipAnonymizer.

$KEY = "12345abcde"
$UIDROOT = "9.9"
$SITEID = "32"
$MAXLEN = "8"
$OLDUID = /manifest/@uid

#Remap the manifest attributes
/manifest/@uid = $hashuid($UIDROOT, $this)
/manifest/@study-uid = $hashuid($UIDROOT, $this)
/manifest/@pt-name = "[" $SITEID "]-" $initials($this)
/manifest/@pt-id = $hashptid($SITEID, $this, $MAXLEN)
/manifest/@date = $incrementdate($this, "-137")

#Create some new elements
/manifest/today = $require($date() "@" $time())
/manifest/newuid = $require($uid($UIDROOT))
/manifest/encrypteduid = $require($encrypt(/manifest/@uid, $KEY))

#In the test manifest, there were some child elements containing UIDs.
#There were many series-uid and instance-uid elements. 
#This is how to do them all in one line each:
/manifest//series-uid = $hashuid($UIDROOT, $this)
/manifest//instance-uid = $hashuid($UIDROOT, $this)

#You can access elements you have just created:
$AGE = "26"
/manifest/age = $require($AGE)
/manifest/roundedage = $require($round(/manifest/age, "5"))

#You can call functions with static strings as well as ones from the document:
/manifest/hashedtext = $require($hash("Now is the time for all good men..."))

#These next two elements should have the same value:
/manifest/hashedJHP = $require($hashname("Last^First^Middle","100","2"))
/manifest/hashedJP = $require($hashname("Last^First","100"))
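
The reason the last two elements in the script above should match can be seen in a small Python model of $hashname. This is an illustrative sketch only: the Python function is hypothetical, SHA-256 rendered as a decimal string stands in for whatever hash CTP actually uses, and rejoining the retained names with "^" before hashing is an assumption.

```python
import hashlib


def hashname(name, maxlen=None, maxwds=None):
    """Model of $hashname: limit the names, hash, then limit the output length."""
    words = name.split("^")
    if maxwds is not None:
        # Keep only the first maxwds names (Last, then First, then Middle).
        words = words[: int(maxwds)]
    # Assumption: the retained names are rejoined with "^" before hashing.
    digest = hashlib.sha256("^".join(words).encode("utf-8")).hexdigest()
    # Stand-in for the real hash: render the digest as a numeric string.
    numeric = str(int(digest, 16))
    return numeric if maxlen is None else numeric[: int(maxlen)]
```

With maxwds set to "2", the name Last^First^Middle is cut down to Last^First before hashing, so it produces the same result as hashing Last^First directly. Without the maxwds limit the two names hash differently.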

To assist in debugging XML anonymizer scripts, there is a special name assignment command:

  • $print = expression

This causes the value of the expression to be inserted in the log.