The CTP XML Anonymizer

From MircWiki
Revision as of 14:37, 13 August 2008 by Johnperry (talk | contribs)

This article describes how to configure the CTP XML anonymizer. The anonymizer provides automatic modification of elements in XML objects. The intended audience for this article is CTP system administrators.

The XML anonymizer is driven by a script file which is identified in the XmlAnonymizer pipeline stage's configuration file element. This article describes the script language, which is different from that of the DicomAnonymizer and DicomFilter pipeline stages.

The script language contains three types of statements. Each statement starts on the first character of a line and is identified by its starting character. A line that does not start with one of the three statement-start characters is appended to the preceding line.

  • Any line starting with a '#' character is a comment line.
  • A line starting with an identifier is an assignment statement. An identifier always starts with a '$' and is immediately followed by a name, for example, $UIDROOT.
  • A line starting with a '/' character is a path assignment statement. Paths are XPath-like expressions, always starting from the root element.
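The three statement types and the continuation rule can be illustrated with a short fragment. This is a minimal sketch: it assumes that a line beginning with whitespace counts as a continuation line, which follows from the rule above but is not shown elsewhere in this article.

```text
# A comment line: ignored by the anonymizer
$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid =
    $uid($UIDROOT)
```

The last line starts with a space rather than '#', '$', or '/', so it is appended to the path assignment statement on the line before it.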

If no bracketed qualifier is present in a path segment, the first element matching the segment name is selected. If all elements matching the segment name are to be selected, use the "[*]" wildcard qualifier, e.g. /root/element[*].

Here are some examples of paths:

  • /MIRCdocument/authorization/owner refers to the first owner child element of the first authorization child element of the MIRCdocument root element.
  • /MIRCdocument/@display refers to the display attribute of the MIRCdocument root element.
  • /*/* refers to any second-generation child element of an XML document, no matter what its root element is named.
  • /*//owner refers to any owner element in an XML document, no matter what its root element is named.
  • /message/segment[3] refers to the fourth segment element of the message root element (indexes always count from zero).
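As a sketch of how such paths might appear on the left side of path assignment statements, the fragment below reuses the paths listed above. The assigned values are illustrative only, and the "[*]" form is modeled on the /root/element[*] example.

```text
# Remove the first owner element under the first authorization element
/MIRCdocument/authorization/owner = $remove()
# Remove the display attribute of the root element
/MIRCdocument/@display = $remove()
# Replace the text of every segment element of the message root
/message/segment[*] = "removed"
# Replace the text of the fourth segment element only (indexes start at zero)
/message/segment[3] = "removed"
```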

Statements are executed in order. No flow control statements are provided. A path statement in which the left side path is not matched in the document is ignored. Thus, one script file can contain anonymization instructions for files of different schemas. If a path on the right side of a statement does not appear in the document, it generates an empty string.
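The fragment below sketches how one script could serve two schemas. The element name title is used only for illustration and is an assumption, not something taken from a particular schema.

```text
# Matched only in documents whose root element is MIRCdocument
/MIRCdocument/authorization/owner = "anonymous"
# Matched only in documents whose root element is LidcReadMessage
/LidcReadMessage//imageSOP_UID = $uid("1.2.3.4")
# If /MIRCdocument/title is not in the document, $TITLE is an empty string
$TITLE = /MIRCdocument/title
```

When a LidcReadMessage document is processed, the first statement is ignored because its left side path is not matched. When a MIRCdocument is processed, the second statement is ignored in the same way.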

Assignment statements are of two types:

  • $name = expression
  • /path = expression

In a $name assignment statement, an expression can be any combination of literals (quoted strings, e.g. "some text"), paths, and other names. For a path assignment, an expression can be any combination of literals, paths, names, or function calls. Whitespace between expression terms is ignored, and there are no operators. There are three functions:

  • $require( expression ) forces the creation of the element or attribute identified by the path on the left side of the assignment, including all necessary parent elements. The value assigned to the element or attribute is the value of the expression argument of the $require function.
  • $remove() causes the element or attribute identified by the path on the left side of the assignment to be removed from the document.
  • $uid( expression ) causes the value of the element to be remapped using the value of the expression as the new UID's root. The UID remapping function uses the same UID remapping table as is used by the DICOM anonymizer, allowing UIDs to be remapped while preserving the relationships between DICOM and XML objects.
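The following sketch shows an expression built from several terms. It assumes that terms written side by side are concatenated, which the absence of operators suggests but this article does not state explicitly. The names $SITE and $LABEL and the notes element are hypothetical.

```text
$SITE = "SiteA"
# A literal followed by a name, with no operator between them
$LABEL = "Anonymized at " $SITE
# Create the notes element if it is missing and set its value
/MIRCdocument/notes = $require($LABEL)
```

Function calls appear only on the right side of path assignments here, in line with the rule that $name assignments are limited to literals, paths, and names.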

Here is an example script for remapping UIDs of two different types in XML files whose root elements are called LidcReadMessage:

$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $uid($UIDROOT)
/LidcReadMessage//imageSOP_UID = $uid($UIDROOT)

Here is an example script that does the same work as above and also renames the SeriesInstanceUid element to SeriesInstanceUID:

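$UIDROOT = "1.2.3.4"
# Remap the series UID in place
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $uid($UIDROOT)
# Save the remapped value
$temp = /LidcReadMessage/ResponseHeader/SeriesInstanceUid
# Create the renamed element with the saved value, then remove the old element
/LidcReadMessage/ResponseHeader/SeriesInstanceUID = $require($temp)
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $remove()
# Remap imageSOP_UID elements anywhere below the root
/LidcReadMessage//imageSOP_UID = $uid($UIDROOT)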

To assist in debugging XML anonymizer scripts, there is a special name assignment command:

  • $print = expression

This causes the value of the expression to be inserted in the log.
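A minimal sketch of $print in use, built on the LidcReadMessage example. Combining a literal with a name in one expression assumes that adjacent terms are concatenated, and the exact format of the resulting log entry is not specified here.

```text
$UIDROOT = "1.2.3.4"
# Log the root used for remapping
$print = "UID root: " $UIDROOT
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $uid($UIDROOT)
# Log the value of the element after remapping
$print = /LidcReadMessage/ResponseHeader/SeriesInstanceUid
```

Because statements are executed in order, placing a $print statement before and after a path assignment is one way to compare the value of an element at each point in the script.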