The MIRC XML Anonymizer
This article describes how to configure the XML anonymizer contained in MIRC Storage Services. The anonymizer provides automatic modification of elements in XML objects. Typical applications for the XML anonymizer are in MIRC Clinical Trial Services. It is also available as part of the DicomEditor program (which processes both DICOM and XML files). The intended audience for this article is MIRC system administrators or clinical trial coordinators at field center sites.
The XML anonymizer is driven by a script file called xml-anonymizer.script, located in a MIRC Storage Service's trial subdirectory. This file is not delivered as part of the installation, so it must be created manually using any text editor (e.g., TextPad). Because XML is very general, the script language differs from that of the DICOM anonymizer. It is also designed to look XPath-like for the benefit of XSL wizards.
The script language contains three types of statements. Each starts on the first character of a line and is identified by that character. A line that does not start with one of the three statement-start characters is appended to the preceding line.
- Any line starting with a '#' character is a comment line.
- A line starting with an identifier is an assignment statement. An identifier always starts with a '$' and is immediately followed by a name, for example, $UIDROOT.
- A line starting with a '/' character is a path assignment statement. Paths are XPath-like expressions, always starting from the root element.
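The line rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this article, not the anonymizer's actual parser: the function names join_statements and classify are hypothetical, and joining a continuation line verbatim (without trimming or inserting whitespace) is an assumption.

```python
def join_statements(script_text):
    """Group physical lines into logical statements.

    A line whose first character is '#', '$' or '/' starts a new statement;
    any other line is appended to the statement before it.
    """
    statements = []
    for line in script_text.splitlines():
        if line[:1] in ("#", "$", "/"):
            statements.append(line)
        elif statements:
            # Assumption: the continuation text is appended exactly as written.
            statements[-1] += line
        # A continuation line with nothing before it is dropped in this sketch.
    return statements


def classify(statement):
    """Return the statement type implied by the first character."""
    kinds = {"#": "comment", "$": "name assignment", "/": "path assignment"}
    return kinds[statement[0]]


script = (
    "# remap the series UID\n"
    '$UIDROOT = "1.2.3.4"\n'
    "/LidcReadMessage/ResponseHeader/SeriesInstanceUid =\n"
    "    $uid($UIDROOT)\n"
)
for statement in join_statements(script):
    print(classify(statement), "->", statement)
```

Note that the continuation line in the sample is indented. If it began in the first column, its leading '$' would mark it as a new statement rather than a continuation.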
Here are some examples of paths:
- /MIRCdocument/authorization/owner refers to the owner child element of an authorization element in an XML document whose root element is MIRCdocument.
- /MIRCdocument/@display refers to the display attribute of the root element in an XML document whose root element is MIRCdocument.
- /*/* refers to any second-generation child element of an XML document, no matter what its root element is named.
- /*//owner refers to any owner element in an XML document, no matter what its root element is named.
- /message/segment[3] refers to the fourth segment element (indexes always count from zero) in an XML document whose root element is message.
If no bracketed qualifier is present in a path segment, the first element matching the segment name is selected. If all elements matching the segment name are to be selected, use the "[*]" wildcard qualifier, e.g. /root/element[*].
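The qualifier rules differ from standard XPath, where positions count from one, so a short sketch may help. The Python below uses the standard xml.etree.ElementTree module to model how one path segment could select child elements under the rules described here. The function select_children and the SEGMENT pattern are hypothetical names for illustration, and the sketch handles only a name with an optional numeric or "[*]" qualifier.

```python
import re
import xml.etree.ElementTree as ET

# One path segment: a name (or '*'), optionally followed by [n] or [*].
SEGMENT = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<qual>\*|\d+)\])?$")


def select_children(parent, segment):
    """Select children of `parent` for a segment such as
    'segment', 'segment[3]' or 'segment[*]'."""
    match = SEGMENT.match(segment)
    if match is None:
        raise ValueError(f"unsupported segment: {segment!r}")
    name, qual = match.group("name"), match.group("qual")
    candidates = [c for c in parent if name == "*" or c.tag == name]
    if qual == "*":
        return candidates              # every matching element
    index = int(qual) if qual is not None else 0
    return candidates[index:index + 1]  # zero-based; empty if out of range


root = ET.fromstring(
    "<message><segment>a</segment><segment>b</segment>"
    "<segment>c</segment><segment>d</segment></message>"
)
print([e.text for e in select_children(root, "segment")])     # ['a']
print([e.text for e in select_children(root, "segment[3]")])  # ['d']
print([e.text for e in select_children(root, "segment[*]")])  # ['a', 'b', 'c', 'd']
```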
Statements are executed in order. No flow control statements are provided. A path statement in which the left side path is not matched in the document is ignored. Thus, one script file can contain anonymization instructions for files of different schemas. If a path on the right side of a statement does not appear in the document, it generates an empty string.
Assignment statements are of two types:
- $name = expression
- /path = expression
In a $name assignment statement, an expression can be any combination of literals (quoted strings, e.g. "some text"), paths, and other names. For a path assignment, an expression can be any combination of literals, paths, names, or function calls. Whitespace between expression terms is ignored, and there are no operators. There are three functions:
- $require( expression ) forces the creation of the element or attribute identified by the path on the left side of the assignment, including all necessary parent elements. The value assigned to the element or attribute is the value of the expression argument of the $require function.
- $remove() causes the element or attribute identified by the path on the left side of the assignment to be removed from the document.
- $uid( expression ) causes the value of the element to be remapped using the value of the expression as the new UID's root. The UID remapping function uses the same UID remapping table as is used by the DICOM anonymizer, allowing UIDs to be remapped while preserving the relationships between DICOM and XML objects.
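To see why a shared remapping table preserves the relationships between objects, consider the following Python sketch. It is a toy model only: the class UidTable is hypothetical, the sample UIDs are made up, and building the replacement as the root plus a counter is an assumption, since this article does not describe how the anonymizer actually generates new UIDs.

```python
class UidTable:
    """Hypothetical remapping table: original UID -> replacement UID."""

    def __init__(self):
        self._table = {}

    def remap(self, original_uid, root):
        # Reuse the stored replacement if this UID was seen before, so every
        # object that referenced the original UID still agrees afterwards.
        if original_uid not in self._table:
            # Assumption: replacement = root + "." + running counter.
            self._table[original_uid] = f"{root}.{len(self._table) + 1}"
        return self._table[original_uid]


table = UidTable()
from_dicom = table.remap("1.2.840.99999.1.7", "1.2.3.4")  # seen in a DICOM object
from_xml = table.remap("1.2.840.99999.1.7", "1.2.3.4")    # same UID in an XML object
other = table.remap("1.2.840.99999.1.8", "1.2.3.4")       # a different UID

print(from_dicom == from_xml)  # True: the same original maps to the same replacement
print(from_dicom != other)     # True: different originals stay distinct
```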
Here is an example script for remapping UIDs of two different types in XML files whose root elements are called LidcReadMessage:
$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $uid($UIDROOT)
/LidcReadMessage//imageSOP_UID = $uid($UIDROOT)
Here is an example script for doing the same work as above but at the same time renaming the SeriesInstanceUid element to SeriesInstanceUID:
$UIDROOT = "1.2.3.4"
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $uid($UIDROOT)
$temp = /LidcReadMessage/ResponseHeader/SeriesInstanceUid
/LidcReadMessage/ResponseHeader/SeriesInstanceUID = $require($temp)
/LidcReadMessage/ResponseHeader/SeriesInstanceUid = $remove()
/LidcReadMessage//imageSOP_UID = $uid($UIDROOT)
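The renaming steps in the script above (copy the value to $temp, create the new element with $require, delete the old one with $remove) can be mirrored in Python with the standard xml.etree.ElementTree module. This is a sketch of the intended effect on a made-up sample document, not of how the anonymizer is implemented. It assumes the UID has already been remapped, and it appends the new element at the end of its parent, which may not match the anonymizer's placement.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; the UID value is illustrative only.
doc = ET.fromstring(
    "<LidcReadMessage><ResponseHeader>"
    "<SeriesInstanceUid>1.2.3.4.1</SeriesInstanceUid>"
    "</ResponseHeader></LidcReadMessage>"
)
header = doc.find("ResponseHeader")
old = header.find("SeriesInstanceUid")

# $temp = /LidcReadMessage/ResponseHeader/SeriesInstanceUid
# (a path that is not present yields an empty string)
temp = old.text if old is not None else ""

# /LidcReadMessage/ResponseHeader/SeriesInstanceUID = $require($temp)
new = header.find("SeriesInstanceUID")
if new is None:
    new = ET.SubElement(header, "SeriesInstanceUID")
new.text = temp

# /LidcReadMessage/ResponseHeader/SeriesInstanceUid = $remove()
if old is not None:
    header.remove(old)

print(ET.tostring(doc, encoding="unicode"))
```

The order matters: because statements are executed in sequence, the value must be captured in $temp before the original element is removed.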
To assist in debugging XML anonymizer scripts, there is a special name assignment command:
- $print = expression
This causes the value of the expression to be printed on the console. If you are running DicomEditor on a Windows system and want to use this feature, you should launch the program from a command window.