The CTP XML and Zip Filters
The CTP XmlFilter and ZipFilter are pipeline stages that provide preprocessing of XmlObjects and ZipObjects, quarantining those which do not meet the conditions of a script program. This article describes the script language, which is shared by both filter classes. The intended audience for this article is CTP administrators setting up a processing pipeline.
XmlObjects are XML files which contain data in any XML schema. CTP requires XmlObjects to provide certain identifiers (UIDs, patient IDs, etc.), and allows some flexibility in where they are stored. This flexibility is described in The CTP XmlObject.
ZipObjects are Zip files which contain, in addition to any other files, an XML file called manifest.xml which contains the identifiers necessary to relate them to other objects in a study. The ZipObject manifest is described in The CTP ZipObject.
The Script Language
The script language interrogates an object and computes a boolean result. If the result is true, the object is accepted for further processing in the pipeline; if it is false, the object is quarantined and further processing is aborted. In the case of an XmlObject, the script operates on all the XML data in the object. In the case of a ZipObject, the script operates only on the XML data in the object's manifest.xml file.
An expression in the language consists of terms separated by operators and/or parentheses. There are three operators, listed in order of increasing precedence:
- + is logical or
- * is logical and
- ! is unary logical negation
For example, the following are valid expressions:
- term + term * term
- term * (term + term) + term * !term
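The precedence rules above can be illustrated with a small evaluator. The following Python sketch is not the CTP implementation; it is a minimal recursive-descent parser, written under the assumption that terms are limited to the reserved words true. and false., and it exists only to show how +, * and ! bind. The function names (tokenize, evaluate) are hypothetical.

```python
import re

# Tokens: the reserved words "true." and "false.", the three operators,
# and parentheses. Identifier terms are deliberately left out of this sketch.
TOKEN = re.compile(r"\s*(true\.|false\.|[+*!()])")

def tokenize(text):
    """Split an expression into tokens, rejecting anything unrecognized."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(text):
    """Evaluate an expression; ! binds tightest, then *, then +."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():  # lowest precedence: a + b
        value = parse_and()
        while peek() == "+":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():  # middle precedence: a * b
        value = parse_not()
        while peek() == "*":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # highest precedence: !a
        if peek() == "!":
            take()
            return not parse_not()
        return parse_primary()

    def parse_primary():  # a reserved word or a parenthesized expression
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok == "true.":
            return True
        if tok == "false.":
            return False
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result
```

With this sketch, evaluate("true. + false. * false.") is true because * binds more tightly than +, whereas evaluate("(true. + false.) * false.") is false because the parentheses force the or to be evaluated first.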
Terms in the language are either the reserved words true. and false. (note the period after each word) or expressions in the form:
- identifier.method("string")
An identifier is an XML element or attribute name specified as a full path starting with the root element. In most cases, a path can be specified as in these examples:
The leading slash is optional unless the name of the root element is a reserved word (true or false), in which case it is required.
If the name of any path element contains a period, the entire path must be enclosed in double-quotes, as in:
If the document contains multiple child elements at a path location, it is possible to select one of the children by placing a zero-based index in square brackets, as in:
The value of an identifier is the string value stored in the specified element or attribute. If an identifier is missing from the received object, an empty string is provided.
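The lookup rules for element paths can be sketched in Python as follows. This is an illustration only, not CTP's code: the function name identifier_value and the element names in the usage note are hypothetical, attribute identifiers are omitted, and the sketch assumes that a path step without an index selects the first matching child.

```python
import re
import xml.etree.ElementTree as ET

# One path step: an element name with an optional zero-based index, e.g. "series[1]".
STEP = re.compile(r"^([^\[\]]+)(?:\[(\d+)\])?$")

def identifier_value(root, path):
    """Return the text stored at an element path, or "" if any step is missing."""
    # Drop enclosing double quotes and tolerate the optional leading slash.
    steps = [s for s in path.strip('"').split("/") if s]
    if not steps:
        return ""
    current = None
    for depth, step in enumerate(steps):
        m = STEP.match(step)
        if m is None:
            return ""
        name = m.group(1)
        index = int(m.group(2) or 0)  # assumption: no index means the first match
        if depth == 0:
            # The first step names the root element itself.
            candidates = [root] if root.tag == name else []
        else:
            candidates = [child for child in current if child.tag == name]
        if index >= len(candidates):
            return ""  # a missing identifier yields an empty string
        current = candidates[index]
    return current.text or ""
```

For a hypothetical document whose root element study contains two series children, each with a uid child, identifier_value(doc, "study/series[1]/uid") selects the uid of the second series, and a path to a nonexistent element returns an empty string.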
The language supports these methods:
- equals returns true if the value of the identifier exactly equals the string argument; otherwise, it returns false.
- matches returns true if the value of the identifier matches the regular expression specified in the string argument; otherwise, it returns false.
- contains returns true if the value of the identifier contains the string argument anywhere within it; otherwise, it returns false.
- startsWith returns true if the value of the identifier starts with the string argument; otherwise, it returns false.
- endsWith returns true if the value of the identifier ends with the string argument; otherwise, it returns false.
All the methods use case-sensitive comparisons; therefore, ABC does not equal abc.
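As a rough guide to the behavior of the five methods, the following Python sketch maps each one to a string operation. It is an approximation rather than CTP's implementation: the names METHODS and apply_method are hypothetical, matches is assumed to require the regular expression to match the whole value, and Python's regular expression syntax may differ in detail from the syntax CTP accepts.

```python
import re

# Each method takes the identifier's value and the string argument.
# All comparisons are case-sensitive, as described above.
METHODS = {
    "equals": lambda value, arg: value == arg,
    # Assumption: the pattern must match the entire value.
    "matches": lambda value, arg: re.fullmatch(arg, value) is not None,
    "contains": lambda value, arg: arg in value,
    "startsWith": lambda value, arg: value.startswith(arg),
    "endsWith": lambda value, arg: value.endswith(arg),
}

def apply_method(name, value, arg):
    """Apply the named method to an identifier value and a string argument."""
    return METHODS[name](value, arg)
```

For instance, apply_method("equals", "ABC", "abc") is false because the comparison is case-sensitive, while apply_method("equals", "", "") is true, which is what a missing identifier produces when it is compared with an empty string.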
Suppose that objects are to be rejected if they are of type "SECONDARY". Such objects could be filtered out of the pipeline with a script like:
Note the unary negation operator, which is necessary to generate true for objects which do not contain the type string SECONDARY.
Suppose that objects are to be rejected if they are of type "SECONDARY" or of type "DERIVED". Such objects could be filtered out of the pipeline with a script like:
- !(/firstname.lastname@example.org("SECONDARY") + /email@example.com("DERIVED"))
Note again the unary negation operator, and also note the parentheses and the logical or operator, all of which combine to generate true only if the type is neither SECONDARY nor DERIVED.
Finally, suppose that objects containing any non-empty value in the type attribute are to be rejected. Such objects could be filtered out with a script like:
Note that in this case the unary negation operator is not used because if the attribute is missing or empty, the equals method will generate true, which is the value necessary to pass the object down the pipeline.
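The logic of the three examples can be checked with ordinary boolean expressions. The Python sketch below mirrors each rule as a function of the type value, where a missing attribute is represented by an empty string. The function names are hypothetical, and the first two rules are assumed to use an exact comparison.

```python
def accept_not_secondary(type_value):
    """First example: negate the test so that non-SECONDARY objects pass."""
    return not (type_value == "SECONDARY")

def accept_neither(type_value):
    """Second example: negate the or of the two tests."""
    return not (type_value == "SECONDARY" or type_value == "DERIVED")

def accept_only_empty(type_value):
    """Third example: no negation; an empty or missing type passes."""
    return type_value == ""

# Tabulate the decision of each rule; "" stands for a missing or empty attribute.
for value in ["SECONDARY", "DERIVED", "PRIMARY", ""]:
    print(repr(value),
          accept_not_secondary(value),
          accept_neither(value),
          accept_only_empty(value))
```

In each case a result of true means the object continues down the pipeline and a result of false means it is quarantined.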