The CTP XML and Zip Filters
The CTP XmlFilter and ZipFilter are pipeline stages that provide preprocessing of XmlObjects and ZipObjects, quarantining those which do not meet the conditions of a script program. This article describes the script language, which is shared by both filter classes. The intended audience for this article is CTP administrators setting up a processing pipeline.
XmlObjects are XML files that contain data in any XML schema. CTP requires XmlObjects to provide certain identifiers (UIDs, patient IDs, etc.), and allows some flexibility in where they are stored. This flexibility is described in The CTP XmlObject.
ZipObjects are Zip files that contain, in addition to any other files, an XML file called manifest.xml that contains the necessary identifiers to relate them to other objects in a study. The ZipObject manifest is described in The CTP ZipObject.
The Script Language
The script language interrogates an object and computes a boolean result. If the result is true, the object is accepted for further processing in the pipeline; if it is false, the object is quarantined and further processing is aborted. In the case of an XmlObject, the script operates on all the XML data in the object. In the case of a ZipObject, the script operates only on the XML data in the object's manifest.xml file.
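The accept-or-quarantine decision can be pictured with a short Python sketch. This is not CTP code: the helper names xml_to_test and disposition are hypothetical, the script is represented by an ordinary Python predicate, and manifest.xml is assumed to sit at the top level of the Zip file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def xml_to_test(data: bytes) -> ET.Element:
    """Return the root of the XML a script would examine (hypothetical helper)."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        # ZipObject: only manifest.xml is examined (assumed to be at the top level).
        with zipfile.ZipFile(buffer) as archive:
            return ET.fromstring(archive.read("manifest.xml"))
    # XmlObject: the whole document is examined.
    return ET.fromstring(data)

def disposition(data: bytes, script) -> str:
    """Apply a predicate standing in for the script: true accepts, false quarantines."""
    return "accept" if script(xml_to_test(data)) else "quarantine"

# Illustrative objects: a bare XML document and a Zip file with a manifest.
xml_object = b'<root type="PRIMARY"/>'
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("manifest.xml", '<root type="SECONDARY"/>')
    archive.writestr("notes.txt", "any other file")
zip_object = zip_buffer.getvalue()

def not_secondary(root: ET.Element) -> bool:
    # Stand-in for a script that rejects objects of type SECONDARY.
    return root.get("type", "") != "SECONDARY"

print(disposition(xml_object, not_secondary))  # accept
print(disposition(zip_object, not_secondary))  # quarantine
```

The other files in the Zip archive play no part in the decision; only the manifest is parsed.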
An expression in the language consists of terms separated by operators and/or parentheses. There are three operators, listed in order of increasing precedence:
- + is logical or
- * is logical and
- ! is unary logical negation
The following are examples of expressions:
- term + term * term
- term * (term + term) + term * !term
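Python's or, and, and not have the same relative precedence as +, *, and !, so the two example expressions can be transcribed directly to see how they group. This is only an analogy for reading scripts, not a description of how CTP parses them, and the function names are hypothetical.

```python
def example_1(a: bool, b: bool, c: bool) -> bool:
    # term + term * term  groups as  a + (b * c)
    return a or b and c

def example_2(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool:
    # term * (term + term) + term * !term  groups as  (a * (b + c)) + (d * (!e))
    return a and (b or c) or d and not e

# With the first term true and the others false, the first expression is true
# because the "and" binds before the "or". Forcing the other grouping with
# parentheses, (a + b) * c, gives false for the same inputs.
print(example_1(True, False, False))               # True
print((True or False) and False)                   # False
print(example_2(False, True, True, True, False))   # True
```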
Terms in the language are either reserved words (true. or false., each ending with a period) or expressions of the form:
- identifier.method("string argument")
An identifier is an XML element or attribute name specified as a full path starting with the root element. In most cases, a path can be specified as in these examples:
The leading slash is optional unless the name of the root element is a reserved word (true or false), in which case it is required.
If the name of any path element contains a period, the entire path must be enclosed in double-quotes, as in:
If the document contains multiple child elements at a path location, it is possible to select one of the children by placing a zero-based index in square brackets, as in:
The value of an identifier is the string value stored in the specified element or attribute. If an identifier is missing from the received object, an empty string is provided.
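The lookup rules above can be sketched in Python. Everything here is illustrative rather than CTP's implementation: the function name identifier_value is hypothetical, the sample document is invented, and three details are assumptions not confirmed by the text: that an attribute is written as @name after the element path, that the index follows the element name, and that an unindexed step selects the first matching child.

```python
import re
import xml.etree.ElementTree as ET

# A path step is an element name with an optional zero-based index, e.g. item[1].
STEP = re.compile(r"^([^\[\]]+)(?:\[(\d+)\])?$")

def identifier_value(root: ET.Element, path: str) -> str:
    """Resolve a path to a string value; return "" if any part is missing."""
    path = path.strip('"')                      # a quoted path is unwrapped first
    path, _, attribute = path.partition("@")    # assumed attribute syntax: path@name
    steps = [s for s in path.split("/") if s]   # the leading slash is optional
    if not steps:
        return ""
    candidates = [root]                         # the first step must name the root
    node = root
    for step in steps:
        parsed = STEP.match(step)
        if parsed is None:
            return ""
        name, index = parsed.group(1), int(parsed.group(2) or 0)
        matches = [n for n in candidates if n.tag == name]
        if index >= len(matches):
            return ""                           # missing identifier: empty string
        node = matches[index]
        candidates = list(node)                 # children are candidates for the next step
    if attribute:
        return node.get(attribute, "")
    return node.text or ""

document = ET.fromstring(
    '<root type="PRIMARY">'
    "<item><name>first</name></item>"
    "<item><name>second</name></item>"
    "</root>"
)

print(identifier_value(document, "/root@type"))            # PRIMARY
print(identifier_value(document, "root/item[1]/name"))     # second
print(repr(identifier_value(document, "/root/missing")))   # ''
```

Returning an empty string for a missing identifier means a script never fails on absent data; it simply compares against "".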
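The following methods are available for testing the value of an identifier against the string argument:
- equals returns true if the value of the identifier exactly equals the string argument; otherwise, it returns false.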
- equalsIgnoreCase returns true if the value of the identifier equals the string argument; otherwise, it returns false. This method is not case-sensitive.
- matches returns true if the value of the identifier matches the regular expression specified in the string argument; otherwise, it returns false.
- contains returns true if the value of the identifier contains the string argument anywhere within it; otherwise, it returns false.
- containsIgnoreCase is the case-insensitive version of contains.
- startsWith returns true if the value of the identifier starts with the string argument; otherwise, it returns false.
- startsWithIgnoreCase is the case-insensitive version of startsWith.
- endsWith returns true if the value of the identifier ends with the string argument; otherwise, it returns false.
- endsWithIgnoreCase is the case-insensitive version of endsWith.
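The behavior of these methods can be approximated with ordinary Python string operations. This is a sketch, not CTP code: the METHODS table and the apply function are hypothetical names. It assumes that matches requires the whole value to match the regular expression, which the text does not state, and Python's regular-expression dialect may differ in detail from the one CTP uses.

```python
import re

# Hypothetical Python stand-ins for the script methods. Each takes the
# identifier's value and the string argument and returns a boolean.
METHODS = {
    "equals": lambda value, arg: value == arg,
    "equalsIgnoreCase": lambda value, arg: value.lower() == arg.lower(),
    # Assumption: the whole value must match the regular expression.
    "matches": lambda value, arg: re.fullmatch(arg, value) is not None,
    "contains": lambda value, arg: arg in value,
    "containsIgnoreCase": lambda value, arg: arg.lower() in value.lower(),
    "startsWith": lambda value, arg: value.startswith(arg),
    "startsWithIgnoreCase": lambda value, arg: value.lower().startswith(arg.lower()),
    "endsWith": lambda value, arg: value.endswith(arg),
    "endsWithIgnoreCase": lambda value, arg: value.lower().endswith(arg.lower()),
}

def apply(method: str, value: str, arg: str) -> bool:
    """Evaluate one method call on an identifier's value."""
    return METHODS[method](value, arg)

print(apply("equals", "Secondary", "SECONDARY"))            # False
print(apply("equalsIgnoreCase", "Secondary", "SECONDARY"))  # True
print(apply("matches", "CT", "CT|MR"))                      # True
print(apply("contains", "", "SECONDARY"))                   # False (missing identifier)
print(apply("equals", "", ""))                              # True  (missing identifier)
```

The last two calls show what happens when an identifier is missing: its value is the empty string, so only a comparison against "" succeeds.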
Suppose that objects are to be rejected if they are of type "SECONDARY". Such objects could be filtered out of the pipeline with a script like:
Note the unary negation operator, which is necessary to generate true for objects which do not contain the type string SECONDARY.
Suppose that objects are to be rejected if they are of type "SECONDARY" or of type "DERIVED". Such objects could be filtered out of the pipeline with a script like:
- !(/root@type.equals("SECONDARY") + /root@type.equals("DERIVED"))
Note again the unary negation operator, and also note the parentheses and the logical or operator, all of which combine to generate true only if the type is neither SECONDARY nor DERIVED.
Finally, suppose that objects containing any non-empty value in the type attribute are to be rejected. Such objects could be filtered out with a script like:
Note that in this case the unary negation operator is not used because if the attribute is missing or empty, the equals method will generate true, which is the value necessary to pass the object down the pipeline.
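The three scenarios above can be checked side by side with a small Python model. It is a sketch under stated assumptions: each function is a hypothetical stand-in for one script, the type attribute's value is passed in directly, and a missing attribute is represented by the empty string, as described earlier.

```python
def reject_secondary(type_value: str) -> bool:
    # Models: negation of (type equals "SECONDARY")
    return not (type_value == "SECONDARY")

def reject_secondary_or_derived(type_value: str) -> bool:
    # Models: negation of (type equals "SECONDARY" or type equals "DERIVED")
    return not (type_value == "SECONDARY" or type_value == "DERIVED")

def reject_any_type(type_value: str) -> bool:
    # Models: type equals "" with no negation; true only when missing or empty
    return type_value == ""

scripts = [reject_secondary, reject_secondary_or_derived, reject_any_type]

# True means the object is accepted; False means it is quarantined.
for type_value in ["PRIMARY", "SECONDARY", "DERIVED", ""]:
    results = [script(type_value) for script in scripts]
    print(repr(type_value), results)
```

Only the third script accepts nothing but objects with a missing or empty type, which is why it needs no negation.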