The CTP XML and Zip Filters

From MircWiki
Latest revision as of 20:09, 15 January 2015

The CTP XmlFilter and ZipFilter are pipeline stages that preprocess XmlObjects and ZipObjects, quarantining those that do not meet the conditions of a script program. This article describes the script language, which is shared by both filter classes. The intended audience for this article is CTP administrators setting up a processing pipeline.

XmlObjects are XML files that contain data in any XML schema. CTP requires XmlObjects to provide certain identifiers (UIDs, patient IDs, etc.), and allows some flexibility in where they are stored. This flexibility is described in The CTP XmlObject.

ZipObjects are Zip files that contain, in addition to any other files, an XML file called manifest.xml that contains the necessary identifiers to relate them to other objects in a study. The ZipObject manifest is described in The CTP ZipObject.

The Script Language

The script language interrogates an object and computes a boolean result that, if true, results in the object being accepted for further processing in the pipeline, and if false, results in the object being quarantined, aborting further processing. In the case of an XmlObject, the script operates on all the XML data in the object. In the case of a ZipObject, the script operates only on the XML data in the object's manifest.xml file.

An expression in the language consists of terms separated by operators and/or parentheses. There are three operators, listed in order of increasing precedence:

  • + is logical or
  • * is logical and
  • ! is unary logical negation

Expression Examples:

  • term
  • !term
  • term + term * term
  • term * (term + term) + term * !term
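The precedence rules above can be illustrated with a small recursive-descent evaluator. This is only a sketch in Python, not the CTP implementation: the function names (tokenize, evaluate) are hypothetical, and each term is reduced to a plain name whose boolean value is supplied in a dictionary, rather than a real identifier.method("string") term.

```python
import re

# One token is an operator, a parenthesis, or a placeholder term name.
TOKEN = re.compile(r"\s*([+*!()]|[A-Za-z_]\w*)")


def tokenize(text):
    """Split an expression into operator, parenthesis, and term tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, values):
    """Evaluate an expression; `values` maps each term name to a boolean."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence: +
        result = parse_and()
        while peek() == "+":
            take()
            right = parse_and()
            result = result or right
        return result

    def parse_and():  # middle precedence: *
        result = parse_not()
        while peek() == "*":
            take()
            right = parse_not()
            result = result and right
        return result

    def parse_not():  # highest precedence: !, then parentheses and terms
        token = peek()
        if token == "!":
            take()
            return not parse_not()
        if token == "(":
            take()
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return result
        if token is None or token in ("+", "*", ")"):
            raise ValueError("expected a term")
        return bool(values[take()])

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


# * binds tighter than +, so this reads as a + (b * c).
print(evaluate("a + b * c", {"a": True, "b": False, "c": False}))  # True
# ! binds tighter than *, so this reads as (!a) * b.
print(evaluate("!a * b", {"a": False, "b": False}))  # False
```

With these rules, parentheses are needed only when a logical or must be evaluated before a logical and, or when a negation must apply to more than one term.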

Terms in the language are either reserved words (true. or false., each written with a trailing period) or expressions in the form:

identifier.method("string")

Identifiers:

An identifier is an XML element or attribute name specified as a full path starting with the root element. In most cases, a path can be specified as in these examples:

  • root/child
  • /root/child
  • /root/child/grandchild/@attribute

The leading slash is optional unless the name of the root element is a reserved word (true or false), in which case it is required.

If the name of any path element contains a period, the entire path must be enclosed in double-quotes, as in:

  • "root.2.1"
  • "/root/child.3"
  • "root/child/@attr.4"

If the document contains multiple child elements at a path location, it is possible to select one of the children by placing a zero-based index in square brackets, as in:

  • /root/child[2]
  • /root/child[2]/grandchild[6]/greatgrandchild/@attribute

The value of an identifier is the string value stored in the specified element or attribute. If an identifier is missing from the received object, an empty string is provided.
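The path rules above can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not the CTP code: the function names are hypothetical, a path step without an index is assumed to select the first matching child, the value of an element is assumed to be its leading text, and XML namespaces are ignored.

```python
import re
import xml.etree.ElementTree as ET

# A path step is an element name with an optional zero-based index.
STEP = re.compile(r"^([^\[\]]+)(?:\[(\d+)\])?$")


def parse_step(step):
    """Return (name, index) for a step such as 'child' or 'child[2]'."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"malformed path step: {step!r}")
    index = int(match.group(2)) if match.group(2) else None
    return match.group(1), index


def identifier_value(xml_text, path):
    """Return the string value at `path`, or "" if the path is missing."""
    root = ET.fromstring(xml_text)
    # Drop the optional enclosing double-quotes and the optional leading slash.
    steps = path.strip('"').lstrip("/").split("/")
    attribute = None
    if steps[-1].startswith("@"):
        attribute = steps.pop()[1:]
    if not steps:
        return ""
    # The first step must name the root element.
    name, index = parse_step(steps[0])
    if name != root.tag or index not in (None, 0):
        return ""
    node = root
    for step in steps[1:]:
        name, index = parse_step(step)
        children = [child for child in node if child.tag == name]
        position = index or 0  # assumption: no index means the first match
        if position >= len(children):
            return ""
        node = children[position]
    if attribute is not None:
        return node.get(attribute, "")
    return node.text or ""


doc = (
    "<root><child>a</child><child>b</child>"
    '<child>c<grandchild k="v"/></child></root>'
)
print(identifier_value(doc, "root/child"))                     # a
print(identifier_value(doc, "/root/child[2]"))                 # c
print(identifier_value(doc, "/root/child[2]/grandchild/@k"))   # v
print(repr(identifier_value(doc, "/root/missing")))            # ''
```

Because a missing identifier yields an empty string rather than an error, a script cannot distinguish an absent element from an empty one.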

Methods:

  • equals returns true if the value of the identifier exactly equals the string argument; otherwise, it returns false.
  • equalsIgnoreCase returns true if the value of the identifier equals the string argument; otherwise, it returns false. This method is not case-sensitive.
  • matches returns true if the value of the identifier matches the regular expression specified in the string argument; otherwise, it returns false.
  • contains returns true if the value of the identifier contains the string argument anywhere within it; otherwise, it returns false.
  • containsIgnoreCase is the case-insensitive version of contains.
  • startsWith returns true if the value of the identifier starts with the string argument; otherwise, it returns false.
  • startsWithIgnoreCase is the case-insensitive version of startsWith.
  • endsWith returns true if the value of the identifier ends with the string argument; otherwise, it returns false.
  • endsWithIgnoreCase is the case-insensitive version of endsWith.
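The behavior of these methods can be approximated with ordinary Python string operations. The table below is a sketch, not the CTP implementation: METHODS and apply_method are hypothetical names, matches is assumed to require the whole value to match the regular expression, and Python's regular-expression syntax and lower-casing may differ in detail from what CTP uses.

```python
import re

# Each entry maps a script method name to a function of
# (identifier value, string argument) returning a boolean.
METHODS = {
    "equals": lambda value, arg: value == arg,
    "equalsIgnoreCase": lambda value, arg: value.lower() == arg.lower(),
    # Assumption: the regular expression must match the entire value.
    "matches": lambda value, arg: re.fullmatch(arg, value) is not None,
    "contains": lambda value, arg: arg in value,
    "containsIgnoreCase": lambda value, arg: arg.lower() in value.lower(),
    "startsWith": lambda value, arg: value.startswith(arg),
    "startsWithIgnoreCase":
        lambda value, arg: value.lower().startswith(arg.lower()),
    "endsWith": lambda value, arg: value.endswith(arg),
    "endsWithIgnoreCase":
        lambda value, arg: value.lower().endswith(arg.lower()),
}


def apply_method(name, value, argument):
    """Apply the named method to an identifier value and a string argument."""
    return METHODS[name](value, argument)


print(apply_method("equals", "ABC", "abc"))            # False
print(apply_method("equalsIgnoreCase", "ABC", "abc"))  # True
print(apply_method("contains", "ORIGINAL\\SECONDARY", "SECONDARY"))  # True
print(apply_method("equals", "", ""))                  # True
```

The last call shows why a missing identifier, which is supplied as an empty string, satisfies equals("").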

Script Examples:

Suppose that objects are to be rejected if they are of type "SECONDARY". Such objects could be filtered out of the pipeline with a script like:

!/manifest/@type.contains("SECONDARY")

Note the unary negation operator, which is necessary to generate true for objects that do not contain the type string SECONDARY.

Suppose that objects are to be rejected if they are of type "SECONDARY" or of type "DERIVED". Such objects could be filtered out of the pipeline with a script like:

!(/manifest/@type.contains("SECONDARY") + /manifest/@type.contains("DERIVED"))

Note again the unary negation operator, and also note the parentheses and the logical or operator, all of which combine to generate true only if the type is neither SECONDARY nor DERIVED.

Finally, suppose that objects containing any non-empty value in the type attribute are to be rejected. Such objects could be filtered out with a script like:

manifest/@type.equals("")

Note that the unary negation operator is not used in this case: if the attribute is missing or empty, the equals method generates true, which is the value necessary to pass the object down the pipeline.
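To check the logic of the three example scripts, each can be restated as an equivalent Python predicate on the value of the type attribute and tried against a few sample values. This is a sketch only: the dictionary SCRIPTS, the function decisions, and the sample values are hypothetical, and a missing attribute is represented by the empty string, as described under Identifiers.

```python
# Each script from the examples, paired with an equivalent Python predicate
# on the value of /manifest/@type.
SCRIPTS = {
    '!/manifest/@type.contains("SECONDARY")':
        lambda t: not ("SECONDARY" in t),
    '!(/manifest/@type.contains("SECONDARY") + '
    '/manifest/@type.contains("DERIVED"))':
        lambda t: not (("SECONDARY" in t) or ("DERIVED" in t)),
    'manifest/@type.equals("")':
        lambda t: t == "",
}


def decisions(type_value):
    """Return the outcome of each example script for one type value."""
    return [
        "accept" if predicate(type_value) else "quarantine"
        for predicate in SCRIPTS.values()
    ]


for sample in ["", "ORIGINAL", "DERIVED", "SECONDARY"]:
    print(repr(sample), decisions(sample))
# '' ['accept', 'accept', 'accept']
# 'ORIGINAL' ['accept', 'accept', 'quarantine']
# 'DERIVED' ['accept', 'quarantine', 'quarantine']
# 'SECONDARY' ['quarantine', 'quarantine', 'quarantine']
```

A true result always means the object continues down the pipeline, so a script written to reject a condition must evaluate to false when that condition holds.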