The CTP Module

From MircWiki
Revision as of 12:56, 4 February 2014 by Johnperry (Talk | contribs) (org.rsna.ctp.objects)

The CTP module (CTP.jar) contains the code of the CTP application, including all the standard stages and plugins. This article describes the theory of operation of the packages contained in the CTP module. The CTP module is used in CTP, TFS, CTPClient, ISN, and other MIRC software. The intended audience for this article is software engineers extending or maintaining any of that software. The prerequisite for understanding this article is The Util Module.

See Setting Up a MIRC Development Environment for information on obtaining and building the CTP module.

The CTP module contains these packages:

  • org.rsna.ctp contains the CTP main class and the Configuration singleton class.
  • org.rsna.ctp.objects contains classes that implement the four types of objects that CTP can process.
  • org.rsna.ctp.pipeline contains the Pipeline class, interfaces that define the types of pipeline stages, and a set of abstract classes that provide starting points for developing each pipeline stage type.
  • org.rsna.ctp.plugin contains the interface that defines a plugin and an abstract class that provides a starting point for developing a plugin.
  • org.rsna.ctp.servlets contains the standard servlets used in CTP.
  • org.rsna.ctp.stdstages contains the standard pipeline stages included in CTP.

Each of these packages will be described in the sections below, with example code where necessary to illustrate the theory of operation. It will be helpful to get the source code and build it so you can reference the Javadocs as you go along.

In addition to the packages listed above, the source code tree also contains packages that are part of helper applications used in CTP:

  • org.rsna.installer contains the installer program for CTP, TFS, and ISN.
  • org.rsna.launcher contains the program used to manually launch CTP, TFS, and ISN.
  • org.rsna.runner contains the program that can be used in a batch job to start or stop CTP, TFS, and ISN. This program is typically used to implement a CTP Linux service.

1 Overview

CTP is a framework for pipelines and plugins.

  • A pipeline is an ordered sequence of processing stages. Data objects enter a pipeline at the head end and encounter the stages in sequence.
  • A pipeline stage performs some task on a data object when it arrives at the stage as it flows down the pipe.
  • A plugin is a module that adds capability to the program outside the scope of a pipeline.

CTP includes a servlet container to provide a user interface for configuring certain features and for accessing data.

CTP is installed using the CTP-installer.jar program that is created when CTP is built. See Setting Up a MIRC Development Environment for information on building CTP. The installer creates a folder called CTP and puts all the necessary files and subdirectories in it.

CTP is configured by a single XML file. CTP imposes no limits on the number of pipelines or plugins that may be configured or on the number of stages that may be configured in a single pipe. The structure of the configuration file is described in detail in CTP-The RSNA Clinical Trial Processor.

CTP can be run manually through the Launcher.jar program, or it can be run as a Windows or Linux service. See Running CTP as a Windows Service or Running CTP as a Linux Service for details.

The Launcher.jar program is located in the CTP directory. All the modules required for running CTP are located in the CTP/libraries directory or one of its subdirectories. The manifest in the CTP.jar module includes a classpath listing only the util.jar module. When CTP starts, it calls ClasspathUtil.addJars to add all the jar files in the CTP/libraries directory tree to the classpath. By constructing the classpath in this way, developers can add pipeline stages and plugins into CTP without having to modify the CTP build itself.
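The jar discovery described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual ClasspathUtil.addJars code, whose signature and internals are not reproduced in this article; the JarFinder class and its method names are hypothetical. The sketch walks a directory tree, collects every .jar file, and builds a URLClassLoader over the results.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of CTP or the util module.
class JarFinder {

    // Return every *.jar file under root (including subdirectories), sorted.
    static List<Path> findJars(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build a class loader that can load classes from all the jars found.
    static URLClassLoader loaderFor(Path root, ClassLoader parent) {
        List<URL> urls = new ArrayList<>();
        for (Path jar : findJars(root)) {
            try {
                urls.add(jar.toUri().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalStateException(e);
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```

Because the jars are discovered at startup rather than listed in a manifest, dropping a new jar anywhere under the scanned directory is enough to make its classes loadable.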

After setting up the classpath, CTP then loads the configuration. The configuration is encapsulated in the singleton org.rsna.ctp.Configuration class. The Configuration class parses the configuration file and loads the plugins and pipelines. Pipelines load their stages.

Pipelines, stages, and plugins have constructors that take their configuration file XML Element as an argument, and they obtain all their configuration information from it. When these objects are instantiated, the Configuration object is not yet constructed; therefore, the constructors are just for instantiation of the objects themselves, and they may not reference the Configuration object or any other stages or plugins.

After loading the configuration and instantiating the plugins, pipelines, and stages, CTP instantiates the servlet container (org.rsna.server.HttpServer in the util module). This must be done after the configuration is loaded because the configuration determines whether to run the server on HTTP or HTTPS. Because the server is not available when the plugins and stages are instantiated, those classes that add servlets into the server must not do so in their constructors.

Once the configuration has been loaded and the server has been instantiated, CTP calls the start method of the Configuration class. That, in turn, results in calling the start methods of the plugins and stages. The start methods can reference anything in the Configuration, including other stages and plugins. This two-phase startup is necessary because certain stages are designed to interact with other stages and plugins, so everything has to be constructed before any such interaction can be allowed to occur. For the same reason, plugins and stages that add servlets to the server must do so in their start methods.

CTP calls the start methods of all the plugins first, followed by the start methods of all the pipelines. The start methods of pipelines call the start methods of their stages. This sequence is important because a stage may depend on a plugin, and therefore a plugin must be completely running before it is referenced.
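The two-phase startup can be illustrated with a minimal sketch. All the class names here (DemoConfiguration, DemoPlugin, DemoStage) are hypothetical stand-ins, not the real CTP classes, and the sketch flattens the pipeline level for brevity (in CTP, pipelines start their own stages). Constructors only record their own settings; the start methods are where a stage looks up the plugin it depends on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for CTP's configuration, plugin, and stage types.
class TwoPhaseStartupDemo {

    static class DemoConfiguration {
        final Map<String, DemoPlugin> plugins = new LinkedHashMap<>();
        final List<DemoStage> stages = new ArrayList<>();
        final List<String> log = new ArrayList<>();

        // Phase 2: start all the plugins first, then the stages.
        void start() {
            for (DemoPlugin p : plugins.values()) p.start(this);
            for (DemoStage s : stages) s.start(this);
        }
    }

    static class DemoPlugin {
        final String id;
        boolean running;

        // Phase 1: construction only; no references to other objects.
        DemoPlugin(String id) { this.id = id; }

        void start(DemoConfiguration config) {
            running = true;
            config.log.add("plugin " + id + " started");
        }
    }

    static class DemoStage {
        final String name;
        final String pluginId;   // id of the plugin this stage depends on

        // Phase 1: remember the id, but do not look the plugin up yet.
        DemoStage(String name, String pluginId) {
            this.name = name;
            this.pluginId = pluginId;
        }

        // Phase 2: everything is constructed, so lookups are safe.
        void start(DemoConfiguration config) {
            DemoPlugin plugin = config.plugins.get(pluginId);
            if (plugin == null || !plugin.running) {
                throw new IllegalStateException("plugin " + pluginId + " is not running");
            }
            config.log.add("stage " + name + " started using plugin " + plugin.id);
        }
    }

    static List<String> run() {
        DemoConfiguration config = new DemoConfiguration();
        // The stage is constructed before the plugin it depends on;
        // this is harmless because nothing is resolved until start().
        config.stages.add(new DemoStage("anonymizer", "audit"));
        config.plugins.put("audit", new DemoPlugin("audit"));
        config.start();
        return config.log;
    }
}
```

In this sketch, a stage constructor that tried to resolve its plugin would fail whenever the plugin happened to be constructed later, which is the problem the two-phase design avoids.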

Pipelines are implemented as asynchronous threads. A pipeline object maintains an ordered list of all its stages. There are four types of stages:

  • ImportService
  • Processor
  • StorageService
  • ExportService

These stage types are described in CTP-The RSNA Clinical Trial Processor and Extending CTP. Each stage implements its specific interface. Data objects enter a pipe through an ImportService stage. A pipeline object maintains a separate list of its ImportService stages. When a pipeline is ready to process a new data object, it polls its ImportServices in turn until it finds one that has a data object available. As data objects flow down the pipe, they skip any ImportServices; thus, although ImportServices are configured into the pipeline, they can be thought of as being separate from it, and the order in which they appear - and even where they appear - in the pipeline is not important. (Nevertheless, for human readability, it is best to put the ImportServices first in the pipe.)
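The polling behavior can be sketched as follows. This is a simplified model with hypothetical names (PipelineSketch, ImportStage, ProcessingStage, QueueImport); the real stage interfaces are richer, data objects are not strings, and the real pipeline runs this logic in a loop on its own thread. The sketch shows one iteration: poll the import stages in turn, then pass the first available object through the remaining stages in order, skipping the import stages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of a pipeline; not the CTP Pipeline class.
class PipelineSketch {

    interface Stage { }

    // An import stage hands out the next queued object, or null if it has none.
    interface ImportStage extends Stage {
        String getNextObject();
    }

    // Any other stage does some work on the object and passes it on.
    interface ProcessingStage extends Stage {
        String process(String object);
    }

    static class QueueImport implements ImportStage {
        private final Deque<String> queue = new ArrayDeque<>();
        void enqueue(String object) { queue.add(object); }
        public String getNextObject() { return queue.poll(); }
    }

    private final List<Stage> stages;                                // all stages, in order
    private final List<ImportStage> importStages = new ArrayList<>(); // separate list

    PipelineSketch(List<Stage> stages) {
        this.stages = stages;
        for (Stage s : stages) {
            if (s instanceof ImportStage) importStages.add((ImportStage) s);
        }
    }

    // One iteration: returns the processed object, or null if no
    // import stage had an object available.
    String processOne() {
        for (ImportStage in : importStages) {
            String object = in.getNextObject();
            if (object == null) continue;          // try the next import stage
            for (Stage s : stages) {
                if (s instanceof ProcessingStage) {
                    object = ((ProcessingStage) s).process(object);
                }
                // Import stages are skipped as the object flows down the pipe.
            }
            return object;
        }
        return null;
    }
}
```

Because import stages are only consulted through the separate list, their position among the other stages has no effect on the result in this model.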

When running CTP as a Windows service, the Windows Service Manager calls the startService method in org.rsna.ctp.ClinicalTrialProcessor to start the service and the stopService method to stop it. The stopService method makes an HTTP or HTTPS connection (as defined in the configuration) to the org.rsna.ctp.servlets.ShutdownServlet servlet to actually shut down. The ShutdownServlet first stops the pipelines and then the plugins in order to ensure that the stages have access to the plugins throughout the shutdown process.

2 org.rsna.ctp

This package contains the CTP main class (org.rsna.ctp.ClinicalTrialProcessor) and the Configuration class (org.rsna.ctp.Configuration).

3 org.rsna.ctp.objects

This package defines the four object types that CTP recognizes.

Note: To be useful in a clinical trial, a data object must contain a set of identifiers that allow the object to be related to other data objects in the trial.
  • FileObject represents a file of unknown format. It is the parent class of the other object types. It supplies generic file handling methods. Since the FileObject cannot know anything about the contents of the file, it cannot provide the IDs necessary for a clinical trial; therefore, a FileObject by itself is not useful. Its utility is only as the parent class of the other object types.
  • DicomObject represents a DICOM dataset. It contains a great many methods that provide access to the individual elements in the dataset, including the IDs and UIDs defined in the DICOM standard that are important in a clinical trial. It also provides methods for creating browser-viewable images from the pixels stored in the dataset.
  • XmlObject represents an XML file. It provides generic methods for accessing the important identifiers in a clinical trial. Since the schemas of XML data objects used in clinical trials vary, these methods are designed to look in a sequence of common places to find the IDs. The sequences are described in the Javadocs.
  • ZipObject represents a zip file. In addition to whatever other files it may contain, a ZipObject must contain a manifest.xml file that carries the necessary clinical trial IDs. CTP treats ZipObjects as atomic data objects, although the ZipObject class provides methods for obtaining the individual files it contains.

The DicomObject, XmlObject, and ZipObject classes all contain matches methods that compute a boolean result from a script that interrogates the object. These methods are used in many pipeline stages to determine whether an object is to be processed. See The CTP DICOM Filter and The CTP XML and Zip Filters articles for information on the script languages.

Because DicomObjects are so critical to imaging trials, and because various manufacturers implement DICOM in various ways, the DicomFilter pipeline stage provides a way to list the operand values it sees when computing its boolean result. To pass those values to the DicomFilter, the matches method of the DicomObject returns a MatchResult object that encapsulates the operand values and the boolean result. (The matches methods of the other two object types just return the boolean result.)

The MatchResult class is also contained in this package, along with a SopClass class that is used by DicomObject to detect certain types of SOP Classes.
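To make the idea of a result object concrete, here is a minimal sketch of a class that carries both the boolean result and the operand values seen during evaluation. The class name MatchResultSketch, its accessors, and the toy equalsTest "script" are all hypothetical; the real MatchResult API and the real script languages are described in the Javadocs and the filter articles.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real MatchResult class may differ.
class MatchResultSketch {
    private final boolean result;
    private final Map<String, String> operandValues;

    MatchResultSketch(boolean result, Map<String, String> operandValues) {
        this.result = result;
        this.operandValues =
            Collections.unmodifiableMap(new LinkedHashMap<>(operandValues));
    }

    boolean getResult() { return result; }

    // The values the evaluation actually saw, keyed by element name.
    Map<String, String> getOperandValues() { return operandValues; }

    // Toy "script": true when the named element equals the expected value.
    // The value found in the dataset is recorded whether or not it matches.
    static MatchResultSketch equalsTest(Map<String, String> dataset,
                                        String element, String expected) {
        String actual = dataset.getOrDefault(element, "");
        Map<String, String> seen = new LinkedHashMap<>();
        seen.put(element, actual);
        return new MatchResultSketch(actual.equals(expected), seen);
    }
}
```

Returning the operand values alongside the result lets a caller report why an object did or did not match, which is useful when manufacturers populate elements in unexpected ways.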

4 org.rsna.ctp.pipeline

This package contains:

  • the Pipeline class
  • the interface definitions for the four stage types
  • abstract classes to implement stages
  • the QueueManager class used by all the stages that maintain queues of data objects for import or export
  • the typesafe enum Status class for returning results from export operations
  • the Quarantine class for managing files that are removed from a pipe by a stage
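For readers unfamiliar with the typesafe enum pattern, the following sketch shows its general shape. The class name ExportStatusSketch, the constant names (OK, RETRY, FAIL), and the actions in nextAction are assumptions made for illustration; consult the Javadocs for the actual Status class.

```java
// Hypothetical sketch of the typesafe enum pattern; not the CTP Status class.
final class ExportStatusSketch {

    // The only instances that can ever exist.
    static final ExportStatusSketch OK = new ExportStatusSketch("OK");
    static final ExportStatusSketch RETRY = new ExportStatusSketch("RETRY");
    static final ExportStatusSketch FAIL = new ExportStatusSketch("FAIL");

    private final String name;

    // Private constructor: callers cannot create new status values.
    private ExportStatusSketch(String name) { this.name = name; }

    @Override
    public String toString() { return name; }

    // Example of a caller acting on a status (the actions are illustrative).
    static String nextAction(ExportStatusSketch status) {
        if (status == OK) return "remove from queue";
        if (status == RETRY) return "leave in queue and try again later";
        return "move to quarantine";
    }
}
```

Because the constructor is private, comparing instances with == is safe, and a method that returns this type cannot return an unexpected value the way an int or String code could.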

The Status servlet displays a web page summarizing the status of all the plugins, pipelines, and stages. It obtains status information by calling the getStatusHTML methods of all the objects in the configuration. Similarly, the Configuration servlet displays the configuration parameters, and obtains that information by calling the getConfigHTML methods. The AbstractPipelineStage provides implementations of these methods to simplify the implementation of stages.

5 org.rsna.ctp.plugin

This package contains:

  • the interface definition for a plugin
  • an abstract class to implement plugins

As in the case of the AbstractPipelineStage, the AbstractPlugin class provides implementations of the getStatusHTML and getConfigHTML methods to simplify the implementation of plugins.

6 org.rsna.ctp.servlets

This package contains servlets that are used either for the administration of CTP or for access to data maintained by specific plugins or stages. These are the servlets that are loaded by CTP, along with their contexts:

Servlet                 Context           Module  Description
ApplicationServer       webstart          util    Automatically generates jnlp content from query parameters and launches webstart applications.
AuditLogServlet         {id attribute}    CTP     Searches the AuditLog database, providing access to individual entries.
ConfigurationServlet    configuration     CTP     Displays the configuration on a web page.
DBVerifierServlet       databaseverifier  CTP     Provides access to the database maintained by the DatabaseVerifier stage. This stage is used to track whether objects submitted to sites running the NBIA software (for example, The Cancer Imaging Archive) have been successfully received and added to the collection.
DecipherServlet         decipher          CTP     Provides a restful web service to decipher encrypted elements in DICOM objects.
DicomAnonymizerServlet  daconfig          CTP     Provides a web-based editor for DicomAnonymizer scripts.
IDMapServlet            idmap             CTP     Searches the IDMap database, providing translations between PHI and anonymized values for certain IDs and UIDs.
LoggerLevelServlet      level             util    Provides a UI for controlling the log4j logger levels while CTP is running.
LoginServlet            login             util    Provides a UI for logging into the server.
LogServlet              logs              util    Provides a UI for viewing the CTP logs (in CTP/logs).
LookupServlet           lookup            CTP     Provides a web-based editor for DicomAnonymizer lookup tables.
ObjectTrackerServlet    objecttracker     CTP     Searches the ObjectTracker database, providing access to IDs and UIDs organized by patient, study date, study, and series.
QuarantineServlet       quarantines       CTP     Supports management of the quarantines.
ScriptServlet           script            CTP     Provides a web-based editor for filter scripts.
ShutdownServlet         shutdown          CTP     Gracefully shuts down CTP.
StatusServlet           status            CTP     Displays the current status of all the plugins, pipelines, and stages.
SysPropsServlet         system            util    Provides a UI for viewing the Java system properties.
UserManagerServlet      users             util    Provides a web-based editor for creating or modifying user accounts on the server.
UserServlet             user              util    Provides a restful web service for obtaining information about a user.

Each servlet is responsible for enforcing whatever user requirements are necessary for the function it performs. Thus, some servlets are accessible by unauthenticated users while others require that the user be authenticated and possess specific roles.

The ShutdownServlet honors a shutdown request if any one of these conditions is met:

  • If the request comes from the service manager on the same computer that is running CTP (as in a Windows service shutdown), it is honored.
  • If the request comes from an authenticated user who has the shutdown role, it is honored.
  • If the request comes from an authenticated user on the same computer that is running CTP, it is honored.
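The conditions above reduce to a simple predicate, sketched here. The class and parameter names are hypothetical, and how the servlet actually recognizes a request from the service manager is not described in this article, so it is modeled as a plain boolean flag.

```java
// Hypothetical sketch of the shutdown rules; not the ShutdownServlet code.
class ShutdownPolicySketch {

    static boolean isShutdownAllowed(boolean fromLocalServiceManager,
                                     boolean fromLocalHost,
                                     boolean authenticated,
                                     boolean hasShutdownRole) {
        // Service manager on the same computer (e.g. a Windows service stop).
        if (fromLocalServiceManager) return true;
        // Authenticated user holding the shutdown role, from anywhere.
        if (authenticated && hasShutdownRole) return true;
        // Any authenticated user on the same computer.
        if (authenticated && fromLocalHost) return true;
        return false;
    }
}
```

Note that in this reading an unauthenticated browser request is refused even when it originates on the local machine, unless it comes from the service manager.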

Parenthetically, the UserManagerServlet (in the util module) allows the shutdown role to be granted only by a user who already possesses that role.

7 org.rsna.ctp.stdplugins

There are two standard plugins:

  • AuditLog provides a logging mechanism for use in 21CFR11-compliant clinical trials.
  • Redirector runs an HTTP service that redirects an HTTP connection to an HTTPS port.

The AuditLog class maintains a JDBM database of entries that are searchable through the AuditLogServlet. To allow for multiple AuditLog plugins to appear in a single configuration, the AuditLog installs the servlet on a context that is the same as the id attribute in its configuration element.

8 org.rsna.ctp.stdstages

This package contains all the standard pipeline stages that are included in the CTP release. These are all described briefly in CTP-The RSNA Clinical Trial Processor. The names of the stages are intended to be descriptive. To fully understand the stages, it is necessary to read the source code. This section will therefore only cover a few key concepts underlying the designs of the stages.

All the standard stages extend one of the abstract classes in the org.rsna.ctp.pipeline package. Those classes provide common functions used by all stages.

8.1 Object Type Selection

Many stages are designed to select which object types to accept for processing. These stages use a common set of configuration file attributes:

  • acceptDicomObjects
  • acceptXmlObjects
  • acceptZipObjects
  • acceptFileObjects

These attributes are obtained and converted to booleans by the AbstractPipelineStage class, which is the root class of all the abstract stage classes. That class also provides an acceptable method to make it easy for a stage to know whether it is configured to process an object it has received.
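A rough sketch of this attribute handling follows. It is not the AbstractPipelineStage code: the AcceptFlagsSketch class, the Kind enum, and the parse helper are hypothetical, and the sketch assumes that a missing attribute means "accept" and that only the value "no" disables a type. The real acceptable method operates on data objects rather than on an enum.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch; attribute defaults and values are assumptions.
class AcceptFlagsSketch {

    enum Kind { DICOM, XML, ZIP, FILE }

    final boolean acceptDicomObjects;
    final boolean acceptXmlObjects;
    final boolean acceptZipObjects;
    final boolean acceptFileObjects;

    // Read the four attributes from the stage's configuration element.
    AcceptFlagsSketch(Element element) {
        acceptDicomObjects = flag(element, "acceptDicomObjects");
        acceptXmlObjects = flag(element, "acceptXmlObjects");
        acceptZipObjects = flag(element, "acceptZipObjects");
        acceptFileObjects = flag(element, "acceptFileObjects");
    }

    // Assumption: anything other than "no" (including a missing
    // attribute, which getAttribute returns as "") means accept.
    static boolean flag(Element element, String name) {
        return !element.getAttribute(name).trim().equalsIgnoreCase("no");
    }

    // Is this stage configured to process objects of the given kind?
    boolean acceptable(Kind kind) {
        switch (kind) {
            case DICOM: return acceptDicomObjects;
            case XML:   return acceptXmlObjects;
            case ZIP:   return acceptZipObjects;
            case FILE:  return acceptFileObjects;
            default:    return false;
        }
    }

    // Convenience for the example: parse an XML string into its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Centralizing the parsing in a common base class means every stage interprets these attributes the same way.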

Some stages also have the ability to select objects for processing based on filter scripts. The object classes implement matches methods specific to their types, and the stages simply call the methods if they have been configured with script files. These files are unique to the object types, and have a common set of configuration file attributes:

  • dicomScript
  • xmlScript
  • zipScript

For information on the script languages, see The CTP DICOM Filter and The CTP XML and Zip Filters.

8.2 Interfaces for Scripting

Stages that support scripts implement one or more interfaces that identify the stages to the various script editor servlets:

  • org.rsna.ctp.stdstages.Scriptable identifies a stage as having a script file that can be edited by the ScriptServlet.
  • org.rsna.ctp.stdstages.ScriptableDicom identifies a stage as having a script file that can be edited by the DicomAnonymizerServlet or a lookup table file that can be edited by the LookupServlet.

8.3 DICOM Storage SCP and SCU

The DicomImportService and DicomExportService stages use these SCU and SCP classes:

  • stdstages.dicom.DicomStorageSCU
  • stdstages.dicom.DicomStorageSCP

These classes are designed to be generally useful, but they contain CTP-specific features, so they would have to be modified for use in other programs.

8.4 Whitelists and Blacklists

Certain ImportService stages have the ability to accept or reject connections based on information in the connection itself. This capability is implemented in two classes:

  • org.rsna.ctp.stdstages.WhiteList
  • org.rsna.ctp.stdstages.BlackList

Depending on the specific stage, these classes are used to filter on IP address, called AE Title, or calling AE Title.

8.5 Queue Management

ImportServices and ExportServices generally queue objects so that they can run asynchronously. (The exception is the PollingHttpImportService.) The queues are implemented by the org.rsna.ctp.pipeline.QueueManager class. This class stores queued files in a directory tree that is designed to prevent any single directory in the hierarchy from growing large enough that the operating system will slow down when accessing the files. Generally, it is best to keep the number of files in any one directory below 1000. There are two parameters that control the hierarchy:

  • nLevels sets the number of levels in the hierarchy
  • maxSize sets the maximum number of files in a single directory

All the standard stages use the default parameters, which provide good performance for up to tens of millions of files.
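One way such a bounded hierarchy can work is sketched below. This is not the QueueManager algorithm, and the values used in the example are not its defaults; the class QueuePathSketch and its methods are hypothetical. The sketch maps a sequential file index to a relative path so that no directory ever holds more than maxSize entries, which gives a total capacity of maxSize raised to the power nLevels + 1.

```java
// Hypothetical sketch of a bounded directory hierarchy; not QueueManager.
class QueuePathSketch {

    // Total number of files the tree can hold: maxSize files in each of
    // maxSize^nLevels leaf directories.
    static long capacity(int nLevels, int maxSize) {
        long c = maxSize;
        for (int i = 0; i < nLevels; i++) {
            c = Math.multiplyExact(c, (long) maxSize);   // fails loudly on overflow
        }
        return c;
    }

    // Relative path for the index-th file, using nLevels directory levels.
    static String pathFor(long index, int nLevels, int maxSize) {
        if (nLevels < 1 || maxSize < 1) {
            throw new IllegalArgumentException("nLevels and maxSize must be positive");
        }
        if (index < 0 || index >= capacity(nLevels, maxSize)) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        long dir = index / maxSize;              // which leaf directory
        String[] parts = new String[nLevels];
        for (int level = nLevels - 1; level >= 0; level--) {
            parts[level] = String.valueOf(dir % maxSize);   // one base-maxSize digit
            dir /= maxSize;
        }
        return String.join("/", parts) + "/" + index;       // file named by its index
    }
}
```

For example, with nLevels = 2 and maxSize = 3 the tree holds 27 files, and file 4 lands at 0/1/4. With more realistic values the capacity grows very quickly while each directory stays small.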

The BasicFileStorageService uses a similar mechanism, although it does not use the QueueManager class to manage it. The BasicFileStorageService provides attributes to allow the parameters to be set in the configuration.

8.6 Anonymizer Implementations

There are several standard anonymizer stages for the de-identification of specific object types:

  • DicomAnonymizer de-identifies the non-image part of a DicomObject.
  • DicomPixelAnonymizer blanks out regions of the pixel data in DicomObjects that contain burned-in PHI.
  • XmlAnonymizer de-identifies XmlObjects.
  • ZipAnonymizer de-identifies the manifests of ZipObjects.

Since the DicomPixelAnonymizer requires decompressed DicomObjects, there is a related stage for decompression:

  • DicomDecompressor decompresses a compressed DicomObject, producing an EVRLE DicomObject.

Each stage calls a class in the anonymizer source tree to do the hard work of de-identification for each object type. The classes are:

  • org.rsna.ctp.stdstages.anonymizer.dicom.DICOMAnonymizer
  • org.rsna.ctp.stdstages.anonymizer.xml.XMLAnonymizer
  • org.rsna.ctp.stdstages.anonymizer.zip.ZIPAnonymizer

These classes are designed to produce a de-identified file from an original file. They are designed to be used outside CTP as well.

8.7 Databases

Many stages build local databases to capture information from data objects they process. All the databases are built on the Apache JDBM module, which provides a light-weight, disk-based implementation of B-trees and H-trees. The stages use the methods in org.rsna.util.JdbmUtil for the creation and management of the databases.

9 config

The source/config directory in the CTP development tree contains the default configuration file (config.xml). The CTP-installer.jar program copies this file to the disk only during a new CTP installation. The file is just an example; it isn't designed to do anything particularly useful.

10 files

The source/files directory in the CTP development tree contains the files that are copied to the disk by the CTP-installer.jar program when CTP is either installed or upgraded. These files are therefore overwritten during an upgrade.

10.1 examples

Certain pipeline stages are driven by scripts. When such a stage is first instantiated, it looks for the script file defined in its configuration element. If the file is not present on the disk, it copies the appropriate file from the examples directory as the starting point.

This directory also contains an example-config.xml file that CTP uses as the starting point if it cannot find the config.xml file when the program starts.
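The copy-if-missing behavior described in this section can be sketched in a few lines. The class and method names are hypothetical and this is not the code CTP uses; it simply illustrates the rule that an example file seeds the working file only when the working file does not yet exist, so local edits are never overwritten.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the "use the example as a starting point" rule.
class ExampleFallbackSketch {

    // Return target, first copying example into place if target is missing.
    // An existing target is left untouched.
    static Path ensureFile(Path target, Path example) {
        try {
            if (Files.notExists(target)) {
                Path parent = target.toAbsolutePath().getParent();
                if (parent != null) Files.createDirectories(parent);
                Files.copy(example, target);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This split is what lets an upgrade replace the example files freely: the working copies derived from them are separate files that the upgrade never touches.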

10.2 pages

The pages directory contains XSL files that are used by the StorageServlet to create web pages for viewing studies stored by the FileStorageService stage.

10.3 profiles

The profiles directory contains the DicomAnonymizer script files that implement the DICOM Supplement 142 De-identification profiles. These are contained in the dicom subdirectory. These profiles are overwritten by the CTP-installer.

In a running system, the admin user can create and store de-identification profiles through the DicomAnonymizerServlet. Such profiles are saved in the saved subdirectory, which does not exist in the development tree and which is created by the DicomAnonymizerServlet when necessary. Files in this directory are not overwritten by the CTP-installer.

The DicomAnonymizerServlet does not allow the user to overwrite a Supplement 142 profile.

10.4 scripts

The scripts directory contains standard scripts used in various applications. Although these scripts can be edited by the admin user, they are overwritten by the CTP-installer, so it is generally best for the user to create a new script file if it is to be preserved.

10.5 ROOT

This is the root of the servlet container. It is the default root of the contexts of all servlets. (Because a servlet may move its root in its constructor, however, be careful to look for this possibility when reading a servlet's code. Almost all the TFS servlets move their roots so they lie under the directory defined in the root attribute of the MIRC configuration element.)

Note that the Servlet base class is responsible for file serving, so a servlet serves files from whatever root it is configured to use. See the org.rsna.servlets description in the Util wiki article for more details.

The only file included in the ROOT directory is example-index.html. When the server receives a request for index.html and it cannot find the file, it copies the example-index.html file into place and serves it. The reason for this mechanism is to allow the CTP-installer to overwrite the example-index.html file while preserving any changes the site admin user may have made in the index.html. In practice, nobody in the field seems to use this feature.

10.6 linux

This directory contains the ctpService.sh file that is used to configure Linux to run CTP as a service. See Running CTP as a Linux Service for details.

10.7 windows

This directory contains the files used to install and uninstall CTP as a Windows service:

  • install.bat is the install batch job. This file is constructed by the CTP-installer program using the specific parameters of the installation (location on disk, etc.). This file is overwritten by the installer.
  • uninstall.bat is the uninstall batch job. This file is overwritten by the installer.
  • CTP.exe is the Windows service runner for CTP. It is based on the Apache Commons Daemon. This file is overwritten by the installer.
  • CTPw.exe is the Windows Service Monitor for CTP. It allows the Windows administrator to adjust the memory allocation and many other parameters of the service. This file is overwritten by the installer.

For more information, see Running CTP as a Windows Service.

11 resources

Files in this directory are included in the CTP.jar file. They are not copied to the disk by the CTP-installer. They are loaded from the classpath when they are required. They are XSL, CSS, and Javascript files used by various servlets.