CTP Configuration for Study Distribution

From MircWiki
Revision as of 14:54, 21 November 2008 by Johnperry

This article describes how the CTP application can be configured to work with the National Cancer Imaging Archive (NCIA). The intended audience for this article is system administrators at sites running instances of NCIA. There are several articles on CTP on the RSNA MIRC wiki. All would be useful references when reading this article.

1 A Simple Configuration File

The configuration of a CTP instance is specified by an XML file called config.xml located in the same directory as the program. The example configuration shown below assumes that anonymized data objects are transmitted to CTP by an external application using the HTTPS protocol.

<Configuration>

   <Server port="1080" />

   <Pipeline name="Main Pipeline">

        <ImportService 
            name="HTTP Import"
            class="org.rsna.ctp.stdstages.HttpImportService"
            root="roots/http-import" 
            port="7777"
            ssl="yes"
            acceptDicomObjects="yes"
            acceptXmlObjects="yes"
            acceptZipObjects="yes"
            acceptFileObjects="yes"
            quarantine="quarantines/http-import" />

        <StorageService
            name="Storage"
            id="storage"
            class="org.rsna.ctp.stdstages.FileStorageService"
            root="D:/storage" 
            port="80"
            return-stored-file="yes"
            quarantine="quarantines/storage" />

        <ExportService
            name="Database Export"
            class="org.rsna.ctp.stdstages.DatabaseExportService"
            adapter-class="org.rsna.ctp.stdstages.database.DatabaseAdapter"
            file-storage-service-id="storage"
            root="roots/database-export"
            quarantine="quarantines/database-export" />

    </Pipeline>
</Configuration>

1.1 Commentary

This section draws attention to certain attributes in the configuration and explains the reasons for them.

1.1.1 Server

The configuration file above places the CTP admin web server on port 1080. This web server is used by administrators to monitor the status of the system. The choice of 1080 rather than the standard port 80 is made to preserve port 80 for the FileStorageService's web server.

1.1.2 HttpImportService

The configuration file puts the HttpImportService on port 7777 and enables Secure Sockets Layer (SSL) through the ssl attribute.

The HttpImportService is configured to accept all four object types. The default value of each accept... attribute is yes, so the attributes are optional when all object types are to be accepted. They are included in the configuration above to draw attention to the fact that filtering can be applied when objects are received, so filtering in subsequent stages is not necessary.
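
As a hedged illustration of filtering on reception, the fragment below restricts the HttpImportService to DICOM objects. It assumes that no is the counterpart of the yes value used above; the stage name and paths are simply carried over from the example configuration.

```xml
<!-- Sketch only: accept DICOM objects and reject the other three types.
     The value "no" is an assumption (the opposite of the documented "yes"). -->
<ImportService
    name="HTTP Import"
    class="org.rsna.ctp.stdstages.HttpImportService"
    root="roots/http-import"
    port="7777"
    ssl="yes"
    acceptDicomObjects="yes"
    acceptXmlObjects="no"
    acceptZipObjects="no"
    acceptFileObjects="no"
    quarantine="quarantines/http-import" />
```

Under this sketch, XML, Zip, and generic file objects would go to the quarantine directory on arrival rather than being passed down the pipeline.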

An object of a type that is not accepted is quarantined when it is received. If there is no need to keep copies of such objects, the quarantine attribute can be omitted.
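
A minimal sketch of the stage with the quarantine attribute omitted, for a site that does not want to retain rejected objects. The value no for acceptFileObjects is an assumption (the opposite of the documented yes); the remaining accept... attributes are left out and rely on their default of yes.

```xml
<!-- Sketch only: no quarantine attribute, so rejected objects are not kept.
     acceptFileObjects="no" is an assumed value, shown to give the stage
     something to reject. -->
<ImportService
    name="HTTP Import"
    class="org.rsna.ctp.stdstages.HttpImportService"
    root="roots/http-import"
    port="7777"
    ssl="yes"
    acceptFileObjects="no" />
```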

1.1.3 Storage Service

The id attribute is critical in this configuration. It provides a way for the DatabaseExportService to access the FileStorageService and obtain the URL of an object being exported. The NCIA database requires the URL to provide access to the object as well as a thumbnail image of the object (if the object is an image).

The return-stored-file attribute is also critical in this configuration. It ensures that the object which is passed down the pipeline from the FileStorageService points to the stored file rather than to the file which was originally in the HttpImportService's queue. This is necessary to allow the DatabaseExportService to find the file in the FileStorageService and obtain its URL.

The root attribute points to any location where the FileStorageService can build its directory structure. It is shown in the configuration above as an absolute path, and that path could be on a different drive or a different computer from the one where the CTP application is installed. The root could also be placed under the CTP directory tree, for example, by specifying it to be roots/storage. In the configuration shown above, it is assumed that the files will be stored outside the CTP tree.
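
For comparison, a sketch of the same stage with the root placed under the CTP directory tree. Only the root value differs from the example configuration.

```xml
<!-- Sketch only: a relative root keeps the stored files inside the CTP tree. -->
<StorageService
    name="Storage"
    id="storage"
    class="org.rsna.ctp.stdstages.FileStorageService"
    root="roots/storage"
    port="80"
    return-stored-file="yes"
    quarantine="quarantines/storage" />
```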

The FileStorageService web server is enabled in this configuration by specifying a port attribute. The value, 80, assumes that this will be the most convenient port for users.

The require-authentication attribute is not included on the assumption that objects in the NCIA database are open to any user who has access to the network. In situations where that is not true, the attribute can be included with the value yes.
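
Where access must be restricted, the stage might look like the following sketch. The only change from the example configuration is the added require-authentication attribute; how users are defined and authenticated is outside the scope of this fragment.

```xml
<!-- Sketch only: require users to authenticate before accessing stored objects. -->
<StorageService
    name="Storage"
    id="storage"
    class="org.rsna.ctp.stdstages.FileStorageService"
    root="D:/storage"
    port="80"
    require-authentication="yes"
    return-stored-file="yes"
    quarantine="quarantines/storage" />
```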

1.1.4 DatabaseExportService

The adapter-class attribute specifies the fully qualified name of the Java class which implements the NCIA DatabaseAdapter.

The file-storage-service-id attribute is critical. It tells the DatabaseExportService which FileStorageService to query for the URL of the object being exported.

Note that when an object is received by an ExportService for export, a copy of the object, not a reference to it, is entered into the ExportService's queue. Thus, the object in the queue is independent of the one which was received, and subsequent stages, if any, can modify the object being passed down the pipeline without affecting the object being exported. This is important when removing provenance information as explained below.

2 A Slightly More Complex Configuration File

The example configuration shown below assumes that the anonymized data objects received by the HttpImportService contain provenance information which must be removed from the stored objects after the information has been passed to the NCIA database.

<Configuration>

   <Server port="1080" />

   <Pipeline name="Main Pipeline">

        <ImportService 
            name="HTTP Import"
            class="org.rsna.ctp.stdstages.HttpImportService"
            root="roots/http-import" 
            port="7777"
            ssl="yes"
            acceptDicomObjects="yes"
            acceptXmlObjects="yes"
            acceptZipObjects="yes"
            acceptFileObjects="yes"
            quarantine="quarantines/http-import" />

        <StorageService
            name="Storage"
            id="storage"
            class="org.rsna.ctp.stdstages.FileStorageService"
            root="D:/storage" 
            type="day"
            port="80"
            return-stored-file="yes"
            quarantine="quarantines/storage" />

        <ExportService
            name="Database Export"
            class="org.rsna.ctp.stdstages.DatabaseExportService"
            adapter-class="org.rsna.ctp.stdstages.database.DatabaseAdapter"
            file-storage-service-id="storage"
            root="roots/database-export"
            quarantine="quarantines/database-export" />

        <Processor
            name="DICOM Provenance Remover"
            class="org.rsna.ctp.stdstages.DicomAnonymizer"
            root="roots/provenance-remover" 
            script="roots/provenance-remover/dicom-anonymizer.script"
            quarantine="quarantines/provenance-remover" />

        <Processor
            name="XML Provenance Remover"
            class="org.rsna.ctp.stdstages.XmlAnonymizer"
            root="roots/provenance-remover" 
            script="roots/provenance-remover/xml-anonymizer.script"
            quarantine="quarantines/provenance-remover" />

    </Pipeline>
</Configuration>

2.1 Commentary

The commentary in the previous section applies to this configuration as well, but it is important to note several points about the FileStorageService and the Anonymizer stages.

2.1.1 FileStorageService

The FileStorageService's return-stored-file attribute must be set to yes so that the object received by the Anonymizer stages points to the stored file. This allows the Anonymizer stages to anonymize the object in place in the FileStorageService. If the FileStorageService did not return the stored file, an Anonymizer would anonymize the object in the HttpImportService queue, leaving the non-anonymized version in the FileStorageService.

Note that the type attribute is set to day. This stores each day's received studies in a separate directory under a year directory. This is an appropriate setting for all uses, but it is especially important when many studies are to be inserted at once. With the day setting, the next higher directory will not grow too large for efficient searching.

2.1.2 Anonymizer

CTP actually contains three types of anonymizers (DICOM, XML, and Zip), each configured as a separate Processor stage with its own script file.
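
The configuration above shows only the DICOM and XML anonymizers. A Zip anonymizer stage would presumably follow the same pattern, as in the sketch below. The class name org.rsna.ctp.stdstages.ZipAnonymizer and the script file name are assumptions made by analogy with the other two stages and should be checked against the main CTP article before use.

```xml
<!-- Sketch only: the class name and script file name are assumed by analogy
     with the DICOM and XML anonymizer stages, not taken from this article. -->
<Processor
    name="Zip Provenance Remover"
    class="org.rsna.ctp.stdstages.ZipAnonymizer"
    root="roots/provenance-remover"
    script="roots/provenance-remover/zip-anonymizer.script"
    quarantine="quarantines/provenance-remover" />
```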

Note that the Anonymizer appears after the DatabaseExportService. This ensures that the database gets a copy of the object containing the provenance information.

When CTP starts, it looks for the script files in the specified locations, and if they are not present, example files are copied into place. The CTP admin web server includes a special servlet for editing DICOM anonymizer scripts. It also includes a servlet for editing all other script types (including those for the XML and Zip anonymizers as well as the DICOM, XML, and Zip filters). See The MIRC XML Anonymizer for information on configuring the anonymizers for XmlObjects and the manifests in ZipObjects. The DICOM anonymizer is described in The CTP DICOM Anonymizer.

3 A Simple Configuration File for Development Testing

The example configuration shown below would be a good starting point for testing the development of an NCIA DatabaseAdapter. It uses HTTP to receive data objects so it supports all data object types. It does not use SSL for communication. To pass objects to the HttpImportService, one could use a tool like the MIRC FileSender application.

<Configuration>

   <Server port="1080" />

   <Pipeline name="Main Pipeline">

        <ImportService 
            name="HTTP Import"
            class="org.rsna.ctp.stdstages.HttpImportService"
            root="roots/http-import" 
            port="7777"
            acceptDicomObjects="yes"
            acceptXmlObjects="yes"
            acceptZipObjects="yes"
            acceptFileObjects="yes"
            quarantine="quarantines/http-import" />

        <StorageService
            name="Storage"
            id="storage"
            class="org.rsna.ctp.stdstages.FileStorageService"
            root="D:/storage" 
            port="80"
            return-stored-file="yes"
            quarantine="quarantines/storage" />

        <ExportService
            name="Database Export"
            class="org.rsna.ctp.stdstages.DatabaseExportService"
            adapter-class="org.rsna.ctp.stdstages.database.DatabaseAdapter"
            file-storage-service-id="storage"
            root="roots/database-export"
            quarantine="quarantines/database-export" />

    </Pipeline>
</Configuration>

When testing, one would use FileSender to send data files to port 7777 with the HTTP protocol (not SSL). After sending some files, the CTP log could be accessed with a browser on port 1080, and the stored objects could be accessed on port 80. Note that the FileStorageService is configured to have a flat structure, so each study will appear in its own child directory of the D:/storage/__default directory.

4 Advanced Configuration

This section briefly notes two other points which may be of interest in special cases.

4.1 Importing from a File Tree

CTP is designed to be extended. One possible extension, which might be useful in capturing the contents of a large archive for the NCIA database, would be an ImportService which would walk a directory tree and supply objects to its pipeline. See the CTP Javadocs for org.rsna.ctp.pipeline.ImportService and org.rsna.ctp.pipeline.AbstractPipelineStage for more information.

4.2 Preprocessing

There are three other pipeline stages which might be of use in filtering out data objects which are inappropriate for inclusion in the database. They are:

  • org.rsna.ctp.stdstages.DicomFilter
  • org.rsna.ctp.stdstages.XmlFilter
  • org.rsna.ctp.stdstages.ZipFilter

These stages are described in the main article on CTP.
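
As a rough sketch, a filter could be placed between the ImportService and the StorageService so that unsuitable objects never reach storage or the database. The class name comes from the list above; the root, script, and quarantine attributes and all paths are assumptions made by analogy with the anonymizer stages, so the main article on CTP remains the authority on the actual attributes.

```xml
<!-- Sketch only: attribute names and paths are assumed by analogy with the
     anonymizer stages. In the pipeline, this stage would sit after the
     ImportService and before the StorageService. -->
<Processor
    name="DICOM Filter"
    class="org.rsna.ctp.stdstages.DicomFilter"
    root="roots/dicom-filter"
    script="roots/dicom-filter/dicom-filter.script"
    quarantine="quarantines/dicom-filter" />
```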

If preprocessing is desired which goes beyond the capabilities of the filters' script languages, it is possible to create a special-purpose preprocessing stage. See the CTP Javadocs for org.rsna.ctp.pipeline.Processor and org.rsna.ctp.pipeline.AbstractPipelineStage for more information.