The CTP DICOM Anonymizer

From MircWiki

This article describes how to configure the DICOM anonymizer used in the ClinicalTrialProcessor (CTP) application. The intended audience for this information is clinical trial coordinators at principal investigator sites.

Important note: The CTP DICOM anonymizer is different from the anonymizer that is included in the Tomcat/MIRC site and FieldCenter applications. Specifically:

  • All the functions which use remapping tables have been replaced with ones that use hashing.
  • Some functions have been removed and replaced with others that are faster.

For information about how to include the DICOM Anonymizer in a clinical trial pipeline, see CTP-The RSNA Clinical Trial Processor.

For information about how to use the CTP Pixel Anonymizer, see The CTP DICOM Pixel Anonymizer.


1 Accessing the CTP Anonymizer Configurator

The ClinicalTrialProcessor application includes a web server which is normally configured to listen on the standard HTTP port (80). Accessing the server with a browser displays a home page containing buttons which link to servlets providing status and configuration information. The DICOM Anonymizer Configurator button displays a page listing all the anonymizers which are currently configured in the application, with their pipeline and stage names and a link pointing to the anonymizer script file. Clicking the link to a script file displays a page containing a table of all the DICOM elements, with Select checkboxes and Replacement text fields. At the bottom of the page is a button which saves any changes that have been made on the page.

2 Modifying DICOM Elements

The anonymizer has a simple scripting language. Each DICOM element can have its own replacement script containing contents and instructions for what to do with the element when it is processed.

To cause the anonymizer to take direct action on an element when a DICOM object is received, place a check in the Select checkbox for the element. Elements that are unchecked are left intact unless they qualify for global action as described later.

To replace the contents of an element with new static text, enter the text in the Replacement text field for the element.

To remove an element from the DICOM object, use the remove( ) function described below.

To insert an empty element or replace the contents of an element with an empty (zero-length) string, use the empty( ) function described below.

Leading and trailing blanks in all Replacement fields are removed before processing.

2.1 Element Names

Some of the functions described below have arguments that specify the name of a DICOM element. Element names can be specified in several ways:

  • An element can be specified by using its DICOM keyword, or more precisely, the derivative of the keyword that is used by the DICOM library used by CTP (dcm4che). The dcm4che keywords are listed in the tag dictionary on the RSNA MIRC site. Element names are also shown on the DICOM Anonymizer Configurator page next to their numeric tags.
  • An element can also be specified by its numeric tag. Tags can be specified in the following forms:

      • ggggeeee
      • (gggg,eeee)
      • [ggggeeee]
      • [gggg,eeee]

    where g and e are hexadecimal digits.

  • For elements in private groups, a special syntax is provided:

      • gggg[BlockID]ee
      • gggg00[BlockID]

    where BlockID is the value claimed in the Private Creator Data Element of the block. BlockID values are not case-sensitive.

If a script defines a name for an element in a private group, that name can be used in scripts wherever an ElementName is required. For additional information, see The CTP DICOM Anonymizer Configurator.

Wherever an ElementName is required, the keyword this may be used to indicate the element whose replacement value is being constructed.

It is possible to reference elements in item datasets of sequence elements (elements with VR=SQ) by listing a series of element names separated by double-colons (::). Each element in the list except the last one must be an SQ element. An element's first item dataset is used to obtain the next element in the series.

It is possible to reference an element in the root dataset by prefixing its specifier with "root:", e.g. root:PatientID or root:[0010,0010]. This may be of use when processing item datasets of sequence elements.

This is an example that references an element in the first item dataset of element (0008,1140):

  • [0008,1140]::(0008,1150)

Note that both the square bracket and parenthesis notations are used in the example above; they are equivalent. Keywords could also have been used for either or both of the element names:

  • RefImageSeq::RefSOPClassUID

This is an example that references an element buried two levels down in a private group:

  • (0029[XYZ CT HEADER]40)::(0017[ALIGNMENT HEADER]42)

In the above example, group 29 exists in the root dataset of the object. In that group, element (0029,0011) contains the text, XYZ CT HEADER, thus reserving the block of elements from (0029,1100) through (0029,11FF). In that block, there is an SQ element (0029,1140). This is the element referenced by (0029[XYZ CT HEADER]40). The first item dataset of that element contains private group 17, and in that group, there is an element (0017,0010) containing the text, ALIGNMENT HEADER, which reserves the block of elements from (0017,1000) through (0017,10FF). In that block, there is an element (0017,1042). This is the element referenced by (0017[ALIGNMENT HEADER]42).
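The block arithmetic in the example above can be sketched in Python. This is only an illustration of the resolution rule, not CTP's implementation: the function name resolve_private_tag and the dict-based dataset are hypothetical, and only the first item dataset lookup for sequences is left out.

```python
def resolve_private_tag(dataset, group, block_id, element):
    """Resolve gggg[BlockID]ee to a concrete (group, element) pair.

    `dataset` is a hypothetical dict mapping (group, element) tuples to values.
    Returns None when no Private Creator element claims the block.
    """
    wanted = block_id.strip().upper()  # BlockID values are not case-sensitive
    # Private Creator Data Elements occupy (gggg,0010) through (gggg,00FF).
    for creator in range(0x10, 0x100):
        value = dataset.get((group, creator))
        if isinstance(value, str) and value.strip().upper() == wanted:
            # The creator's element number becomes the high byte of the tag.
            return (group, (creator << 8) | element)
    return None


root = {(0x0029, 0x0011): "XYZ CT HEADER"}
tag = resolve_private_tag(root, 0x0029, "xyz ct header", 0x40)
print("(%04X,%04X)" % tag)  # (0029,1140)
```

The same call with group 0x0017, creator element (0017,0010), and element 0x42 yields (0017,1042), matching the second half of the example.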

2.2 Functions

The anonymizer provides several functions that can be used to modify elements. Functions are invoked by a leading @, followed by the name of the function, followed by the arguments (if any) in parentheses. Function calls can be embedded in static text in the Replacement text field. Multiple function calls can appear in one element.

To allow @ characters to appear as static text, the anonymizer recognizes the \ escape character, which forces the next character to be taken literally. To insert a \ character, it is necessary to escape it, e.g. \\.

When parsing function arguments, the anonymizer also recognizes the \ escape character. The comma, parenthesis, and bracket characters must be escaped if they are part of an argument.

2.2.1 @always()

The always function forces the anonymizer to execute a script even if the target element is not present in the DicomObject, creating the target element if necessary. Unless this function is the first instruction in the script for an element, the script is not executed when the element is not present. The remainder of the script is executed like any other script, and it can contain function calls, text, parameter references, etc. For example, if it is desired to insert the current date in an element, creating the element if necessary, the function can be used as follows:

@always()@date()

2.2.2 @append(){script}

The append function adds the value of a script to a multi-valued element. This function is provided to allow an anonymizer to update the DeIdentificationMethod element (0012,0063) with a string describing the anonymization that was done. The script contained in braces is executed like any other script, and it can contain function calls, text, parameter references, etc. For example, an anonymizer script which removes provenance information received from a remote site might use this script for the DeIdentificationMethod element:

@append(){CTP: provenance data removed: @date() - @time()}

The append function can be used to append multiple values by separating the values with double-backslashes:

@append(){value1\\value2\\value3}

Note that the DICOM standard accords each value the full length allowed by the Value Representation of the element to which the values are being appended.

2.2.3 @call(id, args)

The call function provides access to anonymizer extensions contained in CTP plugins. The first argument must be the id attribute value of the referenced plugin. The remaining arguments are specified by the plugin. All the arguments are passed to the plugin.

2.2.4 @blank(n)

The blank function returns a string of blanks of length n. This function is provided to allow a fixed-length field to be blanked. The function call @blank(0) is equivalent to @empty().

2.2.5 @contents(ElementName)

This contents function returns the contents of the DICOM element named by the argument.

2.2.6 @contents(ElementName,"regex")

This contents function returns the contents of the DICOM element named by the argument, after removing all the characters selected by the regular expression. If you are not familiar with regular expressions, get an experienced programmer to help you. The effect of the operation is the same as the Java statement below, where value is a String holding the element's contents:

value.replaceAll("regex","");

2.2.7 @contents(ElementName,"regex","replacement")

This contents function returns the contents of the DICOM element named by the argument, after replacing all the characters selected by the regular expression with the characters contained in the replacement string. If you are not familiar with regular expressions, get an experienced programmer to help you. The effect of the operation is the same as the Java statement below, where value is a String holding the element's contents:

value.replaceAll("regex","replacement");

2.2.8 @date(separator)

The date function returns the current date in the format YYYY-MM-DD where the “-” character is replaced by the separator string. The value corresponds to the local date at the instant the anonymizer calls the function. To generate a DICOM-compliant date, use an empty separator string, e.g. @date().

2.2.9 @dateinterval(DateElementName,KeyType,KeyElementName)

The dateinterval function with three arguments returns the number of days between a date in an element specified by DateElementName and a date stored in the anonymizer's lookup table under the KeyType specified by the second argument and the key stored in the element specified by the value of KeyElementName. The KeyElement is always obtained from the root dataset, even when processing elements in the item datasets of SQ elements.

The format of dates stored in the lookup table must be M/D/YYYY. See Using_the_DicomAnonymizer_dateinterval_Function for more information.

2.2.10 @dateinterval(DateElementName,KeyType,KeyElementName,origindate)

The dateinterval function with four arguments first computes the same interval as in the version with three arguments. It then adds the computed number of days to the date specified in the fourth argument and returns the resulting date. The date in the origindate argument must be in DICOM format (YYYYMMDD). The function returns dates in DICOM format.
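As a rough illustration of the three- and four-argument forms, the following Python sketch computes the interval and, optionally, the shifted date. The function name date_interval is hypothetical, the lookup-table access is replaced by passing the stored date directly, and the sign convention (element date minus lookup date) is an assumption that the text above does not state.

```python
from datetime import datetime, timedelta


def date_interval(element_date, lookup_date, origin_date=None):
    """element_date: DICOM YYYYMMDD; lookup_date: M/D/YYYY from the lookup table."""
    target = datetime.strptime(element_date, "%Y%m%d").date()
    base = datetime.strptime(lookup_date, "%m/%d/%Y").date()
    days = (target - base).days  # assumed sign: element date minus lookup date
    if origin_date is None:
        return str(days)  # three-argument form: the interval in days
    origin = datetime.strptime(origin_date, "%Y%m%d").date()
    # Four-argument form: origin date shifted by the interval, in DICOM format.
    return (origin + timedelta(days=days)).strftime("%Y%m%d")


print(date_interval("20200315", "3/5/2020"))              # 10
print(date_interval("20200315", "3/5/2020", "20000101"))  # 20000111
```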

2.2.11 @dateinterval(DateElementName,KeyType,KeyElementName,@ParameterName)

This version of the four-argument dateinterval function accepts the fourth argument (origindate) as a parameter reference.

2.2.12 @empty( )

The empty function returns a zero-length string. This function is provided to allow differentiation between a blank Replacement text field, which causes deletion of the element from the DICOM object, and an empty element.

2.2.13 @encrypt(ElementName,"key")

This encrypt function returns the contents of the DICOM element named by the argument, encrypting the value with the specified key. The key is a single-word string of any length.

2.2.14 @encrypt(ElementName,@ParameterName)

This encrypt function returns the contents of the DICOM element named by the argument, encrypting the value using the value of the specified parameter as the key.

2.2.15 @hash(ElementName)

The hash function computes the MD5 hash of an element's value and returns it as a base-10 digit string.

2.2.16 @hash(ElementName,maxCharsOutput)

This version of the hash function computes the MD5 hash of an element's value and returns it as a base-10 digit string of the specified maximum length.

2.2.17 @hashdate(ElementName,HashElementName)

The hashdate function adds a negative offset to a date. The offset is calculated by hashing the value of the element specified in the HashElementName argument, converting the hash value to an integer, computing that value modulo 3650, and negating it. The result is then used as the increment of the ElementName element in the incrementdate function.

To increment dates differently for different patients while preserving the longitudinal relationships among each patient's studies, use this script:

@hashdate(this,PatientID)

To increment dates differently for every study, use this script:

@hashdate(this,StudyInstanceUID)

To increment dates differently for every object, use this script:

@hashdate(this,SOPInstanceUID)
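The offset calculation can be sketched in Python as follows. The name hash_date is hypothetical, and the choice of MD5 read as an unsigned big-endian integer is an assumption; the description above says only that the value is hashed and converted to an integer.

```python
import hashlib
from datetime import datetime, timedelta


def hash_date(date_value, hash_source):
    """Shift a DICOM date (YYYYMMDD) back by a hash-derived number of days."""
    # Assumption: MD5 over the UTF-8 bytes, read as an unsigned big-endian integer.
    digest = hashlib.md5(hash_source.encode("utf-8")).digest()
    offset_days = -(int.from_bytes(digest, "big") % 3650)  # between 0 and -3649
    original = datetime.strptime(date_value, "%Y%m%d").date()
    return (original + timedelta(days=offset_days)).strftime("%Y%m%d")


first = hash_date("20200101", "PT-1")
second = hash_date("20200111", "PT-1")
# Both dates move by the same offset, so they remain ten days apart.
print(first, second)
```

Because the offset depends only on the hashed element, every date keyed on the same PatientID moves by the same amount, which is what preserves the intervals between a patient's studies.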

2.2.18 @hashname(ElementName,maxCharsOutput)

The hashname function returns a numeric string of the specified length by computing the secure hash of the identified element's text. The algorithm is:

  1. combine all the words into one string;
  2. remove whitespace, apostrophes, and periods;
  3. convert to uppercase;
  4. compute the secure hash of the resulting string;
  5. convert the binary result to a base-10 string;
  6. return maxCharsOutput characters from the low-order end of the string.

2.2.19 @hashname(ElementName,maxCharsOutput,maxWordsInput)

This version of the hashname function operates like @hashname(ElementName,maxCharsOutput) except that it only accepts the first maxWordsInput words in the input element. This is the preferred method for producing hashed patient names because it can be used to suppress middle names, which may be absent, present as a full name, or present as an initial. In this case, a good approach would be @hashname(PatientName,6,2).
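The hashname steps can be sketched in Python under several stated assumptions: the name hash_name is hypothetical, words are taken to be separated by whitespace or the ^ component delimiter, and SHA-1 stands in for the unspecified "secure hash". The digits produced will therefore not match CTP's output; the sketch only shows the shape of the algorithm.

```python
import hashlib
import re


def hash_name(name, max_chars, max_words=None):
    # Assumption: words are delimited by whitespace or the ^ separator.
    words = [w for w in re.split(r"[\s^]+", name) if w]
    if max_words is not None:
        words = words[:max_words]  # drop middle names and later words
    # Steps 1-3: join, strip whitespace/apostrophes/periods, uppercase.
    text = re.sub(r"[\s'.]", "", "".join(words)).upper()
    # Step 4: SHA-1 is a stand-in for the unspecified secure hash.
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    # Step 5: binary result to a base-10 string.
    digits = str(int.from_bytes(digest, "big"))
    # Step 6: low-order characters.
    return digits[-max_chars:] if max_chars > 0 else digits


# A middle name, a middle initial, or no middle name all hash alike.
print(hash_name("Smith^John^Quincy", 6, 2) == hash_name("SMITH^John", 6, 2))  # True
```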

2.2.20 @hashptid(siteID,ElementName)

The hashptid function is designed to re-identify patients, replacing their clinical PatientID field with a trial PatientID field that is generated from the old value. When the hashptid function is called, the anonymizer obtains the contents of the element identified by ElementName (typically PatientID), computes the MD5 hash of the value, and converts it to a base-10 digit string.

The hashptid function recognizes a parameter reference for the siteID, and is typically coded as:

@hashptid(@SITEID,this)

2.2.21 @hashptid(siteID,ElementName,maxCharsOutput)

This version of the hashptid function operates like @hashptid(siteID,ElementName) except that it limits the length of the output string to the specified value.

2.2.22 @hashuid(root,ElementName)

The hashuid function is designed to create replacement UIDs from existing ones. The root argument is a text string containing the UID root for the institution (for example, 1.2.840.4267.32.). The hashuid function creates a new UID by computing the MD5 hash of the existing UID, converting it to a base-10 digit string and prepending the root. If the root does not end in a period, the anonymizer appends a period.

The hashuid function recognizes a parameter reference in the root argument, and is typically coded as:

@hashuid(@UIDROOT,this)
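A minimal Python sketch of the UID construction described above follows. The name hash_uid is hypothetical, and reading the MD5 digest as an unsigned big-endian integer is an assumption about the byte order, so the digits may differ from those CTP produces.

```python
import hashlib


def hash_uid(root, uid):
    """Build a replacement UID as root + base-10 MD5 of the existing UID."""
    if not root.endswith("."):
        root += "."  # the anonymizer appends a period when it is missing
    digest = hashlib.md5(uid.encode("utf-8")).digest()
    # Assumption: the 128-bit digest is read as an unsigned big-endian integer.
    return root + str(int.from_bytes(digest, "big"))


new_uid = hash_uid("1.2.840.4267.32", "1.2.3.4.5")
print(new_uid.startswith("1.2.840.4267.32."))  # True
```

A 128-bit hash can produce up to 39 decimal digits, and DICOM limits a UID to 64 characters, so in this sketch a long root could yield an overlong UID.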

2.2.23 @hashuid(root,ElementName,ElementName2)

This is a specialized version of the hashuid function. Before computing the hash, it first obtains the anonymized value of ElementName2. It then appends that to the unanonymized value of ElementName and computes the hash. This version of the function is designed to allow a dataset to be used for multiple trials using a script for the SOPInstanceUID and StudyInstanceUID like:

@hashuid(@UIDROOT,this,PatientID)

where the PatientID script involves an @lookup function call and the lookup table is changed for each trial.

2.2.24 @incrementdate(ElementName,incInDays)

The incrementdate function adds a constant offset to a date. The offset is specified in days in the incInDays argument. The offset can be positive or negative, with positive increments generating later dates.

The incrementdate function recognizes a parameter reference in the incInDays argument, and is typically coded as:

@incrementdate(this,@DATEINC)

2.2.25 @initials(ElementName)

The initials function returns a string of uppercase characters constructed from the contents of the named element by taking the first letter of each field in the element and then placing the first character last in the string. The purpose of this function is to generate the patient’s initials from the contents of a PatientName element which is encoded as Last^First^Middle. In this example, the @initials(PatientName) function call would return FML.

2.2.26 @initials(ElementName, offset)

This initials function obtains the initials and then encrypts them using a Caesar cipher with the specified offset. A positive offset cycles in alphabetic order; a negative offset cycles in reverse alphabetic order. Characters are encrypted in three separate groups: upper case, lower case, and numeric. Offset characters remain in their groups. No other characters (punctuation, whitespace, etc.) are offset.
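Both forms of the initials function can be illustrated with a short Python sketch. The names initials and caesar are hypothetical, and splitting fields on the ^ character is an assumption based on the Last^First^Middle encoding mentioned above.

```python
import string


def initials(name):
    # Assumption: fields are the ^-separated components of the name.
    letters = [f.strip()[0].upper() for f in name.split("^") if f.strip()]
    if len(letters) > 1:
        letters = letters[1:] + letters[:1]  # move the first character last
    return "".join(letters)


def caesar(text, offset):
    groups = (string.ascii_uppercase, string.ascii_lowercase, string.digits)
    out = []
    for ch in text:
        for group in groups:
            if ch in group:
                # Rotate within the character's own group only.
                ch = group[(group.index(ch) + offset) % len(group)]
                break
        out.append(ch)  # other characters pass through unchanged
    return "".join(out)


print(initials("Last^First^Middle"))             # FML
print(caesar(initials("Last^First^Middle"), 1))  # GNM
```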

2.2.27 @integer(ElementName,KeyType,width)

The integer function returns a numeric string associated with the text of the named element. The purpose of this function is to generate a replacement string for an identifier. Replacement strings start at 1 and increment for each new value of the named element. The minimum width of the replacement string is specified by the width argument. If the numeric value required for a specific text is shorter than the width, the replacement string is padded with leading 0 characters. The KeyType argument provides separate streams of integers for different types of text. For example, the @integer(this,ptid,3) function call for the PatientID element would generate a sequence 001, 002, 003, etc. as different values of the PatientID element were encountered. If there were a @integer(this,ptname,3) function call for the PatientName element in the same script, it would independently generate the same sequence as different patient names were encountered. If the width argument is missing or zero or negative, no padding of the resulting integer string is done.
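The independent integer streams can be sketched in Python as below. The class name IntegerTable is hypothetical and the table is held in memory only; how CTP stores its assignments is not described here.

```python
class IntegerTable:
    """Assigns 1, 2, 3, ... to new values, with one stream per KeyType."""

    def __init__(self):
        self._streams = {}

    def get(self, text, key_type, width=0):
        stream = self._streams.setdefault(key_type, {})
        if text not in stream:
            stream[text] = len(stream) + 1  # next unused integer in this stream
        value = str(stream[text])
        # Pad with leading zeros only when a positive width is given.
        return value.zfill(width) if width > 0 else value


table = IntegerTable()
print(table.get("MRN-778", "ptid", 3))      # 001
print(table.get("MRN-912", "ptid", 3))      # 002
print(table.get("MRN-778", "ptid", 3))      # 001 (same value, same integer)
print(table.get("Smith^John", "ptname", 3))  # 001 (separate stream)
```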

2.2.28 @keep( )

The keep function forces the element to be preserved in the DICOM object. This function is provided to make it easy to preserve elements that would otherwise be removed by a global action. This function is equivalent to @contents(this), but the keep function is preferred because it is less costly and it handles sequence elements that the contents function does not.

2.2.29 @lookup(ElementName,KeyType)

The lookup function maps values through a local lookup table. It is intended to be used for mapping values that are known to the local site. For instance, it can be used to map patient ID values to case numbers by preloading the lookup table with values matching each patient ID with the corresponding case number.

To allow for mapping multiple types of values in one anonymization step, the KeyType argument identifies the category. Its value is any text string that does not contain a colon or equals sign. It is best to use a single descriptive word or abbreviation.

The lookup table is a properties file. The format of the lookup table file is:

KeyType/value = replacement value

For example, if you are remapping patient IDs to case numbers, you might have a lookup table file that looks like:

ptid/22 = 400
ptid/23 = 401
ptid/24 = 402
ptid/25 = 403
ptid/26 = 404
ptid/27 = 405

If the Replacement field for the PatientID element is coded as @lookup(this,ptid) then a PatientID element with the value 25 will be mapped to the value 403.

If no value exists in the lookup table for an element, the anonymizer quarantines the object being anonymized.
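The table format and the two-argument lookup can be sketched in Python. This is a simplified reader that handles only the KeyType/value = replacement lines shown above, not the full Java properties syntax; the names load_lookup_table and lookup are hypothetical.

```python
def load_lookup_table(text):
    """Parse 'KeyType/value = replacement' lines into a dict."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            table[key.strip()] = value.strip()
    return table


def lookup(table, key_type, value):
    # None signals a failed lookup; the anonymizer would quarantine the object.
    return table.get(key_type + "/" + value)


table = load_lookup_table("ptid/24 = 402\nptid/25 = 403\nptid/26 = 404\n")
print(lookup(table, "ptid", "25"))  # 403
print(lookup(table, "ptid", "99"))  # None
```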

2.2.30 @lookup(ElementName,KeyType,action)

This version of the lookup function provides the option to take other actions if the lookup table does not contain an entry for the value of the specified element.

  • If the lookup attempt fails and the action argument has the value remove, the element being modified is removed from the anonymized object.
  • If the lookup attempt fails and the action argument has the value keep, the element being modified is left unmodified in the anonymized object.
  • If the lookup attempt fails and the action argument has the value empty, the element being modified is replaced with an empty string.
  • If the lookup attempt fails and the action argument has the value skip, the anonymization of the entire object is aborted and the unmodified object is passed to the next stage.
  • If the lookup attempt fails and the action argument has the value default and there is a fourth argument, the element being modified is replaced by the fourth argument.
  • If the lookup attempt fails and the action argument has the value ignore and there is a fourth argument and the value of the first argument matches the regular expression specified by the fourth argument, the element being modified is replaced by the value of the first argument. The regular expression must be enclosed in double-quotes.
  • If the lookup attempt fails and the action argument has any other value, the object is quarantined.

2.2.31 @lowercase(ElementName)

This lowercase function returns the contents of the specified DICOM element, converted to lower case. If the element is missing or empty, it returns the empty string.

2.2.32 @modifydate(ElementName,year,month,day)

The modifydate function modifies the individual fields in a date. The year, month, and day parameters replace the corresponding values in the date. If a parameter is an asterisk, the corresponding value in the original date is preserved. The modifydate function recognizes parameter references in the arguments.

For example, if the StudyDate element is coded as @modifydate(this,*,1,1), the StudyDate will be reset to the first of January, leaving the year unmodified.
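The field replacement can be sketched in Python as follows. The name modify_date is hypothetical, and the sketch does not check that the resulting date is valid (for example, day 31 in a 30-day month); how CTP treats such cases is not stated above.

```python
def modify_date(date_value, year="*", month="*", day="*"):
    """Replace fields of a DICOM date (YYYYMMDD); '*' keeps the original field."""
    parts = [date_value[0:4], date_value[4:6], date_value[6:8]]
    for i, (arg, width) in enumerate(zip((year, month, day), (4, 2, 2))):
        if str(arg) != "*":
            parts[i] = str(int(arg)).zfill(width)  # zero-pad to the field width
    return "".join(parts)


print(modify_date("20190723", "*", 1, 1))  # 20190101
```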

2.2.33 @param(@ParameterName)

The param function returns the contents of the named parameter. Parameters are stored in the script file and can be accessed by name, allowing their contents to be defined once and used many times in various elements. These parameter names are predefined:

  • TRIAL
  • SPONSOR
  • SITEID
  • SITENAME
  • PREFIX
  • SUFFIX
  • UIDROOT
  • DATEINC
  • KEY

Other parameter names can be added manually using the DICOM Anonymizer Configurator.

2.2.34 @pathelement(ElementName,index)

The pathelement function returns the specified path element in a string of path elements separated by the '/' character. If index is zero or positive, it counts from the root of the path. Thus, 0 refers to the root element, 1 refers to the second, etc. If index is negative, it counts backward from the end of the path. Thus, -1 refers to the last path element, -2 refers to the penultimate, etc. If the indexed path element does not exist, the entire path is returned.
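The indexing rule maps directly onto Python list indexing, as this sketch shows. The name path_element is hypothetical, and the handling of leading or trailing '/' characters is not addressed because the description above does not cover it.

```python
def path_element(path, index):
    parts = path.split("/")
    try:
        # Non-negative indexes count from the root, negative ones from the end.
        return parts[index]
    except IndexError:
        return path  # no such element: return the entire path


path = "site/trial/patient/study"
print(path_element(path, 0))   # site
print(path_element(path, -1))  # study
print(path_element(path, -2))  # patient
print(path_element(path, 7))   # site/trial/patient/study
```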

2.2.35 @process( )

The process function forces the anonymization of each of the item datasets in a sequence (SQ) element. This function is the equivalent of @keep() for all other element types.

2.2.36 @remove( )

The remove function forces the element to be removed from the DICOM object. It is equivalent to a blank Replacement field, but it is preferred because it is more visually apparent on the Anonymizer Configurator page.

2.2.37 @require( )

This require function creates an empty element if the current element does not exist in the object.

2.2.38 @require(ElementName)

This require function creates an element if the current element does not exist in the object. The current element’s contents are set to the contents of the named element. If the named element does not exist in the object, the created element is empty.

2.2.39 @require(ElementName,"default value")

This require function creates an element if the current element does not exist in the object. The current element’s contents are set to the contents of the named element. If the named element does not exist in the object, the created element’s contents are set to the default value.

2.2.40 @round(ElementName,groupsize)

The round function is intended for use on patient age elements to allow them to be binned into groups of groupsize size. The center of the first group is always at zero. Therefore, if the PatientAge element contains 57, the function call @round(PatientAge,10) returns 60.
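The binning can be sketched as rounding to the nearest multiple of groupsize. The name round_age is hypothetical, the input is taken to be a plain integer, and sending a value exactly on a bin boundary (such as 55 with a groupsize of 10) to the upper bin is an assumption.

```python
def round_age(age, group_size):
    """Round age to the nearest multiple of group_size (bins centered on 0, g, 2g, ...)."""
    # Assumption: values exactly halfway between centers go to the upper bin.
    return ((age + group_size // 2) // group_size) * group_size


print(round_age(57, 10))  # 60
print(round_age(54, 10))  # 50
print(round_age(4, 10))   # 0
```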

2.2.41 @time(separator)

The time function returns the current 24-hour time in the format HH:MM:SS where the “:” character is replaced by the separator string. The time corresponds to the local time at the instant the anonymizer calls the function. To generate a DICOM-compliant time, use an empty separator string, e.g. @time().

2.2.42 @truncate(ElementName,n)

The truncate function returns a substring of the contents of the DICOM element named by the argument. If n is positive, it returns the first n characters. If n is negative, it returns the last |n| characters. If n is larger than the length of the string, it returns the whole string. (Note that if n is zero, it returns an empty string.)
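A minimal Python sketch of these rules follows. The name truncate is hypothetical, and treating a negative n whose magnitude exceeds the string length the same way as an oversized positive n is an assumption.

```python
def truncate(text, n):
    if n == 0:
        return ""  # zero yields an empty string
    if abs(n) >= len(text):
        return text  # n reaches past the string: return all of it
    # Positive n keeps the head, negative n keeps the tail.
    return text[:n] if n > 0 else text[n:]


print(truncate("ABCDEFG", 3))   # ABC
print(truncate("ABCDEFG", -3))  # EFG
print(truncate("ABCDEFG", 10))  # ABCDEFG
```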

2.2.43 @uppercase(ElementName)

This uppercase function returns the contents of the specified DICOM element, converted to upper case. If the element is missing or empty, it returns the empty string.

2.2.44 @value(ElementName)

This value function returns the contents of the specified DICOM element. If the element is missing, the empty string is returned. This function is equivalent to @contents(ElementName).

2.2.45 @value(ElementName,"default value")

This value function returns the contents of the specified DICOM element. If the element is missing or empty, it returns the specified default value.

2.3 Global Actions

The anonymizer supports global commands that either keep or remove entire groups or classes of groups. The format of these commands is described in the advanced section below.

2.3.1 Keep group 18

Checking the "Keep group 18" box causes the anonymizer to preserve all group 18 elements. This selection overrides the "Remove unchecked elements" selection. Actions specified for checked group 18 elements take precedence over all global actions.

2.3.2 Keep group 20

Checking the "Keep group 20" box causes the anonymizer to preserve all group 20 elements. This selection overrides the “Remove unchecked elements” selection. Actions specified for checked group 20 elements take precedence over all global actions.

2.3.3 Keep group 28

Checking the "Keep group 28" box causes the anonymizer to preserve all group 28 elements. This selection overrides the “Remove unchecked elements” selection. Actions specified for checked group 28 elements take precedence over all global actions.

2.3.4 Keep safe private elements

Checking the "Keep safe private elements" box causes the anonymizer to preserve all private elements known not to contain PHI. It also preserves all Private Creator Elements. This selection overrides the “Remove private groups” selection. The index of safe private elements is maintained by Mallinckrodt Institute of Radiology on behalf of the National Cancer Institute. See The DICOM Anonymizer Keep Safe Private Elements Feature.

2.3.5 Remove private groups

Checking the "Remove private groups" box causes the anonymizer to remove all elements in odd-numbered groups. These are private groups whose contents are not specified by the DICOM standard. Because these groups often contain PHI, they are usually removed when fully de-identifying a DICOM object. If the box is not checked, elements in private groups are kept.

2.3.6 Remove unchecked elements

Checking the "Remove unchecked elements" box causes the anonymizer to remove all elements that have not been selected in the table for special handling. There are several exceptions to this action, however, where unselected elements are still preserved by default, even when removing unspecified elements:

  1. The SOP Class UID
  2. The SOP Instance UID
  3. The Study Instance UID
  4. Group 28 (the parameters describing the pixels)
  5. Groups 60xx (overlays)

Removing the first three elements requires specific action in their scripts. Generally, those elements are re-identified using the hashuid function or simply preserved without modification.

2.3.7 Remove curves

Checking the "Remove curves" box causes the anonymizer to remove all elements in 50xx groups. These are groups which contain curve data.

2.3.8 Remove overlays

Checking the "Remove overlays" box causes the anonymizer to remove all elements in 60xx groups. These are overlays and are sometimes removed when fully de-identifying an object because they can contain PHI as annotations. The notation “not recommended” is simply to discourage administrators from removing these groups unless they know exactly what they are doing.

2.4 Conditional Functions

The anonymizer has a limited conditional capability designed to allow it to perform different actions depending on the content of an element. The form of the conditional statement is:

     @if(ElementName, condition, x) {true clause} {false clause}

where the third argument, x, is used only if the condition requires it. The third argument can be a quoted string or a parameter reference (@NAME).

Both clauses are required in the statement or the anonymizer will ignore any commands that appear in the replacement script after the true clause. Whitespace within the arguments or between the clauses is ignored.

Multiple if statements are allowed in one Replacement field, but nested if statements are not supported. Function calls are allowed within the conditional clauses.

2.4.1 @if(ElementName,exists)

The exists conditional statement executes the true clause if the named element exists in the object, no matter what its value; otherwise, it executes the false clause.

2.4.2 @if(ElementName,isblank)

The isblank conditional statement executes the true clause if the named element is missing from the object or appears with a zero length or with a non-zero length and contains only blank characters; otherwise, it executes the false clause.

2.4.3 @if(ElementName,equals,"string")

The equals conditional statement executes the true clause if the value of the named element exactly equals the specified string; otherwise, it executes the false clause. The test is not case-sensitive.

2.4.4 @if(ElementName,contains,"string")

The contains conditional statement executes the true clause if the value of the named element contains the specified string; otherwise, it executes the false clause. The test is not case-sensitive.

2.4.5 @if(ElementName,matches,"regex")

The matches conditional statement executes the true clause if the contents of the named element match the regular expression; otherwise, it executes the false clause. If you are not familiar with regular expressions and you need to use this function, get an experienced programmer to help you. This function can be used to execute very complex tests on the contents of an element.

2.4.6 @if(ElementName,greaterthan,value)

The greaterthan conditional statement executes the true clause if the contents of the named element are numerically greater than the third argument; otherwise, it executes the false clause. Comparisons are done by filtering out all non-numeric characters in both the element and the comparison value, converting the results to integers, and comparing the integers.
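The comparison can be sketched in Python under two assumptions: that "non-numeric" means any character other than the digits 0-9, and that a value with no digits at all is treated as zero. The name greater_than is hypothetical.

```python
import re


def greater_than(element_value, comparison):
    def to_int(text):
        digits = re.sub(r"[^0-9]", "", text)  # drop everything but 0-9
        return int(digits) if digits else 0   # assumption: no digits means 0

    return to_int(element_value) > to_int(comparison)


print(greater_than("057Y", "50"))  # True
print(greater_than("030Y", "50"))  # False
```

Under this reading, signs and decimal points are discarded too, so a value such as 1.5 would be compared as 15. It is worth testing a script against representative values before relying on it.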

2.4.7 @quarantine( )

The quarantine function causes the anonymizer to abort the anonymization process and place the unmodified object in the quarantine for manual processing. The quarantine function must appear in a conditional clause of an if statement, but this is not enforced programmatically. If it were to appear in a script that is executed during every anonymization, it would force the quarantining of every object.

2.4.8 @select( )

The select function is a conditional statement that provides two clauses. The first is executed if the dataset being processed is the top-level (root) dataset of the object. The second is executed if the dataset being processed is an item of a sequence element. The form of the select statement is:

     @select() {root clause} {item clause}

Both clauses are required in the statement. Whitespace between the clauses is ignored. Conditional statements are not allowed in either clause. Function calls are allowed within the clauses.

2.4.9 @skip( )

The skip function causes the anonymizer to abort the anonymization process and to allow the unmodified object to continue through the system. It is intended to be used when it is possible to detect that an object has already been anonymized, thus preventing it from being anonymized a second time. The skip function must appear in a conditional clause of an if statement, but this is not enforced programmatically. If it were to appear in a script that is executed during every anonymization, it would allow PHI through the process.

2.5 The DeIdentificationMethodCodeSeq Element

The DeIdentificationMethodCodeSeq element (0012,0064) is processed specially. If a script for the element is provided, it is always processed, even if the element is not present in the DicomObject. The @always() function is not allowed in the script for this element.

The script for the element is required to be one or more de-identification codes as defined in the De-Identification Method table (CID 7050) of DICOM PS3.16. If multiple methods are to be recorded, their code values are separated by single slash characters ( / ). Whitespace in the script for this element is allowed for readability, but it is ignored during processing.

When this script is processed, the DeIdentificationMethodCodeSeq element is created if necessary, and one item dataset is added to the element for each code in the script. Each item dataset includes the CodingSchemeDesignator (defined to be "DCM"), the CodeValue, and the CodeMeaning. The CodeMeaning is provided automatically from the table in the DICOM standard:

113100 Basic Application Confidentiality Profile
113101 Clean Pixel Data Option
113102 Clean Recognizable Visual Features Option
113103 Clean Graphics Option
113104 Clean Structured Content Option
113105 Clean Descriptors Option
113106 Retain Longitudinal With Full Dates Option
113107 Retain Longitudinal With Modified Dates Option
113108 Retain Patient Characteristics Option
113109 Retain Device Identity Option
113110 Retain UIDs
113111 Retain Safe Private Option

For example, if an anonymization script file implements the Basic Application Confidentiality Profile and the Retain Device Identity Option, the script for this element should be:

113100 / 113109

There is a special code, RESET. When it appears as the first component of the script, all previous codes are removed from the element before processing the rest of the components in the script. In the example above, to remove all previous codes and then insert the two new codes, the script should be:

RESET / 113100 / 113109
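
The processing rules for this script (ignore whitespace, split on slashes, honor a leading RESET, add one item per code) can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not the anonymizer's implementation. The function name apply_deid_script and the use of dictionaries for item datasets are hypothetical, and the sketch appends codes without checking for duplicates, which may differ from what the anonymizer does.

```python
# Code meanings copied from the table above.
CODE_MEANINGS = {
    "113100": "Basic Application Confidentiality Profile",
    "113101": "Clean Pixel Data Option",
    "113102": "Clean Recognizable Visual Features Option",
    "113103": "Clean Graphics Option",
    "113104": "Clean Structured Content Option",
    "113105": "Clean Descriptors Option",
    "113106": "Retain Longitudinal With Full Dates Option",
    "113107": "Retain Longitudinal With Modified Dates Option",
    "113108": "Retain Patient Characteristics Option",
    "113109": "Retain Device Identity Option",
    "113110": "Retain UIDs",
    "113111": "Retain Safe Private Option",
}

def apply_deid_script(script: str, existing_items: list) -> list:
    """Return the sequence items after applying the script (hypothetical model)."""
    # Whitespace is ignored; codes are separated by single slashes.
    codes = "".join(script.split()).split("/")
    items = list(existing_items)
    if codes and codes[0] == "RESET":
        items = []          # RESET as the first component removes previous codes
        codes = codes[1:]
    for code in codes:
        items.append({
            "CodingSchemeDesignator": "DCM",
            "CodeValue": code,
            "CodeMeaning": CODE_MEANINGS[code],  # unknown codes raise KeyError here
        })
    return items

previous = [{"CodingSchemeDesignator": "DCM", "CodeValue": "113101",
             "CodeMeaning": CODE_MEANINGS["113101"]}]
for item in apply_deid_script("RESET / 113100 / 113109", previous):
    print(item["CodeValue"], item["CodeMeaning"])
```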

2.6 Examples

2.6.1 Patient and Trial Identifiers

The best approach for a multi-center trial is to use the hashptid function:

@hashptid(@SITEID,PatientID)

This generates a fairly long identifier, however. An alternative is to use the integer function:

@integer(PatientID,ptid,4)

This can be combined with other text like this:

C-@integer(PatientID,ptid,4)

The result would be values like: C-0001, C-0002, C-0003, etc.
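
To make the behavior of this script concrete, the Python sketch below models it under two assumptions: that the integer function gives each distinct element value the next unused integer within its KeyType (ptid here), and that the result is padded with leading zeros to the requested width. The class IntegerTable is hypothetical, and its in-memory dictionary stands in for whatever storage the anonymizer actually uses.

```python
class IntegerTable:
    """Hypothetical stand-in for the anonymizer's integer assignment."""

    def __init__(self):
        self._tables = {}  # one value-to-integer mapping per KeyType

    def integer(self, value: str, key_type: str, width: int) -> str:
        table = self._tables.setdefault(key_type, {})
        if value not in table:
            table[value] = len(table) + 1  # next unused integer for this KeyType
        return str(table[value]).zfill(width)

table = IntegerTable()
for patient_id in ["MRN-88", "MRN-17", "MRN-88"]:  # made-up PatientID values
    print("C-" + table.integer(patient_id, "ptid", 4))
```

Under these assumptions, a PatientID that is seen a second time receives the same replacement as the first time, which is what keeps all of a patient's studies together.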

Many other methods have been used for the automatic generation of replacements for IDs and names. For an ACCORD trial, the PatientName element must contain the case number followed by a delimiter character (“^”) and the field center identifier. If the case number is stored by the modality operator in the PatientComments element and the field center identifier is CWR, the Replacement text field for the PatientName element would read:

@contents(PatientComments)^CWR

For an ACCORD trial, the OtherPatientIds element must contain the word ACCORD. The Replacement field for the OtherPatientIds element would then read:

     ACCORD

For the WHIMS trial, the PatientName element must contain the patient’s initials followed by a dash, the name of the trial, another dash, and the site’s identifier, which is configured in the SITEID parameter. The Replacement field for the PatientName element would then read:

     @initials(PatientName)-WHIMS-@param(@SITEID)

2.6.2 UID Remapping

To generate new UIDs for the StudyInstanceUID using the UID root 1.2.840.123.321, the Replacement field for the StudyInstanceUID element would then read:

@hashuid(1.2.840.123.321.,StudyInstanceUID)

If one were remapping UIDs as in the function call above, it would be more efficient to define the UIDROOT parameter to have the value “1.2.840.123.321.” and code the calls as:

@hashuid(@UIDROOT,StudyInstanceUID)

If all UID replacements are generated in this way, it ensures that all UIDs are mapped to the same root. If the root does not end in a period, the anonymizer appends a period, but it is good form to supply it.
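
The handling of the root can be sketched in Python as follows. Only the root normalization is taken from the text above. The hash itself (an MD5 digest written as a decimal integer) is a stand-in chosen for illustration and should not be read as the anonymizer's algorithm; the function name hash_uid and the sample UID are hypothetical as well.

```python
import hashlib

def hash_uid(root: str, original_uid: str) -> str:
    """Model of a hashed UID: normalized root followed by a numeric hash."""
    if not root.endswith("."):
        root += "."  # a period is appended if the root does not end in one
    # Stand-in hash (assumption): MD5 digest expressed as a decimal integer.
    digest = int(hashlib.md5(original_uid.encode("ascii")).hexdigest(), 16)
    return root + str(digest)

uid = "1.3.6.1.4.1.9999.1.2.3"  # made-up StudyInstanceUID
print(hash_uid("1.2.840.123.321", uid) == hash_uid("1.2.840.123.321.", uid))
```

Because the same input always produces the same output, every object in a study receives the same replacement StudyInstanceUID.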

2.6.3 Keeping and Removing Elements

If the Remove unspecified elements box is checked and the value of an element must be preserved, the Replacement field for the element would then read:

     @keep()

If the Keep group 18 box is checked, but a specific group 18 element must be removed, the Replacement field for that element would then read:

     @remove()

2.6.4 Conditionally Modifying Elements

If the InstitutionName element is to be kept if it is present and non-blank, but replaced with static text if it is missing or blank, the Replacement field for the element would read:

@if(InstitutionName,isblank){My Hospital}{@keep()}

If the StudyComments element is being used to contain a trial patient ID and the ID must have exactly seven numeric digits, and if this element is to be copied to the PatientID element, the Replacement field for the PatientID element would read:

@contents(StudyComments)

And the Replacement field for the StudyComments element would read:

@if(StudyComments,matches,"\\d{7}.*"){@remove()}{@quarantine()}

Note that the coding of the regular expression in this case looks odd because the escape character is doubled. This is necessary because the anonymizer and the regular expression processor both use the same escape character, the backslash. Thus, to get one escape character, it must itself be escaped.
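
The effect of the doubled backslash can be demonstrated with a small Python model. It assumes that the anonymizer removes one level of backslash escaping before handing the expression to the regular expression processor, and that the matches test requires the whole value to match; both are assumptions, and the function name unescape is hypothetical.

```python
import re

def unescape(script_text: str) -> str:
    """Remove one level of backslash escaping (assumed anonymizer behavior)."""
    out = []
    chars = iter(script_text)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))  # keep only the escaped character
        else:
            out.append(ch)
    return "".join(out)

doubled = unescape(r"\\d{7}.*")  # the expression as coded in the script above
single = unescape(r"\d{7}.*")    # the same expression with a single backslash

print(doubled, bool(re.fullmatch(doubled, "1234567")))
print(single, bool(re.fullmatch(single, "1234567")))
```

In this model the doubled form reaches the regular expression processor as \d{7}.* and matches a seven-digit value, while the single form loses its backslash and no longer tests for digits at all.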

Note also that the true clause will force the StudyComments element to be deleted from the object, which would be reasonable, since its contents are being moved to the PatientID field. If other processing were desired in this situation, it could be placed in the true clause.

In this example, a better script for the PatientID element might be:

@contents(StudyComments,"\\D")

This will delete all non-numeric characters from the string used for the PatientID. Some modalities insert a newline character at the end of entry fields when the operator ends an entry with the Enter key. This script filters out those characters and anything else in the field that is not numeric. Note that the regular expression in the StudyComments script above ended with “.*”. That script will match a seven-digit string ending in a newline.

2.6.5 Conditionally Processing Files

The skip function can be used in the following way to avoid processing files that have already been processed. Suppose that the ReferringPhysicianName element is not used in the clinical trial. Its Replacement field could be coded as:

     @if(ReferringPhysicianName,matches,"DONE"){@skip()}{DONE}

This will cause the anonymizer to insert the word DONE in the field on the first pass. If the object were to be processed again, the anonymizer would detect the word and skip the anonymization process.

2.6.6 Filtering Element Content

The contents(ElementName,"regex") function can be used to filter the contents of an element, selecting only a portion of its value. For example, suppose that the StudyComments element is populated by a modality with specially formatted content: a numeric code followed by other information including a user ID:

78.7812 [ADJUSTED: HE41328 - 01/02/2007 13:00:26]

The following function call would retrieve the leading code (78.7812):

@contents(StudyComments,"\\s.*")

The following function call would retrieve the user's ID (HE41328):

@contents(StudyComments,"([^:]*:\\s+)|(\\s*-.*)")
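
The two filter expressions in this section can be tried out in Python before they are entered in the configurator. The sketch assumes that the filter deletes every part of the value that matches the regular expression, which is how the examples in this article behave; the function name contents is only a stand-in for the anonymizer function. The expressions are written with single backslashes here because Python raw strings do not need the doubling that the anonymizer requires.

```python
import re

def contents(value: str, regex: str) -> str:
    """Model of the filter: delete every match of regex from the value."""
    return re.sub(regex, "", value)

comments = "78.7812 [ADJUSTED: HE41328 - 01/02/2007 13:00:26]"

print(contents(comments, r"\s.*"))                  # 78.7812
print(contents(comments, r"([^:]*:\s+)|(\s*-.*)"))  # HE41328
print(contents("1234567\n", r"\D"))                 # 1234567
```

Java and Python regular expressions agree on the constructs used here, but an expression that relies on other features should be checked against the Java syntax.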

2.6.7 Processing SQ Elements

The anonymizer has three functions that apply to SQ elements:

  • @keep() retains the element in the anonymized object without modification of any of the elements in any of its item datasets.
  • @remove() removes the element and all its item datasets from the anonymized object.
  • @process() anonymizes all the item datasets of the element.

When processing the root dataset, the anonymizer modifies existing elements in accordance with their scripts. It also creates new elements when instructed to do so by a script. When processing an item dataset, the anonymization scripts are used, but no new elements are allowed to be created. This feature must be kept in mind when coding the scripts for elements that are to be processed in item datasets to prevent the inadvertent creation of an element in the root dataset. Here is a contrived example to illustrate the point:

Suppose the [0010,4000] PatientComments element appears in an SQ item and it is to be replaced with static text. One way to code the script for the element is:

new text

This will replace the contents of all PatientComments elements anywhere in the object, whether in the root dataset or in any processed SQ elements' item datasets. And if the element does not appear in the root dataset, this script will create it. To prevent this action, the select function should be used to distinguish the two cases. If the intent is not to modify the element if it appears in the root dataset, the script would then be:

@select(){@keep()}{new text}

Note that in the root clause, the keep function will keep the element if it exists, but it will not create it if it doesn't.

As a further example, suppose the [0008,1155] RefSOPInstanceUID element appears in an SQ item and it is to be replaced with a hashed UID. Again, one way to code the script for the element is:

@hashuid(@UIDROOT,this)

As in the previous example, this will affect all RefSOPInstanceUID elements in the object, whether in the root dataset or in any processed SQ elements' item datasets. And as before, if the element does not appear in the root dataset, the anonymizer will try to create it. In this case, however, the missing element will not be created because of a unique feature of the hashuid function, which generates a remove function call if it fails. (It fails here because the this keyword refers to a non-existent element. The remove function call that is generated by the hashuid function does nothing here because the element doesn't exist.) Thus, the simple script above is protected from inadvertently creating elements.

For scripts containing other function calls or scripts that replace element values with static text, the select function must be used to prevent the creation or modification of the element in the root dataset.
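
The difference between a static-text script and the same script wrapped in a select statement can be modeled in Python. The sketch represents datasets as dictionaries and implements only the rules stated in this section: a script may create a missing element in the root dataset, it never creates one in an item dataset, and the keep function neither modifies nor creates an element. The function names are hypothetical.

```python
def apply_static_text(dataset: dict, name: str, text: str, is_root: bool) -> None:
    """Model of a script consisting only of static text."""
    if name in dataset or is_root:  # only the root dataset may gain new elements
        dataset[name] = text

def apply_select_keep_else_text(dataset: dict, name: str, text: str,
                                is_root: bool) -> None:
    """Model of @select(){@keep()}{text}."""
    if is_root:
        return                      # root clause: @keep() changes and creates nothing
    if name in dataset:
        dataset[name] = text        # item clause: replace an existing element

root, item = {}, {"PatientComments": "old"}
apply_static_text(root, "PatientComments", "new text", is_root=True)
apply_static_text(item, "PatientComments", "new text", is_root=False)
print(root, item)  # the element has been created in the root dataset

root, item = {}, {"PatientComments": "old"}
apply_select_keep_else_text(root, "PatientComments", "new text", is_root=True)
apply_select_keep_else_text(item, "PatientComments", "new text", is_root=False)
print(root, item)  # the root dataset is untouched
```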

When processing an item dataset, all references to elements are limited to the item dataset being processed. Thus, it is not possible to reference elements in the root dataset or in any other item dataset in the object.

When processing an item dataset, all parameters are available. Thus, when hashing UIDs, the @UIDROOT parameter is available to ensure that all UIDs are created with the same root.

3 Advanced Configuration

3.1 Extending the Anonymizer

The anonymizer can be extended to meet specialized requirements by editing the script file. Edit the file with care: its scripts determine how every object is de-identified, so a single mistake can affect all the objects that pass through the anonymizer.

The script file is a text file that can be edited with any good text editor like TextPad. The content of the file is a set of properties, one per line, in the form:

key = value

Properties beginning with # are disabled. Do not remove disabled properties or the anonymizer configurator will lose knowledge of the property. The order of the lines in the file determines the order in which the anonymizer configurator presents them to the user. There are four basic kinds of keys:

  • Keys beginning with param. are parameters. Traditionally, parameter names are all in upper case and all the parameters are defined at the top of the file, but there is no programmatic requirement to do so. If you want to define additional parameters for use in the DICOM element scripts, you can add them by appending the parameter name to the prefix, like this:
         param.NEWPARAM = value
    The = sign is required. The value is optional.
  • Keys beginning with set. provide replacement scripts for individual DICOM elements. Additional elements can be added. It is best to add them in sequence to make it easy to find them in the anonymizer configurator table, but there is no programmatic requirement to do so. Set keys have the form:
         set.[gggg,eeee]ElementName = value
    The ElementName is traditionally the name recognized by the dcm4che DICOM class library for the element, although it is the [group,element] designation that determines which element is modified by the script. When adding an element for a private group, you can pick any name you wish, but scripts cannot reference the element by name. The value is optional.
  • Keys beginning with keep.group immediately followed by the hex value of a DICOM group number, as in keep.group18, are global keep commands. They do not contain scripts. To provide a label for the command in the anonymizer configurator, the value of the property can be supplied, like this:
         keep.group18 = Keep group 18 [recommended]
    The standard dicom-anonymizer.properties file contains keep commands for groups 18, 20, and 28, and default label values for those groups are defined in the program. They may be overridden by specifying values in the script file. A typical use of this type of property is to provide a convenient way to keep a specific private group, but standard DICOM groups can be added as well.
  • Keys beginning with remove. are global remove commands. The anonymizer cannot be extended with remove commands.
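
Before editing a script file by hand, it can help to list what each line is. The Python sketch below classifies lines according to the four kinds of keys described above and reports whether a property is disabled. It is an illustration only, not a reimplementation of the anonymizer's parser; the function name classify is hypothetical and the sample lines are made up.

```python
KINDS = (
    ("param.", "parameter"),
    ("set.", "element script"),
    ("keep.group", "global keep"),
    ("remove.", "global remove"),
)

def classify(line: str):
    """Return (kind, key, value, enabled) for a property line, else None."""
    text = line.strip()
    if not text or "=" not in text:
        return None                      # the = sign is required
    enabled = not text.startswith("#")   # properties beginning with # are disabled
    key, _, value = text.lstrip("#").partition("=")
    key, value = key.strip(), value.strip()
    for prefix, kind in KINDS:
        if key.startswith(prefix):
            return kind, key, value, enabled
    return None

sample = [
    "param.UIDROOT = 1.2.840.123.321.",
    "set.[0010,0010]PatientName = @initials(PatientName)-WHIMS-@param(@SITEID)",
    "#set.[0010,4000]PatientComments = @remove()",
    "keep.group18 = Keep group 18 [recommended]",
]
for line in sample:
    print(classify(line))
```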

3.2 Special Lookup Functions

The lookup function has two special features that can be useful in certain situations. These features are available in both versions of the function (@lookup(ElementName,KeyType) and @lookup(ElementName,KeyType,action)).

The first special feature is that the ElementName argument can be a sequence of element names, separated by the pipe character ("|"). For example, consider this function call:

@lookup(PatientID|StudyDate,xyz,skip)

When processing an object with a PatientID of 23456 and a StudyDate of 20120825, the anonymizer will construct this key into the lookup table:

xyz/23456|20120825

This feature can be used to generate a replacement value only for specific combinations of elements. There is no limit to the number of elements that can be combined in this way.

If the key that is generated does not appear in the lookup table, the anonymizer takes the action specified in the action argument, if present.

The second special feature is that the lookup table has an indirection feature. If the value in the lookup table for a certain key starts with an at-sign ("@") and also contains a slash ("/"), then when the lookup occurs, the value (after removing the at-sign) is treated as a key and another lookup is done. This process is recursive, and it only stops when a retrieved value does not match an indirection key. (To prevent infinite loops, a limit of 10 indirection lookups is imposed.)

As an example of the use of these features in the retrospective processing of clinical trial data, suppose it is desired to alter the StudyDate element for DicomObjects based on the PatientID and the StudyDate. Suppose that there are three replacement StudyDate values, 20010201, 20020201, and 20030201. The StudyDate element might be coded as shown above. The lookup table might be constructed like this:

xyz/22|19980508 = @year/1
xyz/22|19990615 = @year/2
xyz/22|20000501 = @year/3
...similar rows for other patients
year/1 = 20010201
year/2 = 20020201
year/3 = 20030201

With this lookup table, the actual replacement dates can be specified in one place in the table, and through indirection, referenced from many places.
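
The key construction and the indirection rule can be modeled in Python using the table above. This is a sketch of the behavior as described, not the anonymizer's code. The function name lookup is hypothetical, a missing key is reported as None so that the caller can apply the action argument, and what the anonymizer does when the limit of 10 indirections is reached is not covered here.

```python
def lookup(table: dict, key_type: str, element_values: list,
           max_indirections: int = 10):
    """Build the key from the element values and follow indirection entries."""
    key = key_type + "/" + "|".join(element_values)
    value = table.get(key)
    hops = 0
    # A value that starts with "@" and contains "/" is itself treated as a key.
    while (value is not None and value.startswith("@") and "/" in value
           and hops < max_indirections):
        value = table.get(value[1:])
        hops += 1
    return value

table = {
    "xyz/22|19980508": "@year/1",
    "xyz/22|19990615": "@year/2",
    "xyz/22|20000501": "@year/3",
    "year/1": "20010201",
    "year/2": "20020201",
    "year/3": "20030201",
}

print(lookup(table, "xyz", ["22", "19990615"]))  # 20020201
print(lookup(table, "xyz", ["22", "20120825"]))  # None: no entry, the action applies
```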

The script shown above assumes that if an object doesn't have a matching entry in the lookup table, the object is not to be anonymized. In some situations, it might be desirable to allow other de-identification to be done in the object and just to keep the original value for the StudyDate. In that case, the StudyDate element would be coded as:

@lookup(PatientID|StudyDate,xyz,keep)

The this keyword can be used anywhere in the ElementName argument. For the example above, the code could have been:

@lookup(PatientID|this,xyz,skip)

4 Precedence

It is possible to create a set of instructions that appear to be self-contradicting, so an instruction precedence is required. The principle for defining precedence is:

  1. A command specific to an element takes precedence over global commands.
  2. Global keep commands take precedence over global remove commands.

Thus, if an element is part of a private group and private groups are to be removed, but the element has a script requiring it to be kept, it is kept.

If an element is not selected (i.e., unchecked) and unchecked elements are to be globally removed, but the element is part of a group to be kept, it is kept.

If an element that is part of a private group is not selected and private groups are to be globally removed, but the element’s group is to be kept, the element is kept.

There is one exception to the principle: if overlays are to be globally removed, that command takes precedence over any keep commands that have been defined for individual overlay groups.
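
The precedence rules can be summarized as a small decision function. The Python sketch below encodes the rules in this section; it is a model, not the anonymizer's logic. The function name decide and its arguments are hypothetical, and it assumes that a script on an individual element still takes precedence over the global removal of overlays, which the text does not state either way.

```python
from typing import Optional

def decide(element_script: Optional[str], group_kept: bool,
           globally_removed: bool, overlay_group: bool = False,
           remove_overlays: bool = False) -> str:
    """Return "keep" or "remove" for one element."""
    if element_script in ("keep", "remove"):
        return element_script      # 1. an element-specific command wins
    if overlay_group and remove_overlays:
        return "remove"            # exception: overlay removal beats group keeps
    if group_kept:
        return "keep"              # 2. global keep beats global remove
    if globally_removed:
        return "remove"
    return "keep"                  # no command applies: the element is left intact

print(decide("keep", group_kept=False, globally_removed=True))  # keep
print(decide(None, group_kept=True, globally_removed=True))     # keep
print(decide(None, group_kept=True, globally_removed=True,
             overlay_group=True, remove_overlays=True))         # remove
```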