The MIRC DICOM Anonymizer
Latest revision as of 19:28, 31 July 2009
This article describes how to configure the anonymizer on the DICOM Services contained in MIRC Storage Services. The anonymizer provides automatic modification of the header elements in DICOM objects received by the DICOM Service. Typical applications for the anonymizer are in MIRC Clinical Trial Services and File Services. The intended audience for this information is MIRC system administrators or clinical trial coordinators at field center sites.
1 DICOM Services
Each MIRC site may have one File Service and multiple Storage Services. Every service (File or Storage) has its own DICOM service, and each DICOM service has its own anonymizer and anonymizer configuration, allowing for separate processing of DICOM objects for each purpose.
2 Accessing the Anonymizer Configurator for a Storage Service
The Anonymizer Configurator is part of the Storage Service’s admin service.
To access the Admin Service, go to the query page for the site, select the Storage Service in the list at the top of the query pane, and click the Admin Service button on the right side of the window.
On the Admin page, click the Update Configuration button in the DICOM Service column.
On the DICOM Service Configurator page, click the Update the Anonymizer button in the DICOM Import Service section. The result will be a page that looks like this:
3 Modifying DICOM Elements
The anonymizer has a simple scripting language. Each DICOM element can have its own replacement script containing contents and instructions for what to do with the element when it is processed.
To cause the anonymizer to take direct action on an element when a DICOM object is received, place a check in the Select checkbox for the element. Elements that are unchecked are left intact unless they qualify for global action as described later.
If you want to replace the contents of an element with new static text, enter the text in the Replacement text field for the element.
If you want the element to be removed from the DICOM object, blank the Replacement text field for the element or, alternatively, use the remove( ) function described below.
If you want to insert an empty element or replace the contents of an element with an empty (zero-length) string, use the empty( ) function described below.
Leading and trailing blanks in all Replacement fields are removed before processing.
In the functions described below, wherever an ElementName is required, the keyword this may be used to indicate the element whose replacement value is being constructed.
3.1 Special Functions
The anonymizer provides several functions that can be used to modify elements. Functions are invoked by a leading @, followed by the name of the function, followed by the arguments (if any) in parentheses. Function calls can be embedded in static text in the Replacement text field. Multiple function calls can appear in one element.
To allow @ characters to appear as static text, the anonymizer recognizes the \ escape character, which forces the next character to be taken literally. To insert a \ character, it is necessary to escape it, e.g. \\.
3.1.1 @contents(ElementName)
This contents function returns the contents of the DICOM element named by the argument.
3.1.2 @contents(ElementName,"regex")
This contents function returns the contents of the DICOM element named by the argument, after removing all the characters selected by the regular expression. If you are not familiar with regular expressions, get an experienced programmer to help you. The effect of the operation is the same as calling the Java method String.replaceAll("regex","") on the element's value.
3.1.3 @contents(ElementName,"regex","replacement")
This contents function returns the contents of the DICOM element named by the argument, after replacing all the characters selected by the regular expression with the characters contained in the replacement string. If you are not familiar with regular expressions, get an experienced programmer to help you. The effect of the operation is the same as calling the Java method String.replaceAll("regex","replacement") on the element's value.
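Because both regex forms are described in terms of Java's String.replaceAll, their effect can be tried outside the anonymizer. The following is a minimal sketch, not the anonymizer's own code: the class name, the helper methods, and the sample value are hypothetical and exist only for illustration.

```java
class ContentsSketch {
    // Hypothetical helper mirroring @contents(ElementName,"regex"):
    // every match of the regular expression is removed from the value.
    static String contents(String value, String regex) {
        return value.replaceAll(regex, "");
    }

    // Hypothetical helper mirroring @contents(ElementName,"regex","replacement"):
    // every match of the regular expression is replaced by the replacement string.
    static String contents(String value, String regex, String replacement) {
        return value.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String value = "ACC-2007-001"; // made-up element value
        System.out.println(contents(value, "[^0-9]"));  // prints 2007001
        System.out.println(contents(value, "-", "/"));  // prints ACC/2007/001
    }
}
```

Note that in Java's replaceAll the replacement string gives special meaning to the $ and \ characters, so a replacement containing either may need escaping.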
3.1.4 @encrypt(ElementName,"key")
This encrypt function returns the contents of the DICOM element named by the argument, encrypting the value with the specified key. The key is a single-word string of any length.
3.1.5 @encrypt(ElementName,@ParameterName)
This encrypt function returns the contents of the DICOM element named by the argument, encrypting the value using the value of the specified parameter as the key.
3.1.6 @require( )
This require function creates an empty element if the current element does not exist in the object.
3.1.7 @require(ElementName)
This require function creates an element if the current element does not exist in the object. The current element’s contents are set to the contents of the named element. If the named element does not exist in the object, the created element is empty.
3.1.8 @require(ElementName,"default value")
This require function creates an element if the current element does not exist in the object. The current element’s contents are set to the contents of the named element. If the named element does not exist in the object, the created element’s contents are set to the default value.
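The three forms of require differ only in where the new element's value comes from. The sketch below models that decision in Java under stated assumptions: a DICOM object is represented as a simple map from element name to value, and an element that already exists is assumed to be left as it is. The class and method names are hypothetical and are not part of the anonymizer.

```java
import java.util.HashMap;
import java.util.Map;

class RequireSketch {
    // Returns the value the current element would hold after @require(...).
    // namedElement and defaultValue may be null to model the shorter forms.
    static String require(Map<String, String> object, String current,
                          String namedElement, String defaultValue) {
        if (object.containsKey(current)) {
            return object.get(current);          // assumption: existing element is untouched
        }
        if (namedElement != null && object.containsKey(namedElement)) {
            return object.get(namedElement);     // copy the named element's contents
        }
        return defaultValue != null ? defaultValue : ""; // default value, else empty
    }

    public static void main(String[] args) {
        Map<String, String> object = new HashMap<>();
        object.put("PatientID", "123");
        System.out.println(require(object, "StudyID", null, null));              // empty element
        System.out.println(require(object, "StudyID", "PatientID", null));       // prints 123
        System.out.println(require(object, "StudyID", "StudyDate", "UNKNOWN"));  // prints UNKNOWN
    }
}
```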
3.1.9 @param(@ParameterName)
The param function returns the contents of the named parameter. Parameters are stored in the dicom-anonymizer.properties file and can be accessed by name, allowing their contents to be defined once and used many times in various elements. These parameter names are predefined:
- TRIAL
- UIDROOT
- BASEDATE
- PREFIX
- SUFFIX
- SITENAME
- SITEID
Other parameter names can be added manually by editing the dicom-anonymizer.properties file and adding properties of the form:
param.NAME=
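Since parameters are ordinary properties with a param. prefix, the lookup can be illustrated with the standard java.util.Properties class. This is only a sketch of the idea: the class name, the helper methods, and the REVIEWER parameter are hypothetical, and returning an empty string for an undefined parameter is an assumption rather than documented behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class ParamSketch {
    // Parses properties-file text such as the lines of dicom-anonymizer.properties.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected when reading from a string
        }
        return props;
    }

    // Resolves a reference such as "@SITEID" to the value of "param.SITEID".
    static String param(Properties props, String reference) {
        String name = reference.startsWith("@") ? reference.substring(1) : reference;
        return props.getProperty("param." + name, ""); // assumption: missing means empty
    }

    public static void main(String[] args) {
        // SITEID is predefined; REVIEWER is a made-up parameter added by hand.
        Properties props = load("param.SITEID=17\nparam.REVIEWER=\n");
        System.out.println(param(props, "@SITEID"));   // prints 17
        System.out.println(param(props, "@REVIEWER")); // prints an empty line
    }
}
```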
3.1.10 @initials(ElementName)
The initials function returns a string of uppercase characters constructed from the contents of the named element by taking the first letter of each field in the element and then placing the first character last in the string. The purpose of this function is to generate the patient’s initials from the contents of a PatientName element which is encoded as Last^First^Middle. For a name in that form, the @initials(PatientName) function call would return FML.
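The rule described above (first letter of each field, with the first letter moved to the end) can be sketched in a few lines of Java. The class and method names are hypothetical, and skipping empty fields is an assumption made for the sketch.

```java
class InitialsSketch {
    static String initials(String name) {
        StringBuilder letters = new StringBuilder();
        // Fields in a name element are separated by '^'.
        for (String field : name.split("\\^")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {            // assumption: empty fields are ignored
                letters.append(Character.toUpperCase(trimmed.charAt(0)));
            }
        }
        // Move the first letter (the last name's initial) to the end.
        if (letters.length() > 1) {
            char first = letters.charAt(0);
            letters.deleteCharAt(0).append(first);
        }
        return letters.toString();
    }

    public static void main(String[] args) {
        System.out.println(initials("Last^First^Middle")); // prints FML
    }
}
```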
3.1.11 @scramble(ElementName,word1skip,word1take,word2skip,word2take,...)
The scramble function picks letters out of the text of an element and returns an uppercase text string. The motivation is to produce a replacement for a patient name that is not clearly connected to the patient.
In elements like PatientName, individual names are separated by the "^" character, e.g. Last^First^Middle. The arguments of the scramble function identify the element whose text is to be used as input, followed by pairs of values, one pair for each word to be used. The first value of a pair identifies the number of characters in the word to be skipped before accepting characters into the output. The second value of a pair identifies the number of characters to be placed into the output.
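For example, the function call @scramble(this,2,2,3,1), when applied to an element whose value is Mouse^Michael^J, produces USH: the first pair skips two characters of Mouse and takes the next two (US), and the second pair skips three characters of Michael and takes one (H).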
If the skip value for a word is negative, it counts from the end of the word. For example, to select the last 3 characters of a word, the pair would be -3,3.
While the scramble function produces a scrambled text string, its output is more likely to result in duplicate values for similar names than the output of the alphabetichash or numerichash functions. For this reason, the hash functions are preferred.
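The skip/take rule can be sketched in Python as shown here. The function name `scramble` mirrors the script function, but the code is only an illustration; in particular, the behavior when a word is shorter than its skip value is an assumption.

```python
def scramble(value: str, *pairs: int) -> str:
    """Pick letters out of each ^-separated word using (skip, take) pairs."""
    words = value.split("^")
    out = []
    # zip stops at the shorter of the word list and the pair list.
    for word, skip, take in zip(words, pairs[0::2], pairs[1::2]):
        # A negative skip counts from the end of the word.
        start = max(len(word) + skip, 0) if skip < 0 else skip
        # Assumption: a word that is too short simply contributes fewer letters.
        out.append(word[start:start + take])
    return "".join(out).upper()


print(scramble("Mouse^Michael^J", 2, 2, 3, 1))  # USH
print(scramble("Mouse", -3, 3))                 # USE
```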
3.1.12 @alphabetichash(ElementName,maxCharsOutput)
The alphabetichash function returns an uppercase text string of the specified length by computing the secure hash of the identified element's text. The algorithm is:
- combine all the words into one string;
- remove whitespace, apostrophes, and periods;
- convert to uppercase;
- compute the secure hash of the resulting string;
- encode the hash value in base 64;
- remove all the non-alphabetic characters from the base-64 string;
- convert the result to uppercase;
- return maxCharsOutput characters from the low-order end of the string.
3.1.13 @alphabetichash(ElementName,maxCharsOutput,maxWordsInput)
This alphabetichash function operates like @alphabetichash(ElementName,maxCharsOutput) except that it only accepts the first maxWordsInput words in the input element. This is the preferred method for producing hashed patient names because it can be used to suppress middle names, which may be absent, present as a full name, or present as an initial. In this case, a good approach would be @alphabetichash(PatientName,6,2).
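The alphabetichash steps can be sketched in Python as follows. The text does not name the secure hash or the character encoding, so SHA-1 and UTF-8 are assumptions here, as is treating "^" as the word separator; the output is therefore not expected to match the anonymizer's output.

```python
import base64
import hashlib
import re


def alphabetic_hash(value, max_chars_output, max_words_input=None):
    # Split into words (assumed "^"-separated) and optionally keep the first few.
    words = [w for w in value.split("^") if w]
    if max_words_input is not None:
        words = words[:max_words_input]
    # Combine, remove whitespace, apostrophes, and periods, then uppercase.
    text = re.sub(r"[\s'.]", "", "".join(words)).upper()
    # Secure hash (SHA-1 is an assumption), encoded in base 64.
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    # Keep only letters, uppercase them, and return the low-order end.
    letters = re.sub(r"[^A-Za-z]", "", encoded).upper()
    return letters[-max_chars_output:] if max_chars_output > 0 else ""


# With maxWordsInput = 2 the middle name has no effect on the result.
print(alphabetic_hash("Mouse^Michael^J", 6, 2) ==
      alphabetic_hash("Mouse^Michael^James", 6, 2))  # True
```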
3.1.14 @numerichash(ElementName,maxCharsOutput)
The numerichash function returns a numeric string of the specified length by computing the secure hash of the identified element's text. The algorithm is:
- combine all the words into one string;
- remove whitespace, apostrophes, and periods;
- convert to uppercase;
- compute the secure hash of the resulting string;
- convert the binary result to a base-10 string;
- return maxCharsOutput characters from the low-order end of the string.
3.1.15 @numerichash(ElementName,maxCharsOutput,maxWordsInput)
This numerichash function operates like @numerichash(ElementName,maxCharsOutput) except that it only accepts the first maxWordsInput words in the input element. This is the preferred method for producing hashed patient names because it can be used to suppress middle names, which may be absent, present as a full name, or present as an initial. In this case, a good approach would be @numerichash(PatientName,6,2).
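A comparable Python sketch for numerichash follows. SHA-1, UTF-8, the "^" word separator, and reading the hash as an unsigned big-endian integer are all assumptions, so the digits produced here are illustrative only.

```python
import hashlib
import re


def numeric_hash(value, max_chars_output, max_words_input=None):
    words = [w for w in value.split("^") if w]
    if max_words_input is not None:
        words = words[:max_words_input]
    text = re.sub(r"[\s'.]", "", "".join(words)).upper()
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    # Convert the binary hash to a base-10 string (unsigned: an assumption).
    digits = str(int.from_bytes(digest, "big"))
    # Return the low-order end of the digit string.
    return digits[-max_chars_output:] if max_chars_output > 0 else ""


print(numeric_hash("Mouse^Michael^J", 6, 2))
```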
3.1.16 @round(ElementName,groupsize)
The round function is intended for use on patient age elements to allow them to be binned into groups of groupsize size. The center of the first group is always at zero. Therefore, if the PatientAge element contains 57, the function call @round(PatientAge,10) returns 60.
The groupsize argument can also be a parameter. For example, if a parameter called AGEBINSIZE has been defined, the function call could be coded as:
@round(PatientAge,@AGEBINSIZE)
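Because the first group is centered at zero, the binning amounts to rounding to the nearest multiple of groupsize. A minimal Python sketch, assuming the age is already available as an integer and that exact halves round up (the text does not say how ties are handled):

```python
def round_age(age: int, group_size: int) -> int:
    """Round an age to the center of its bin; bins are centered on multiples of group_size."""
    # Integer arithmetic; rounding ties upward is an assumption.
    return ((age + group_size // 2) // group_size) * group_size


print(round_age(57, 10))  # 60
print(round_age(54, 10))  # 50
```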
3.1.17 @date(separator)
The date function returns the current date in the format YYYY-MM-DD where the “-” character is replaced by the separator string. The value corresponds to the local date at the instant the anonymizer calls the function. To generate a DICOM-compliant date, use an empty separator string, e.g. @date().
3.1.18 @time(separator)
The time function returns the current 24-hour time in the format HH:MM:SS where the “:” character is replaced by the separator string. The time corresponds to the local time at the instant the anonymizer calls the function. To generate a DICOM-compliant time, use an empty separator string, e.g. @time().
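The effect of the separator argument in the date and time functions can be illustrated in Python. The helper names `date_string` and `time_string` are hypothetical; the sketch only shows the formatting, with the current local time supplied by the caller.

```python
from datetime import datetime


def date_string(now: datetime, separator: str = "") -> str:
    # YYYY, MM, DD joined by the separator; empty separator gives YYYYMMDD.
    return separator.join([f"{now.year:04d}", f"{now.month:02d}", f"{now.day:02d}"])


def time_string(now: datetime, separator: str = "") -> str:
    # HH, MM, SS (24-hour) joined by the separator; empty separator gives HHMMSS.
    return separator.join([f"{now.hour:02d}", f"{now.minute:02d}", f"{now.second:02d}"])


now = datetime.now()  # local date and time at the moment of the call
print(date_string(now), time_string(now))
print(date_string(now, "-"), time_string(now, ":"))
```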
3.1.19 @empty( )
The empty function returns a zero-length string. This function is provided to allow differentiation between a blank Replacement text field, which causes deletion of the element from the DICOM object, and an empty element.
3.1.20 @blank(n)
The blank function returns a string of blanks of length n. This function is provided to allow a fixed-length field to be blanked. The function call @blank(0) is equivalent to @empty().
3.1.21 @remove( )
The remove function forces the element to be removed from the DICOM object. It is equivalent to a blank Replacement field, but it is more visually apparent on the Anonymizer Configurator page.
3.1.22 @keep( )
The keep function forces the element to be preserved in the DICOM object. This function is provided to make it easy to preserve elements that would otherwise be removed by a global action. This function is equivalent to @contents(this), but the keep function is preferred because it is less costly and it handles sequence elements that the contents function does not.
3.1.23 @hash(ElementName)
The hash function computes the MD5 hash of an element's value and returns it as a base-10 digit string.
3.1.24 @uid(root,ElementName)
The uid function is designed to create UIDs. The root argument is a text string containing the UID root for the institution (for example, 1.2.840.4267.32.). The uid function checks a table to determine whether it has created a replacement UID for the value of the element before. If it has, it returns the replacement UID that it created before. If it has not, it creates a new UID by appending a sequential number to the root. If the root does not end in a period, the anonymizer appends a period.
The uid function recognizes a parameter reference in the root argument, and is typically coded as @uid(@UIDROOT,ElementName).
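The table-based remapping can be sketched in Python as follows. The class name `UidRemapper`, the in-memory dictionary, and starting the sequence at 1 are assumptions; the anonymizer's actual table and numbering may differ.

```python
class UidRemapper:
    """Remap UIDs to root + sequential number, reusing earlier assignments."""

    def __init__(self, root: str):
        # Append a period if the root does not already end in one.
        self.root = root if root.endswith(".") else root + "."
        self.table = {}  # original UID -> replacement UID

    def remap(self, original_uid: str) -> str:
        if original_uid not in self.table:
            # Assumption: the sequence starts at 1 and increments by 1.
            self.table[original_uid] = f"{self.root}{len(self.table) + 1}"
        return self.table[original_uid]


remapper = UidRemapper("1.2.840.4267.32")
print(remapper.remap("1.2.3"))  # 1.2.840.4267.32.1
print(remapper.remap("1.2.4"))  # 1.2.840.4267.32.2
print(remapper.remap("1.2.3"))  # 1.2.840.4267.32.1
```

Because the same original value always maps to the same replacement, references between objects that share a UID remain consistent after remapping.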
3.1.25 @hashuid(root,ElementName)
The hashuid function is designed to create UIDs from existing ones. The root argument is a text string containing the UID root for the institution (for example, 1.2.840.4267.32.). The hashuid function creates a new UID by computing the MD5 hash of the existing UID, converting it to a base-10 digit string and prepending the root. If the root does not end in a period, the anonymizer appends a period.
The hashuid function recognizes a parameter reference in the root argument, and is typically coded as @hashuid(@UIDROOT,this).
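A Python sketch of the hashuid computation follows. The UTF-8 encoding of the UID text and the unsigned interpretation of the MD5 bytes are assumptions, so the digits are illustrative rather than identical to the anonymizer's output.

```python
import hashlib


def hash_uid(root: str, original_uid: str) -> str:
    # Append a period if the root does not already end in one.
    prefix = root if root.endswith(".") else root + "."
    # MD5 of the existing UID, converted to a base-10 digit string.
    digest = hashlib.md5(original_uid.encode("utf-8")).digest()
    return prefix + str(int.from_bytes(digest, "big"))


print(hash_uid("1.2.840.4267.32", "1.2.3.4.5"))
```

Unlike the uid function, this approach needs no table: the same input UID always produces the same output.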
3.1.26 @ptid(siteID,ElementName,prefix,first,width,suffix)
The ptid function is designed to re-identify patients, replacing their clinical PatientID field with a trial PatientID field that is assigned automatically by the anonymizer. When the ptid function is called, the anonymizer obtains the contents of the element identified by ElementName (typically PatientID) and looks it up in a table to see if a trial PatientID has been assigned to that value at the identified site. If so, it returns the trial PatientID. If not, it creates a new trial PatientID from a sequential number that starts (at the beginning of the trial) with the numeric value of the first argument and increments for each new clinical PatientID that is encountered. If the width argument is larger than the number of digits in the number, the function prepends 0 characters until it has the required width. It then prepends the prefix argument and appends the suffix argument to the numeric value and stores it in its internal table opposite the clinical PatientID for future reference.
The ptid function recognizes parameter references in the siteID and either or both of the prefix and suffix arguments, as in @ptid(@SITEID,PatientID,@PREFIX,1,4,@SUFFIX).
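The assignment logic can be sketched in Python as follows. The class name `PatientIdAssigner`, the in-memory table, and the use of a single counter shared by all sites are assumptions made for illustration.

```python
class PatientIdAssigner:
    """Assign sequential trial PatientIDs and remember earlier assignments."""

    def __init__(self):
        self.table = {}          # (site ID, clinical PatientID) -> trial PatientID
        self.next_number = None  # set from the 'first' argument on first use

    def ptid(self, site_id, clinical_id, prefix, first, width, suffix):
        key = (site_id, clinical_id)
        if key not in self.table:
            if self.next_number is None:
                self.next_number = int(first)
            # Left-pad with zeros to the requested width, then add prefix and suffix.
            number = str(self.next_number).zfill(int(width))
            self.table[key] = f"{prefix}{number}{suffix}"
            self.next_number += 1
        return self.table[key]


assigner = PatientIdAssigner()
print(assigner.ptid("SITE1", "12345", "Pt-", 1, 4, ""))  # Pt-0001
print(assigner.ptid("SITE1", "67890", "Pt-", 1, 4, ""))  # Pt-0002
print(assigner.ptid("SITE1", "12345", "Pt-", 1, 4, ""))  # Pt-0001
```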
3.1.27 @hashptid(siteID,ElementName,prefix,suffix)
The hashptid function is designed to re-identify patients, replacing their clinical PatientID field with a trial PatientID field that is generated from the old value. When the hashptid function is called, the anonymizer obtains the contents of the element identified by ElementName (typically PatientID), computes the MD5 hash of the value, and converts it to a base-10 digit string. It then prepends the prefix argument and appends the suffix argument to the numeric value.
The hashptid function recognizes parameter references in the siteID and either or both of the prefix and suffix arguments, as in @hashptid(@SITEID,PatientID,@PREFIX,@SUFFIX).
3.1.28 @id(ElementName)
The id function is designed to create IDs (not UIDs) that are unique to a patient and an element but not unique from patient to patient. The id function looks up the value of the element in a table that is indexed by patient ID, the (group,element) designation for the named element, and the value of the named element. If an entry is found, it is returned. If no entry is found, a sequential number is returned. This function is intended for use in de-identifying non-unique IDs like StudyID.
3.1.29 @integer()
The integer function returns the next integer in sequence, starting with 1. It never returns the same integer twice. This function is intended for very special applications in which unique numbers must be generated to replace UIDs. This function cannot be used to remap identifiers that are referenced in other places because the integer that is returned is not related to any specific original value.
3.1.30 @accession(ElementName)
The accession function creates sequential integers for a specific element. The accession function looks up the value of the element in a table that is indexed by the (group,element) designation for the named element and the value of the element. If an entry is found, it is returned. If no entry is found, a sequential number is returned. This function is intended for use in de-identifying non-unique IDs like accession numbers.
3.1.31 @offsetdate(siteID,ElementName,BaseDate)
The offsetdate function is designed to de-identify dates by replacing them with dates offset from a base date. The offsetdate function looks up the value of the element in a table that is indexed by the patient ID and the (group,element) designation for the named element. If a value is not found in the table, it stores the current value and returns the base date. If a value is found in the table, it computes the time difference between the current value and the table value and adds it to the base date. This preserves time differences while hiding absolute dates.
The offsetdate function recognizes a parameter reference in the siteID and BaseDate arguments, as in @offsetdate(@SITEID,StudyDate,@BASEDATE).
Note: the incrementdate function provides essentially the same functionality as the offsetdate function, but at much lower cost. The incrementdate function can also maintain date relationships between different DICOM elements, which the offsetdate function cannot.
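The offsetdate behavior can be sketched in Python as follows. The class name `DateOffsetter`, the in-memory table, the inclusion of the site ID in the table key, and the YYYYMMDD date format are assumptions for illustration.

```python
from datetime import datetime


class DateOffsetter:
    """Replace dates with dates offset from a base date, per patient and element."""

    def __init__(self):
        self.first_seen = {}  # (site, patient ID, element) -> first date seen

    def offset_date(self, site_id, patient_id, element, value, base_date):
        current = datetime.strptime(value, "%Y%m%d").date()
        base = datetime.strptime(base_date, "%Y%m%d").date()
        # The first value seen is stored and maps to the base date itself.
        first = self.first_seen.setdefault((site_id, patient_id, element), current)
        # Later values keep their distance from the first value.
        return (base + (current - first)).strftime("%Y%m%d")


offsetter = DateOffsetter()
print(offsetter.offset_date("S1", "P1", "StudyDate", "20070102", "20000101"))  # 20000101
print(offsetter.offset_date("S1", "P1", "StudyDate", "20070112", "20000101"))  # 20000111
```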
3.1.32 @incrementdate(ElementName,incInDays)
The incrementdate function adds a constant offset to a date. The offset is specified in days in the incInDays argument. The offset can be positive or negative, with positive increments generating later dates. This function is an alternative to the offsetdate function. It is preferred because it is much less expensive in communication and storage than the offsetdate function.
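A minimal Python sketch of the date arithmetic, assuming the element holds a DICOM-style YYYYMMDD date (the helper name `increment_date` is hypothetical):

```python
from datetime import datetime, timedelta


def increment_date(value: str, inc_in_days: int) -> str:
    # Parse YYYYMMDD, add the (possibly negative) offset in days, format back.
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=inc_in_days)
    return shifted.strftime("%Y%m%d")


print(increment_date("20070102", 30))  # 20070201
print(increment_date("20070102", -2))  # 20061231
```

Because every date receives the same offset, intervals between dates in different elements are preserved without any lookup table.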
3.1.33 @modifydate(ElementName,year,month,day)
The modifydate function modifies the individual fields in a date. The year, month, and day parameters replace the corresponding values in the date. If a parameter is an asterisk, the corresponding value in the original date is preserved. The modifydate function recognizes parameter references in the arguments.
For example, if the StudyDate element is coded as @modifydate(this,*,1,1), the StudyDate will be reset to the first of January, leaving the year unmodified.
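The field replacement can be sketched in Python as follows, assuming a YYYYMMDD date. The sketch does not validate the resulting date (for example, day 31 in a 30-day month); how the anonymizer handles such cases is not stated here.

```python
def modify_date(value: str, year="*", month="*", day="*") -> str:
    """Replace year, month, and day fields of a YYYYMMDD date; '*' keeps the original."""
    fields = [value[0:4], value[4:6], value[6:8]]
    for i, (arg, width) in enumerate(zip((year, month, day), (4, 2, 2))):
        if arg != "*":
            # Zero-pad the replacement to the width of the field.
            fields[i] = str(int(arg)).zfill(width)
    return "".join(fields)


print(modify_date("20070315", "*", 1, 1))  # 20070101
```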
3.1.34 @lookup(ElementName,KeyType)
The lookup function maps values through a local lookup table. It is intended to be used for mapping values that are known to the local site. For instance, it can be used to map patient ID values to case numbers by preloading the lookup table with values matching each patient ID with the corresponding case number.
The ElementName argument can be replaced with the this keyword when the value to be remapped is the value of the current DICOM element.
To allow for mapping multiple types of values in one anonymization step, the KeyType argument identifies the category. Its value is any text string that does not contain a colon or equals sign. It is best to use a single descriptive word or abbreviation.
The lookup table is a properties file named lookup-table.properties that must be located in the same directory as the dicom-anonymizer.properties file. On a MIRC site, this is Tomcat/webapps/[storage service name]/trial. The format of the lookup table file is:
KeyType/value = remapped value
For example, if you are remapping patient IDs to case numbers, you might have a lookup table file that looks like:
ptid/22 = 400
ptid/23 = 401
ptid/24 = 402
ptid/25 = 403
ptid/26 = 404
ptid/27 = 405
If the Replacement field for the PatientID element is coded as @lookup(this,ptid), then a PatientID element with the value 25 will be mapped to the value 403.
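The lookup can be sketched in Python as follows. The parser is a simplification that handles only the "KeyType/value = remapped value" lines shown above, not the full Java properties syntax, and returning None for a missing key is an assumption rather than documented behavior.

```python
def load_lookup_table(text: str) -> dict:
    """Parse 'KeyType/value = remapped value' lines into a dictionary."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        table[key.strip()] = value.strip()
    return table


def lookup(table: dict, key_type: str, value: str):
    # The key combines the category and the element's current value.
    return table.get(f"{key_type}/{value}")


table = load_lookup_table("""
ptid/24 = 402
ptid/25 = 403
""")
print(lookup(table, "ptid", "25"))  # 403
```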
3.2 Global Actions
At the bottom of the Anonymizer Configurator page, the last lines in the table contain the global actions described in this section.
3.2.1 Keep group 18
Checking the “Keep group 18” box causes the anonymizer to preserve all group 18 elements. This selection overrides the “Remove unchecked elements” selection. Actions specified for checked group 18 elements take precedence over all global actions.
3.2.2 Keep group 20
Checking the “Keep group 20” box causes the anonymizer to preserve all group 20 elements. This selection overrides the “Remove unchecked elements” selection. Actions specified for checked group 20 elements take precedence over all global actions.
3.2.3 Keep group 28
Checking the “Keep group 28” box causes the anonymizer to preserve all group 28 elements. This selection overrides the “Remove unchecked elements” selection. Actions specified for checked group 28 elements take precedence over all global actions.
3.2.4 Remove private groups
Checking the “Remove private groups” box causes the anonymizer to remove all elements in odd-numbered groups. These are private groups whose contents are not specified by the DICOM standard. Because these groups often contain PHI, they are usually removed when fully de-identifying a DICOM object. If the box is not checked, elements in private groups are kept.
3.2.5 Remove unchecked elements
Checking the “Remove unchecked elements” box causes the anonymizer to remove all elements that have not been selected in the table for special handling. There are several exceptions to this action, however, where unselected elements are still preserved by default, even when removing unspecified elements:
- The SOP Class UID
- The SOP Instance UID
- The Study Instance UID
- Group 28 (the parameters describing the pixels)
- Groups 60xx (overlays)
Removing the first three elements requires specific action in their scripts. Generally, those elements are either re-identified using the uid function or simply preserved without modification.
3.2.6 Remove overlays
Checking the “Remove overlays” box causes the anonymizer to remove all elements in 60xx groups. These are overlays and are sometimes removed when fully de-identifying an object because they can contain PHI as annotations. The notation “not recommended” is simply to discourage an administrator from removing these groups unless he knows exactly what he is doing.
3.3 Conditional Functions
The anonymizer has a limited conditional capability designed to allow it to perform different actions depending on the content of an element. The form of the conditional statement is:
@if(ElementName, condition, x) {true clause} {false clause}
Where the third parameter, x, is used only if the condition requires it. Both clauses are required in the statement or the anonymizer will ignore any commands that appear in the replacement script after the true clause. Whitespace within the arguments or between the clauses is ignored.
Multiple if statements are allowed in one Replacement field, but nested if statements are not supported. Function calls are allowed within the conditional clauses.
3.3.1 @if(ElementName,isblank)
The isblank conditional statement executes the true clause if the named element is missing from the object or appears with a zero length or with a non-zero length and contains only blank characters; otherwise, it executes the false clause.
3.3.2 @if(ElementName,matches,"regex")
The matches conditional statement executes the true clause if the contents of the named element match the regular expression; otherwise, it executes the false clause. If you are not familiar with regular expressions and you need to use this function, get an experienced programmer to help you. This function can be used to execute very complex tests on the contents of an element.
3.3.3 @quarantine( )
The quarantine function causes the anonymizer to abort the anonymization process and place the unmodified object in the quarantine for manual processing. On a MIRC site, this processing must be done by the administrator or someone who has access to the MIRC site’s computer. At a site running the FieldCenter program, the program itself provides editing functions to allow the object to be manipulated and resubmitted to the anonymizer. The quarantine function must appear in a conditional clause of an if statement, but this is not enforced programmatically. If it were to appear in a script that is executed during every anonymization, it would force the quarantining of every object.
3.3.4 @skip( )
The skip function causes the anonymizer to abort the anonymization process and to allow the unmodified object to continue through the system. It is intended to be used when it is possible to detect that an object has already been anonymized, thus preventing it from being anonymized a second time. The skip function must appear in a conditional clause of an if statement, but this is not enforced programmatically. If it were to appear in a script that is executed during every anonymization, it would allow PHI through the process.
3.4 Examples
3.4.1 Patient and Trial Identifiers
For an ACCORD trial, the PatientName element must contain the case number followed by a delimiter character (“^”) and the field center identifier. If the case number is stored by the modality operator in the PatientComments element and the field center identifier is CWR, the Replacement text field for the PatientName element would read:
@contents(PatientComments)^CWR
If the ptid function were to be used to generate the PatientID automatically in the form Pt-nnnn (i.e., with no suffix), then the Replacement text field for the PatientName element would read:
@ptid(@SITEID,PatientID,Pt-,1,4,)^CWR
If the PREFIX parameter were defined to have the value “Pt-”, the above function call could also be written as:
@ptid(@SITEID,PatientID,@PREFIX,1,4,)^CWR
For an ACCORD trial, the OtherPatientIds element must contain the word ACCORD. The Replacement field for the OtherPatientIds element would then read:
ACCORD
For the WHIMS trial, the PatientName element must contain the patient’s initials followed by a dash, the name of the trial, another dash, and the site’s identifier, which is configured in the SITEID parameter. The Replacement field for the PatientName element would then read:
@initials(PatientName)-WHIMS-@param(@SITEID)
3.4.2 UID Remapping
To generate new UIDs for the StudyInstanceUID using the UID root 1.2.840.123.321, the Replacement field for the StudyInstanceUID element would then read:
@uid(1.2.840.123.321.,StudyInstanceUID)
If one were remapping UIDs as in the function call above, it would be more efficient to define the UIDROOT parameter to have the value “1.2.840.123.321.” and code the calls as:
@uid(@UIDROOT,StudyInstanceUID)
This ensures that all UIDs are mapped to the same root. If the root does not end in a period, the anonymizer appends a period, but it is good form to supply it.
3.4.3 Keeping and Removing Elements
If the “Remove unchecked elements” box is checked and the value of an element must be preserved, the Replacement field for the element would then read:
@keep()
If the Keep group 18 box is checked, but a specific group 18 element must be removed, the Replacement field for that element would then read:
@remove()
3.4.4 Conditionally Modifying Elements
If the InstitutionName element is to be kept if it is present and non-blank, but replaced with static text if it is missing or blank, the Replacement field for the element would read:
@if(InstitutionName,isblank){My Hospital}{@keep()}
If the StudyComments element is being used to contain a trial patient ID and the ID must have exactly seven numeric digits, and if this element is to be copied to the PatientID element, the Replacement field for the PatientID element would read:
@contents(StudyComments)
And the Replacement field for the StudyComments element would read:
@if(StudyComments,matches,"\\d{7}.*"){}{@quarantine()}
Note that the coding of the regular expression in this case looks odd because the escape character is doubled. This is necessary because the anonymizer and the regular expression processor both use the same escape character, the backslash. Thus, to get one escape character, it must itself be escaped.
Note also that the true clause will force the StudyComments element to be deleted from the object, which would be reasonable, since its contents are being moved to the PatientID field. If other processing were desired in this situation, it could be placed in the true clause.
In this example, a better script for the PatientID element might be:
@contents(StudyComments,"\\D")
This will delete all non-numeric characters from the string used for the PatientID. Some modalities insert a newline character at the end of entry fields when the operator ends an entry with the Enter key. This script filters out those characters and anything else in the field that is not numeric. Note that the regular expression in the StudyComments script above ended with “.*”. That script will match a seven-digit string ending in a newline.
3.4.5 Conditionally Processing Files
The skip function can be used in the following way to avoid processing files that have already been processed. Suppose that the ReferringPhysicianName element is not used in the clinical trial. Its Replacement field could be coded as:
@if(ReferringPhysicianName,matches,"DONE"){@skip()}{DONE}
This will cause the anonymizer to insert the word DONE in the field on the first pass. If the object were to be processed again, the anonymizer would detect the word and skip the anonymization process.
3.4.6 Parsing Element Content
The contents(ElementName,"regex") function can be used to parse the contents of an element, retrieving only a portion of its value. Suppose that the StudyComments element is populated by a modality operator with specially formatted content: a numeric code followed by other information including a user ID:
78.7812 [ADJUSTED: HE41328 - 01/02/2007 13:00:26]
The following function call would retrieve the leading code (78.7812):
@contents(StudyComments,"\\s.*")
The following function call would retrieve the user's ID (HE41328):
@contents(StudyComments,"([^:]*:\\s+)|(\\s*-.*)")
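The two regular expressions can be tried outside the anonymizer. The Python sketch below applies them to the sample StudyComments value with re.sub, which, like replaceAll, replaces every match; the replacement is the empty string. In a Python raw string the backslash is written once, whereas the anonymizer script requires it doubled. Python is used here only as a convenient test bench; its regular expression dialect agrees with Java's for these simple patterns.

```python
import re

comments = "78.7812 [ADJUSTED: HE41328 - 01/02/2007 13:00:26]"

# Delete the first whitespace character and everything after it.
leading_code = re.sub(r"\s.*", "", comments)

# Delete everything up to and including the first colon and following
# whitespace, then delete the dash and everything after it.
user_id = re.sub(r"([^:]*:\s+)|(\s*-.*)", "", comments)

print(leading_code)  # 78.7812
print(user_id)       # HE41328
```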
4 Saving the Changes
After configuring the Select checkboxes and Replacement fields, scroll to the bottom of the window and click the Update anonymizer.properties button. The page, with any changes made, will be redisplayed. At that point, you can continue editing the page or close it.
5 Enabling the Changes
Changes to the dicom-anonymizer.properties file are not enabled until the DICOM service is started (or restarted, if it is already running). To do so, click the Start/Restart button in the DICOM Service column on the Admin page. In the FieldCenter program, changes to the anonymizer configuration go into effect immediately after clicking the Save button.
6 Advanced Configuration
The anonymizer can be extended to meet specialized requirements by editing the dicom-anonymizer.properties file, which drives both the anonymizer itself and the anonymizer configurator in the MIRC DICOM Service and the MIRC FieldCenter and DicomEditor programs. A word to the wise: a certain amount of caution should be observed when editing powerful files.
The File Service and each Storage Service has its own DICOM Service, and each DICOM Service has its own anonymizer.
The File Service's DICOM anonymizer script file is located at: Tomcat/webapps/file/dicom-anonymizer.properties
A Storage Service's DICOM anonymizer script file is located at: Tomcat/webapps/[storageservicename]/trial/dicom-anonymizer.properties
The dicom-anonymizer.properties file is a text file that can be edited with any good text editor like TextPad. The content of the file is a set of properties, one per line, in the form:
- key = value
Properties beginning with # are disabled. Do not remove disabled properties or the anonymizer configurator will lose knowledge of the property. The order of the lines in the file determines the order in which the anonymizer configurator presents them to the user. There are four basic kinds of keys:
- Keys beginning with param. are parameters. Traditionally, parameter names are all in upper case and all the parameters are defined at the top of the file, but there is no programmatic requirement to do so. If you want to define additional parameters for use in the DICOM element scripts, you can add them by appending the parameter name to the prefix, like this:
param.NEWPARAM = value
The = sign is required. The value is optional.
- Keys beginning with set. provide replacement scripts for individual DICOM elements. Additional elements can be added. It is best to add them in sequence to make it easy to find them in the anonymizer configurator table, but there is no programmatic requirement to do so. Set keys have the form:
set.[gggg,eeee]ElementName= value
The ElementName is traditionally the name recognized by the dcm4che DICOM class library for the element, although it is the [group,element] designation that determines which element is modified by the script. When adding an element for a private group, you can pick any name you wish, but scripts cannot reference the element by name. The value is optional.
- Keys beginning with keep.group immediately followed by the hex value of a DICOM group number, as in keep.group18, are global keep commands. They do not contain scripts. To provide a label for the command in the anonymizer configurator, the value of the property can be supplied, like this:
keep.group18 = Keep group 18 [recommended]
The standard dicom-anonymizer.properties file contains keep commands for groups 18, 20, and 28, and default label values for those groups are defined in the program. They may be overridden by specifying values in the dicom-anonymizer.properties file. A typical use of this type of property is to provide a convenient way to keep a specific private group, but standard DICOM groups can be added as well.
- Keys beginning with remove. are global remove commands. The anonymizer cannot be extended with remove commands.
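The four kinds of keys can be told apart mechanically. The Python sketch below classifies a single line of the file; it is a simplified illustration and not the parser used by the anonymizer. The function name `classify` and the sample set line are hypothetical.

```python
import re

# set.[gggg,eeee]ElementName, with gggg and eeee as four hex digits.
SET_KEY = re.compile(r"set\.\[([0-9A-Fa-f]{4}),([0-9A-Fa-f]{4})\](.*)")


def classify(line: str) -> dict:
    text = line.strip()
    # A leading # marks the property as disabled.
    entry = {"enabled": not text.startswith("#")}
    key, _, value = text.lstrip("#").partition("=")
    key, value = key.strip(), value.strip()
    entry.update(key=key, value=value)
    match = SET_KEY.fullmatch(key)
    if key.startswith("param."):
        entry["kind"] = "param"
    elif match:
        entry["kind"] = "set"
        entry["tag"] = (int(match.group(1), 16), int(match.group(2), 16))
    elif key.startswith("keep.group"):
        entry["kind"] = "keep"
        entry["group"] = int(key[len("keep.group"):], 16)
    elif key.startswith("remove."):
        entry["kind"] = "remove"
    else:
        entry["kind"] = "unknown"
    return entry


print(classify("keep.group18 = Keep group 18 [recommended]"))
print(classify("#set.[0010,0010]PatientName= @initials(PatientName)"))
```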
6.1 Precedence
It is possible to create a set of instructions that might appear to be self-contradicting, so an instruction precedence is needed. The principle for defining precedence is:
- The command most specific to an element takes precedence over global commands.
- Global keep commands take precedence over global remove commands.
Thus, if an element is part of a private group and private groups are to be removed, but the element has a script defining it to be kept, it is kept. This can be tricky because the DICOM class library does not enforce any rules on private groups, so you must be sure to keep group length elements if you are going to partially keep private groups.
For another example, if an element is not selected (checked) and unchecked elements are to be globally removed, but the element is part of a group to be kept, it is kept.
Similarly, if an element that is part of a private group is not selected and private groups are to be globally removed, but the element’s group is to be kept, the element is kept.
There is one exception to the principle: if overlays are to be globally removed, that command takes precedence over any keep commands that have been defined for individual overlay groups.
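The precedence rules can be summarized as a decision function. The Python sketch below is one reading of the rules in this article, not the anonymizer's implementation. The function and parameter names are hypothetical, and letting an element-specific script win even when overlays are globally removed is an assumption, since the text only says that the overlay removal overrides keep commands for overlay groups.

```python
# SOP Class UID, SOP Instance UID, Study Instance UID: preserved by default
# when unchecked elements are removed.
PROTECTED_TAGS = {(0x0008, 0x0016), (0x0008, 0x0018), (0x0020, 0x000D)}


def is_overlay_group(group: int) -> bool:
    return (group & 0xFF00) == 0x6000  # groups 60xx


def decide(tag, has_script, keep_groups,
           remove_private=False, remove_unchecked=False, remove_overlays=False):
    group, _element = tag
    if has_script:
        return "script"  # the most specific command wins
    if remove_overlays and is_overlay_group(group):
        return "remove"  # exception: beats keep commands for overlay groups
    if group in keep_groups:
        return "keep"    # global keep beats global remove
    if remove_private and group % 2 == 1:
        return "remove"  # odd-numbered groups are private
    if remove_unchecked:
        if tag in PROTECTED_TAGS or group == 0x0028 or is_overlay_group(group):
            return "keep"
        return "remove"
    return "keep"


# An unchecked group 18 element survives when its group is kept.
print(decide((0x0018, 0x0050), False, {0x0018}, remove_unchecked=True))  # keep
```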