The MIRCquery Schema

From MircWiki

This article describes technical requirements for participation in the MIRC community as a storage service. Additional information is provided in The MIRCqueryresult Schema and The MIRCdocument Schema.

The intended audience for this article is the developer or system administrator of a teaching file system who is implementing a MIRC storage service to make the system’s teaching files available to the MIRC community.

1 Overview

MIRC may be defined as a collection of web sites that share a common query mechanism.

The two key components involved in the query process are:

  • Query Service: a web site that is accessed by a MIRC user with a web browser. The query service provides web pages that allow a user to define search criteria and select sites in the MIRC community to be searched. It queries the selected sites, organizes the query results into a web page or pages, and sends the pages to the browser.
  • Storage Service: a web site that stores information of interest to MIRC users. In response to a query received from a query service, it identifies all the information it stores which meets the search criteria, constructs a query response containing an abstract of the information to allow a user to determine whether the information is of interest, and sends the response to the query service. It also serves the information directly to a MIRC user in response to a request from the user.

The query service and storage service are independent. A MIRC site may include a query service or a storage service, or both.

2 MIRC Query

A MIRC query is an XML object in the form defined by the MIRCquery schema. It is passed from a query service to a storage service via an HTTP POST with content type text/xml.
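As a minimal sketch of this exchange, the following Java code (JDK 11 or later) builds and sends the POST with the standard java.net.http client. The class and method names and the storage-service URL used in testing are hypothetical; the charset parameter follows the UTF-8 recommendation in section 5.1.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class MircQueryPost {
    // Builds the POST that carries a MIRCquery document to a storage service.
    static HttpRequest buildRequest(URI storageService, String mircQueryXml) {
        return HttpRequest.newBuilder(storageService)
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(mircQueryXml, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the query and returns the response body (expected to be a MIRCqueryresult).
    static String send(URI storageService, String mircQueryXml) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(storageService, mircQueryXml),
                HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        return response.body();
    }
}
```

Only buildRequest is needed to inspect the request; send performs the network call.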

The following is an example of a MIRCquery:

<MIRCquery firstresult="…" maxresults="…" queryUID="…" unknown="…">
    <title> . . . </title>
    <author> . . . </author>
    <abstract> . . . </abstract>
    <keywords> . . . </keywords>
    <history> . . . </history>
    <findings> . . . </findings>
    <diagnosis> . . . </diagnosis>
    <differential-diagnosis> . . . </differential-diagnosis>
    <discussion> . . . </discussion>
    <pathology> . . . </pathology>
    <anatomy> . . . </anatomy>
    <organ-system> . . . </organ-system>
    <code coding-system="…"> . . . </code>
    <modality> . . . </modality>
    <patient>
        <pt-age>
            <years> . . . </years>
            <months> . . . </months>
            <weeks> . . . </weeks>
            <days> . . . </days>
        </pt-age>
        <pt-sex> . . . </pt-sex>
        <pt-race> . . . </pt-race>
        <pt-species> . . . </pt-species>    <!--veterinary-->
        <pt-breed> . . . </pt-breed>        <!--veterinary-->
    </patient>
    <image>
        <format> . . . </format>
        <compression> . . . </compression>
        <modality> . . . </modality>
        <anatomy> . . . </anatomy>
        <pathology> . . . </pathology>
    </image>
    <document-type> . . . </document-type>
    <category> . . . </category>
    <level> . . . </level>
    <access> . . . </access>
    <peer-review/>
    <language code="…"> . . . </language>
    … free text search field …
</MIRCquery>

2.1 MIRCquery Attributes

The firstresult and maxresults attributes of the MIRCquery element are used to allow the query service to break the responses into groups. The firstresult attribute specifies the first result to be returned by the storage service. The value 0 corresponds to the first result in the list. If the attribute is missing, the default value 0 is to be used.

The maxresults attribute specifies the maximum number of results to be returned by the storage service. For example, if the query service is grouping results into sets of 10 and is asking for the third group, firstresult would be set to 20 and maxresults would be set to 10. If the maxresults attribute is missing or 0, the storage service is to return 1 result.
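The paging arithmetic above can be sketched in Java as follows. The class and method names are hypothetical; window assumes the caller has already mapped a missing firstresult or maxresults to 0, and treating a negative maxresults like 0 is an added assumption.

```java
import java.util.List;

final class MircPaging {
    // Query side: firstresult for a 1-based group number when results are grouped by pageSize.
    static int firstResultForGroup(int groupNumber, int pageSize) {
        return (groupNumber - 1) * pageSize;
    }

    // Storage side: the slice of the full result list that the query asked for.
    static <T> List<T> window(List<T> results, int firstResult, int maxResults) {
        int first = Math.max(0, firstResult);
        // A maxresults of 0 means 1 result; values below 0 are treated the same way (assumption).
        int max = Math.max(1, maxResults);
        if (first >= results.size()) {
            return List.of();
        }
        // Widen to long so first + max cannot overflow.
        int end = (int) Math.min((long) results.size(), (long) first + max);
        return results.subList(first, end);
    }
}
```

For the third group of 10, firstResultForGroup(3, 10) gives 20, matching the example in the text.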

The queryUID attribute may be generated by a query service to uniquely identify the query. If present, the storage service can use it to cache the results of the query. When it is provided, all page requests for the same query (that is, all MIRCquery elements that contain the same child elements but different firstresult attribute values) carry the same queryUID attribute value.
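One way a storage service might use queryUID is a small least-recently-used cache of complete result lists, so that later page requests skip the search. This is a sketch under assumptions: the class name, the capacity, and the representation of results as a list of document identifiers are hypothetical. It relies on the standard LinkedHashMap access-order and removeEldestEntry behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class QueryResultCache {
    private final Map<String, List<String>> cache;

    QueryResultCache(int capacity) {
        // Access-ordered map that evicts the least recently used entry once capacity is exceeded.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the full result list, running the search only on a cache miss.
    // Queries without a queryUID are never cached.
    synchronized List<String> results(String queryUID, Supplier<List<String>> search) {
        if (queryUID == null || queryUID.isEmpty()) {
            return search.get();
        }
        return cache.computeIfAbsent(queryUID, uid -> search.get());
    }
}
```

The cached list can then be sliced per request using firstresult and maxresults.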

The unknown attribute is optionally provided by the query service to instruct the storage service to return the query results as a set of unknowns, providing an alternative title and abstract that conceal the diagnostic result from the student. The value of the attribute may be yes or no. If the attribute is missing, the default value of no is to be used.
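The attribute defaults in this section can be applied while parsing the query. The following is a minimal sketch using the JDK DOM parser; the class name is hypothetical, and falling back to the default for a malformed number (rather than rejecting the query) is an assumption. Disallowing DOCTYPE declarations is a common hardening step for XML received over the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class MircQueryAttributes {
    final int firstResult;   // default 0
    final int maxResults;    // missing or 0 means 1
    final String queryUID;   // null when absent
    final boolean unknown;   // default "no"

    private MircQueryAttributes(Element root) {
        // Element.getAttribute returns "" when the attribute is missing.
        firstResult = intOrDefault(root.getAttribute("firstresult"), 0);
        int max = intOrDefault(root.getAttribute("maxresults"), 0);
        maxResults = (max == 0) ? 1 : max;
        String uid = root.getAttribute("queryUID").trim();
        queryUID = uid.isEmpty() ? null : uid;
        unknown = root.getAttribute("unknown").trim().equals("yes");
    }

    // Assumption: malformed numbers fall back to the default.
    private static int intOrDefault(String value, int fallback) {
        try {
            return value.isBlank() ? fallback : Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    static MircQueryAttributes parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so posted XML cannot reference external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return new MircQueryAttributes(root);
    }
}
```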

2.2 MIRCquery Child Elements

All the child elements are optional in a MIRCquery. A storage service treats the value of each child element included in a MIRCquery as a query field: it searches its index for documents that contain the query text in data identified as being of the type named by the child element. Thus, if a MIRCquery contains an

<anatomy>chest</anatomy>

child, the storage service searches its index for documents containing the word chest in a field identified as anatomy.
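A storage service first has to turn the posted MIRCquery into query fields. The sketch below, using the JDK DOM parser, records each leaf element under a path such as anatomy or image/modality (so image-level fields stay distinct from document-level ones) and collects the free text placed directly inside MIRCquery. The class name and the path convention are hypothetical, and attributes such as coding-system are not captured here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MircQueryFields {
    // Query field path (e.g. "anatomy", "image/modality") -> trimmed query text.
    final Map<String, String> fields = new LinkedHashMap<>();
    // Text placed directly inside <MIRCquery>, with whitespace collapsed.
    String freeText = "";

    static MircQueryFields parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so posted XML cannot reference external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        MircQueryFields q = new MircQueryFields();
        StringBuilder free = new StringBuilder();
        NodeList kids = root.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.TEXT_NODE) {
                free.append(kids.item(i).getNodeValue()).append(' ');
            }
        }
        collect(root, "", q.fields);
        q.freeText = free.toString().trim().replaceAll("\\s+", " ");
        return q;
    }

    // Records every leaf element below 'parent'; empty leaves such as <peer-review/>
    // are kept with "" so that their presence is not lost.
    private static void collect(Element parent, String prefix, Map<String, String> out) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() != Node.ELEMENT_NODE) continue;
            Element child = (Element) kids.item(i);
            String path = prefix + child.getTagName();
            if (hasChildElements(child)) {
                collect(child, path + "/", out);
            } else {
                out.put(path, child.getTextContent().trim());
            }
        }
    }

    private static boolean hasChildElements(Element e) {
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }
}
```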

Certain elements have enumerated values:

  • pt-sex
  • format
  • compression
  • modality
  • document-type
  • level
  • access
  • language code="…"

These values are defined in The MIRCdocument Schema.

The author element is a special case. Any text contained within the author element of a MIRCquery is intended to be used as a match against any information associated with an author. The RSNA storage service, for example, uses the contents of the author element of the MIRCquery to search all the child elements of the author element in a MIRCdocument (e.g., the name, affiliation, and contact elements).

The peer-review element is another special case. If the element is present in the MIRCquery, all documents listed in search results are required to have been peer-reviewed. If it is missing from the MIRCquery, no constraints are placed on the peer-review status of documents listed in search results. Any text value of the peer-review element is ignored.

3 Query Rules

There may be at most one child element of each type in a MIRCquery. Complex searches within an element type are done using the boolean syntax described below.

The RSNA query and storage services implement the following search rules:

  • If text appears in any MIRCquery child element or attribute, it is a required match for a corresponding element in a document to be listed in the search results.
  • All child elements appearing within a MIRCquery are required matches, i.e., a matching document is one that matches all query fields.
  • Child elements that are not included in a MIRCquery are not required matches. Thus, an empty MIRCquery (<MIRCquery/>) is a match to all documents in the storage service.
  • Text is not case-sensitive.
  • A free-text search, matching text anywhere in a document, is done by placing the search text in the text value of the <MIRCquery> element.
  • Search text containing separate words with no intervening operator characters is treated as a logical AND of all the words, which may appear in any order.
  • Search text can be constructed with a logical OR using the “|” character.
  • Search text can be constrained to appear in order by placing it in quotes within the search string.
  • Complex combinations of logical AND and logical OR operations can be created by grouping terms with parentheses, ( … ).
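The search-text rules above can be sketched as a small recursive-descent parser that compiles the text into a predicate over a document's words. This is an illustration, not the RSNA implementation. Assumptions: whole-word matching after lower-casing and splitting on non-alphanumeric characters; a bare word containing punctuation (such as 1.2) is matched as a short phrase; OR binds more loosely than AND, as the equivalence shown in section 4.1 implies; and unbalanced parentheses raise IllegalArgumentException. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

final class MircTextQuery {
    private final List<String> tokens;
    private int pos = 0;

    private MircTextQuery(List<String> tokens) { this.tokens = tokens; }

    static boolean matches(String query, String documentText) {
        return compile(query).test(words(documentText));
    }

    // Compiles search text into a predicate over a document's normalized word list.
    static Predicate<List<String>> compile(String query) {
        MircTextQuery parser = new MircTextQuery(tokenize(query));
        Predicate<List<String>> result = parser.orExpr();
        if (parser.pos != parser.tokens.size()) throw new IllegalArgumentException("unbalanced ')'");
        return result;
    }

    // Lower-cases (matching is not case-sensitive) and splits on non-alphanumerics.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Tokens: "(", ")", "|", a quoted phrase (kept with its opening quote), or a bare word.
    private static List<String> tokenize(String q) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < q.length()) {
            char c = q.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(' || c == ')' || c == '|') {
                out.add(String.valueOf(c));
                i++;
            } else if (c == '"') {
                int end = q.indexOf('"', i + 1);
                if (end < 0) end = q.length();  // assumption: an unclosed quote runs to the end
                out.add(q.substring(i, end));
                i = Math.min(end + 1, q.length());
            } else {
                int start = i;
                while (i < q.length() && !Character.isWhitespace(q.charAt(i))
                        && "()|\"".indexOf(q.charAt(i)) < 0) i++;
                out.add(q.substring(start, i));
            }
        }
        return out;
    }

    // orExpr := andExpr ("|" andExpr)*
    private Predicate<List<String>> orExpr() {
        Predicate<List<String>> result = andExpr();
        while (peek("|")) { pos++; result = result.or(andExpr()); }
        return result;
    }

    // andExpr := term*   (adjacent terms are ANDed; an empty group matches everything)
    private Predicate<List<String>> andExpr() {
        Predicate<List<String>> result = doc -> true;
        while (pos < tokens.size() && !peek("|") && !peek(")")) result = result.and(term());
        return result;
    }

    // term := "(" orExpr ")" | phrase | word; a word is matched as a one-word phrase.
    private Predicate<List<String>> term() {
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            Predicate<List<String>> inner = orExpr();
            if (!peek(")")) throw new IllegalArgumentException("missing ')'");
            pos++;
            return inner;
        }
        List<String> phrase = words(token);  // drops the quote marker and punctuation
        return doc -> Collections.indexOfSubList(doc, phrase) >= 0;
    }

    private boolean peek(String s) { return pos < tokens.size() && tokens.get(pos).equals(s); }
}
```

An empty search string compiles to a predicate that accepts every document, consistent with the rule that an empty MIRCquery matches all documents.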

Note that the first bullet above implies that if text appears in a MIRCquery element or attribute that is not supported by a site’s software implementation, the site must return zero matches.
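That consequence can be enforced before any search runs. In this sketch the set of supported fields is a hypothetical example for one site, and the query fields are assumed to arrive as a map from element name to text. Because the rule is stated in terms of text, empty elements pass the check; presence-only elements such as peer-review need their own handling.

```java
import java.util.Map;
import java.util.Set;

final class SupportedFields {
    // Hypothetical example: the query fields one site's index happens to support.
    static final Set<String> SUPPORTED =
            Set.of("title", "author", "abstract", "keywords", "modality", "anatomy");

    // False when any query field that carries text is not supported by this site;
    // such a query must return zero matches.
    static boolean canMatch(Map<String, String> queryFields) {
        return queryFields.entrySet().stream()
                .allMatch(f -> f.getValue().isBlank() || SUPPORTED.contains(f.getKey()));
    }
}
```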

4 Examples

4.1 Free-text Search

The following query:

<MIRCquery>
    alpha bravo charlie
</MIRCquery>

matches documents containing all the words alpha, bravo, and charlie in any order anywhere in the document, together or apart.

The following query:

<MIRCquery>
    alpha | "bravo charlie" delta
</MIRCquery>

matches documents containing the word alpha. It also matches all documents containing the words bravo and charlie together and in order, along with the word delta anywhere in the document, whether or not alpha appears. This query is identical to:

<MIRCquery>
    alpha | ("bravo charlie" delta)
</MIRCquery>

The following query:

<MIRCquery>
    (alpha | "bravo charlie") delta
</MIRCquery>

matches documents containing the word delta and either or both of alpha and bravo charlie, where bravo charlie appears as written.

4.2 Author Search

The following query:

<MIRCquery>
    <author> john </author>
</MIRCquery>

matches documents in which at least one author’s name includes John, so it matches the author names John Doe and Elton John. Because the author search also covers affiliation information, it would also match a document authored by Jane Doe of Johns Hopkins.

The following query:

<MIRCquery>
    <author> john doe </author>
</MIRCquery>

matches documents where an author’s name is “John Doe” or “Doe John”. It also matches documents with two authors named “John Smith” and “Jane Doe”, but it does not match a document with only one author named “Elton John”.

The following query:

<MIRCquery>
    <author> "john doe" </author>
</MIRCquery>

matches documents where an author’s name is John Doe or Elton John Doe but not John Q. Doe. It also does not match documents with two authors named John Smith and Jane Doe.
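The author examples above can be reproduced by the following sketch. It assumes case-insensitive substring matching (which is what lets john match Johns Hopkins) and tests each term separately against the name, affiliation, and contact of every author, so a quoted phrase must occur within a single field and cannot span two authors. The class names and the flattened Author type are hypothetical, and the | and parenthesis operators are omitted.

```java
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AuthorMatcher {
    // Hypothetical flattened view of one author element of a MIRCdocument.
    static final class Author {
        final List<String> fields;
        Author(String name, String affiliation, String contact) {
            fields = List.of(Objects.toString(name, ""), Objects.toString(affiliation, ""),
                    Objects.toString(contact, ""));
        }
    }

    // A term is either a "quoted phrase" or a run of non-space characters.
    private static final Pattern TERM = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Every term must be found (logical AND); each term is tested against one field at a time.
    static boolean matches(String authorQuery, List<Author> authors) {
        Matcher m = TERM.matcher(authorQuery);
        while (m.find()) {
            String term = (m.group(1) != null ? m.group(1) : m.group(2)).trim().toLowerCase(Locale.ROOT);
            if (term.isEmpty()) continue;
            boolean found = false;
            for (Author a : authors) {
                for (String field : a.fields) {
                    // Assumed case-insensitive substring test.
                    if (field.toLowerCase(Locale.ROOT).contains(term)) found = true;
                }
            }
            if (!found) return false;
        }
        return true;
    }
}
```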

4.3 Patient Search

The following query:

<MIRCquery>
    <patient>
        <pt-age>
            <months>0-2</months>
        </pt-age>
    </patient>
</MIRCquery>

matches documents containing at least one reference to a patient from birth to 2 months old.
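The value 0-2 above is a range. A minimal sketch of how a storage service might parse such a value, assuming the syntax is either a single integer or two integers separated by a hyphen, with inclusive bounds in the unit of the enclosing element (months here). The class name is hypothetical, and converting ages recorded in other units, such as weeks or days, is not shown.

```java
final class AgeRange {
    final int low;
    final int high;   // inclusive bounds

    AgeRange(int low, int high) {
        this.low = low;
        this.high = high;
    }

    // Parses "0-2" as 0..2 and a single value such as "3" as 3..3 (assumed syntax).
    static AgeRange parse(String text) {
        String t = text.trim();
        int dash = t.indexOf('-');
        if (dash < 0) {
            int v = Integer.parseInt(t);
            return new AgeRange(v, v);
        }
        return new AgeRange(Integer.parseInt(t.substring(0, dash).trim()),
                Integer.parseInt(t.substring(dash + 1).trim()));
    }

    boolean contains(int value) {
        return value >= low && value <= high;
    }
}
```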

4.4 Image Search

The image element is used to search for documents containing images. The following query:

<MIRCquery>
    <image>
        <format>DICOM</format>
        <modality>CT</modality>
        <anatomy>brain</anatomy>
    </image>
</MIRCquery>

matches documents referencing at least one DICOM CT image of the brain.

The modality and anatomy elements may also appear as children of the MIRCquery element, in which case they refer to the document as a whole rather than to images within the document. Thus, the following query:

<MIRCquery>
    <modality>MR</modality>
    <anatomy>chest</anatomy>
</MIRCquery>

matches documents referencing the MR modality and chest anatomy, whether they contain images or not.
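The difference between the two scopes is that all criteria inside an image element must hold for the same image, whereas top-level modality and anatomy are checked against the document as a whole. A sketch of the image-level check, assuming a hypothetical model in which each image is a map of image-level fields and values are compared whole and case-insensitively:

```java
import java.util.List;
import java.util.Map;

final class ImageScope {
    // An <image> query matches when one single image satisfies every criterion.
    static boolean anyImageMatches(List<Map<String, String>> images, Map<String, String> criteria) {
        return images.stream().anyMatch(image ->
                criteria.entrySet().stream().allMatch(c ->
                        c.getValue().equalsIgnoreCase(image.get(c.getKey()))));
    }
}
```

A document with a DICOM CT image of the chest and a DICOM MR image of the brain therefore does not match the DICOM CT brain query in this sketch, since no single image satisfies all three criteria.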

4.5 Code Search

The code element is used in documents stored in MIRCdocument format to identify medical codes, such as ACR, CPT, and SNOMED codes.

The following query:

<MIRCquery>
    <code coding-system="ACR">1.2</code>
</MIRCquery>

matches documents containing an ACR code with the value 1.2.

The following query:

<MIRCquery>
    <code>1.2</code>
</MIRCquery>

matches documents containing a code element with the value 1.2 in any coding system.
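Both code queries can be handled by one check in which a missing coding-system means any system. In this sketch the Code type is a hypothetical view of a document's code elements; comparing the code value exactly (so that 1.2 does not match 1.21) and the coding system case-insensitively are assumptions.

```java
import java.util.List;

final class CodeMatch {
    // Hypothetical view of a <code coding-system="..."> element in a MIRCdocument.
    static final class Code {
        final String codingSystem;
        final String value;
        Code(String codingSystem, String value) {
            this.codingSystem = codingSystem;
            this.value = value;
        }
    }

    // A null or empty querySystem means "any coding system".
    static boolean matches(List<Code> documentCodes, String querySystem, String queryValue) {
        boolean anySystem = querySystem == null || querySystem.isEmpty();
        return documentCodes.stream().anyMatch(c ->
                (anySystem || querySystem.equalsIgnoreCase(c.codingSystem))
                        && queryValue.trim().equals(c.value.trim()));
    }
}
```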

4.6 Document Description Search

The following query:

<MIRCquery>
    <document-type>
        radiologic teaching file
    </document-type>
</MIRCquery>

matches documents identified as teaching files.

The following query:

<MIRCquery>
    <document-type>
        radiologic teaching file
    </document-type>
    <level>
        advanced
    </level>
    <peer-review/>
    <access>
        public
    </access>
</MIRCquery>

matches advanced teaching files that have been peer-reviewed and that are defined as publicly accessible.

4.7 Combined Element and Free Text Search

When multiple elements occur in a query, the search criteria are treated as all being required. The following query:

<MIRCquery>
    <access>public</access>
    <author>john</author>
    <modality>MR</modality>
    <anatomy>chest</anatomy>
    alpha bravo charlie
</MIRCquery>

matches public documents in which at least one author’s name includes John, that reference the MR modality and chest anatomy, and in which the words alpha, bravo, and charlie all occur in any order anywhere in the document, together or apart.

5 Notes and Suggestions for Implementers

5.1 Content Type

To provide the most efficient support for all languages, query services and storage services should transmit their MIRCquery and MIRCqueryresult contents in UTF-8. To indicate that encoding, some MIRC query service implementations transmit a content type of:

text/xml; charset=utf-8

For that reason, storage services should not test the Content-Type using something like:

if (contentType.equals("text/xml")) { … }

but instead:

if (contentType.indexOf("text/xml") != -1) { … }
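A slightly stricter alternative is to parse the media type rather than search for a substring: it ignores parameters, compares case-insensitively, and does not accept unrelated types such as text/xml-external-parsed-entity. The sketch below also reads the charset parameter; the class name is hypothetical, and defaulting to UTF-8 when the parameter is absent or unrecognized is an assumption in line with the recommendation above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypes {
    // True for "text/xml" with or without parameters such as "; charset=utf-8".
    static boolean isTextXml(String contentType) {
        if (contentType == null) return false;
        String mediaType = contentType.split(";", 2)[0].trim();
        return mediaType.equalsIgnoreCase("text/xml");
    }

    // The charset parameter, or UTF-8 when it is absent or not recognized (assumption).
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (RuntimeException e) {
                        break;  // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```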

5.2 The Abstract

The contents of the abstract element should be brief (fewer than 10 lines of text) so that the complete set of query results stays short enough for the user to look through the results and select the desired document. The RSNA query service imposes a 1000-character limit on the length of the abstract. Abstracts that exceed the limit are truncated, and any embedded element tags (e.g., HTML) are suppressed to ensure that the result remains well-formed.
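A storage service can keep its abstracts within that limit itself. The following sketch removes element tags and then truncates to 1000 characters without splitting a surrogate pair. It is not the RSNA query service's code; the class name is hypothetical, and the regular-expression tag strip is a simplification rather than a full HTML parser.

```java
final class AbstractText {
    static final int LIMIT = 1000;  // character limit described above

    // Removes element tags, collapses whitespace, then truncates to at most LIMIT chars.
    static String forQueryResult(String abstractText) {
        String plain = abstractText.replaceAll("<[^>]*>", "").replaceAll("\\s+", " ").trim();
        if (plain.length() <= LIMIT) {
            return plain;
        }
        int end = LIMIT;
        // Do not cut between the two halves of a surrogate pair.
        if (Character.isHighSurrogate(plain.charAt(end - 1))) {
            end--;
        }
        return plain.substring(0, end);
    }
}
```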

If the document is not in HTML format, its format can be noted in the abstract to assist the user in deciding whether the document is of interest.