Data Mining the MIRC Index
This article describes several ways to mine the data in the MIRC index. To perform the procedures listed here, you must be an administrator of the storage service whose index is to be accessed.
In releases from T10 to T29, the index of a storage service is a memory-resident XML DOM object that is built when Tomcat starts and is maintained dynamically while MIRC is running. To save this object to the root directory of the storage service, click the Save Index button in the Storage Service column on the storage service's admin page. The saved index file is named saved-index-file.xml.
The saved index file can be accessed from a browser through the URL:
- [siteurl]/[storage service name]/saved-index-file.xml
To get the entire contents of the file, select the browser's View source menu item. This launches the configured text editor and displays the full text of the file. You can then use the text editor's Save as... menu item to save the file to the local disk.
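If you would rather fetch the saved index with a script than through View source, the Python sketch below builds the URL shown above and writes the response bytes to a local file. It is a minimal sketch, assuming the file is served as-is at that URL and that no login is needed to fetch it; if your site requires authentication, you will have to add it. The helper names saved_index_url and download_saved_index are hypothetical.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen

SAVED_INDEX_NAME = "saved-index-file.xml"


def saved_index_url(siteurl: str, service: str) -> str:
    """Build [siteurl]/[storage service name]/saved-index-file.xml."""
    # Strip a trailing slash to avoid "//"; percent-encode the service name
    return f"{siteurl.rstrip('/')}/{quote(service)}/{SAVED_INDEX_NAME}"


def download_saved_index(siteurl: str, service: str, dest: str) -> str:
    """Copy the raw bytes of the saved index to dest and return dest.

    Assumption: the file can be fetched without logging in.
    """
    url = saved_index_url(siteurl, service)
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, download_saved_index("http://mirc.example.org:8080", "mircsite", "index.xml") would save the index as index.xml; the host and service name here are placeholders for your own [siteurl] and [storage service name]. Copying raw bytes, rather than decoding and re-encoding text, keeps the file exactly as the server sent it.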
Mild caution: The index contains information that is pre-processed to make index searches reasonably efficient, but because it is intended for programmatic access, it may be difficult to read.
There are several options for processing the file:
- You can use a text editor to search for elements of interest and copy them to another file containing only the information of interest. In a large index, this is likely to be tedious.
- Since the index file is a well-formed XML document, you can load it into an XML database and access it using the database's methods.
- You can write a program in Java or another language that accesses the data programmatically through the XML DOM.
- You can create an XSL program to pre-filter the index on the MIRC site as described below.
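As an illustration of the programmatic option, the following sketch uses Python's standard xml.dom.minidom DOM implementation instead of Java. It first tallies the elements in the index by tag name, a quick way to learn the structure of an unfamiliar index, and then copies every element with a chosen tag name into a new, smaller document, automating the copy-to-another-file approach. The function names tag_counts and extract are hypothetical, as is the tag name in the usage comments; the real element names depend on your index.

```python
from collections import Counter
from xml.dom import minidom


def tag_counts(doc: minidom.Document) -> Counter:
    """Count every element in the document by tag name."""
    # "*" matches all elements, including the root element
    return Counter(el.tagName for el in doc.getElementsByTagName("*"))


def extract(doc: minidom.Document, tag: str,
            new_root: str = "extract") -> minidom.Document:
    """Copy every <tag> element (with its subtree) into a new document."""
    impl = minidom.getDOMImplementation()
    out = impl.createDocument(None, new_root, None)
    for el in doc.getElementsByTagName(tag):
        # importNode(..., True) copies the element and all its descendants
        out.documentElement.appendChild(out.importNode(el, True))
    return out


# Usage (the file name and the "title" tag are hypothetical):
# doc = minidom.parse("saved-index-file.xml")
# print(tag_counts(doc).most_common(10))
# with open("subset.xml", "w", encoding="utf-8") as f:
#     f.write(extract(doc, "title").toxml())
```

Like the server's own index, minidom holds the whole document in memory. For a very large index, xml.etree.ElementTree.iterparse can process the file incrementally instead.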
Using the XSL Capability of the XML Server
The XML Server in the MIRC software allows authorized users to select any pre-stored XSL program to apply to an XML file when it is accessed. The default XSL program used to process an XML file is named for the file's root element. Thus, since the root element of every MIRCdocument is MIRCdocument, the default program used to process MIRCdocuments for viewing is MIRCdocument.xsl. The XML Server looks for the program file first in the directory containing the XML file and, if it is not found there, in the root of the storage service. Authorized users can select a different XSL program file by adding an xsl parameter to the query string, as these three examples show:
- filename.xml is processed with the default XSL program for the XML file.
- filename.xml?xsl=xyz.xsl is processed with the xyz.xsl program.
- filename.xml?xsl= is not processed at all; the server returns the original XML.
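The selection rules above can be restated in a few lines of code. The Python sketch below is not MIRC's implementation; it only models the rules as described here: with no xsl parameter the program is named for the root element, an empty xsl parameter means no processing, and otherwise the named file is used, searched for first in the XML file's directory and then in the storage service root. It assumes the same search order applies to an explicitly named program, and it returns the candidate paths in search order rather than guessing what the server does when neither file exists. A second helper builds the matching request URL. The function names and example paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Optional
from urllib.parse import urlencode


def stylesheet_candidates(xml_path: str, root_element: str,
                          service_root: str, xsl: Optional[str]) -> List[str]:
    """Return the XSL files to try, in search order (a model, not MIRC code).

    xsl is the value of the xsl query parameter, or None if it is absent.
    An empty list means the original XML is returned unprocessed.
    """
    if xsl == "":
        # filename.xml?xsl= : no processing at all
        return []
    # Default program is named for the root element, e.g. MIRCdocument.xsl
    name = xsl if xsl is not None else f"{root_element}.xsl"
    candidates = [
        str(PurePosixPath(xml_path).parent / name),  # directory of the XML file
        str(PurePosixPath(service_root) / name),     # root of the storage service
    ]
    # If the XML file sits in the service root, both entries are the same
    return list(dict.fromkeys(candidates))


def with_xsl(url: str, xsl: Optional[str]) -> str:
    """Add the xsl query parameter; assumes url has no query string yet."""
    if xsl is None:
        return url  # use the default XSL program
    return f"{url}?{urlencode({'xsl': xsl})}"
```

For instance, with_xsl("[siteurl]/[storage service name]/saved-index-file.xml", "myfilter.xsl") would produce a request for the saved index processed by a pre-filtering program; myfilter.xsl is a placeholder for whatever XSL program you store on the site.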