IFilterShop XML IFilter Server Edition Release 1.0 README


General Information

XML IFilter is a plug-in that allows Microsoft Search products and services to index XML files, enabling customers to search and organize their content. XML IFilter extracts the values of the elements and attributes of an XML document and indexes them as document content and/or metadata. XML IFilter is easily configurable and supports various indexing schemas, including selective indexing of XML nodes. See the "Information Retrieval" section for more information.

XML IFilter supports Indexing Service, SharePoint Portal Server, SQL Server Full-Text Search, Windows Search Service, and all other products based on Microsoft Search technology.


System Requirements

XML IFilter supports the following Microsoft server operating systems:

XML IFilter supports the following Microsoft desktop operating systems:

XML IFilter supports the following Microsoft Search products:


Information Retrieval

When parsing an XML file, XML IFilter follows the rules set in the configuration file. The default configuration file (XmlFilterConfig.xml) is installed with XML IFilter and is stored in the XML IFilter installation directory (C:\Program Files\IFilterShop\XmlFilter by default). XML IFilter can also be set up to work with a user-specified configuration file. To change the configuration file name and/or location:

  1. Stop all appropriate Search services.
  2. Open the registry key "HKEY_LOCAL_MACHINE\SOFTWARE\IFilterShop\XmlFilter".
  3. Add a new String value named "ConfigFile" and enter the full path to the new configuration file. If this value is missing or empty, or the path does not point to a valid file, the default configuration file is used.
  4. Start all appropriate Search services and re-index catalogs containing .xml files.

Out of the box, XML IFilter outputs the values of all XML elements as the content of the XML file. It does not output any document metadata. XML IFilter can be configured to support various indexing scenarios, including selective indexing. An indexing scenario is set up through the XML IFilter configuration file. Please refer to the "Configuration File Format" section below.


Configuration File Format

The XML IFilter configuration file is an XML document with the following structure:

<config> -- Root element, MUST exist
   <content> -- Defines xpaths that should or should not be indexed as document content, MUST exist.
                All XML elements that are not explicitly excluded in this section will be extracted as document
                content.
      <exclude> -- Defines xpaths that should not be indexed as document content, MUST exist.
                   Can be empty or can include one or several <xpath> elements.
         <xpath> </xpath> -- Value of this node defines xpath that should not be
                             indexed as document content, MAY exist.
         <xpath> </xpath>
         ...
      </exclude>
      <include> -- Defines xpaths that should be indexed as document content, MUST exist.
                   Can be empty or can include one or several <xpath> elements.
         <xpath> </xpath> -- Value of this node defines xpath that should be
                             indexed as document content, MAY exist.
         <xpath> </xpath>
         ...
      </include>
   </content>

   <metadata> -- Defines xpaths that should or should not be indexed as document metadata, MUST exist.
                 All XML elements that are not explicitly excluded in this section will be extracted as document
                 metadata with the default Property GUID and Property Name set to the node name.
      <default> -- MUST exist, can be empty
         <guid> </guid> -- Defines Property GUID to be assigned by default to all
                           document metadata. MAY exist, cannot be empty. If this node does not exist
                           then Microsoft defined GUID for user defined metadata which is
                           {D5CDD505-2E9C-101B-9397-08002B2CF9AE} will be used as a default GUID.
      </default>
      <exclude> -- Defines xpaths that should not be indexed as document metadata, MUST exist.
                   Can be empty or can include one or several <xpath> elements.
         <xpath> </xpath> -- Value of this node defines xpath that should not be indexed as
                             document metadata, MAY exist.
         <xpath> </xpath>
         ...
      </exclude>
      <include> -- Defines xpaths that should be indexed as document metadata, MUST exist.
                   Can be empty or can include one or several <mapping> elements.
         <mapping> -- Defines xpath that should be indexed as document metadata, MAY exist, cannot be empty.
            <xpath> </xpath> -- Value of this node defines xpath that should be indexed as document metadata,
                                MUST exist, cannot be empty.
            <property> -- Optional element that defines mapping between XML element and Indexing Service Property.
                          MAY exist, cannot be empty. If this node does not exist then the element defined in
                          corresponding <xpath> node will be output with the default settings.
               <guid> </guid> -- Defines Property GUID, MAY exist, cannot be empty.
               <name> </name> -- Defines Property Name, MAY exist, cannot be empty.
                                 Exclusive with <id> below.
               <id> </id> -- Defines Property ID, MAY exist, cannot be empty.
                                 Exclusive with <name> above.
               <type> </type> -- Defines Property type, MAY exist, cannot be empty.
                                 VT_LPWSTR type is used by default. XML IFilter can also
                                 output properties of VT_FILETIME and VT_INT types.
            </property>
            ...
         </mapping>
         ...
      </include>
   </metadata>

   <namespaces> -- Elements in this section define mappings between namespaces and their aliases,
                   MAY exist, can be empty or can include one or several <namespace> elements.
      <namespace> -- Defines alias for XML schema, MAY exist, can be empty.
         <alias> </alias> -- MAY exist, cannot be empty. A missing <alias> element denotes the default namespace.
         <schema> </schema> -- MAY exist, cannot be empty.
      </namespace>
      ...
   </namespaces>
</config>
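
Stripped of the annotations, a configuration file that contains every required element plus one namespace mapping might look like the sketch below. The alias "dc" and the schema URI are assumptions chosen for illustration; they are not taken from the shipped XmlFilterConfig.xml.

```xml
<?xml version="1.0" encoding="utf-8"?>
<config>
   <content>
      <!-- Both children are required, but either may be empty. -->
      <exclude></exclude>
      <include></include>
   </content>
   <metadata>
      <!-- Empty <default>: the Microsoft-defined GUID
           {D5CDD505-2E9C-101B-9397-08002B2CF9AE} is used. -->
      <default></default>
      <exclude></exclude>
      <include></include>
   </metadata>
   <!-- Optional section. The alias and schema URI are illustrative only. -->
   <namespaces>
      <namespace>
         <alias>dc</alias>
         <schema>http://purl.org/dc/elements/1.1/</schema>
      </namespace>
   </namespaces>
</config>
```

With both <exclude> sections empty, the structure rules above imply that every element is extracted as document content and also as metadata under the default GUID.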


Content and Metadata Indexing

When parsing an XML file, XML IFilter first extracts the values of all XML elements and attributes and outputs them as document content, excluding the nodes defined in the <content><exclude> </exclude></content> section. For example, <xpath>//*</xpath> excludes all XML elements from indexing. XML IFilter then indexes the content of the XML elements defined in the <content><include> </include></content> section.
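
As a sketch of selective content indexing, the configuration below excludes every element and then re-includes two of them. The paths /book/title and /book/summary are hypothetical and stand in for nodes of your own documents. The <metadata> section excludes everything only to keep the example focused on content.

```xml
<config>
   <content>
      <exclude>
         <!-- Drop every element from document content first. -->
         <xpath>//*</xpath>
      </exclude>
      <include>
         <!-- Hypothetical paths: only these elements are indexed as content. -->
         <xpath>/book/title</xpath>
         <xpath>/book/summary</xpath>
      </include>
   </content>
   <metadata>
      <default></default>
      <exclude>
         <xpath>//*</xpath>
      </exclude>
      <include></include>
   </metadata>
</config>
```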

XML IFilter then extracts the values of all XML elements and outputs them as document metadata, excluding the nodes defined in the <metadata><exclude> </exclude></metadata> section. In accordance with the Microsoft specification, XML IFilter defines each metadata property as a combination of a Property Set and a Property Name. During this step, XML IFilter assigns to all document properties the Property Set GUID defined in the <metadata><default><guid> </guid></default></metadata> section and uses node names as Property Names. If the default Property Set GUID is not defined in the configuration file, XML IFilter uses the Microsoft-defined GUID for user-defined metadata, {D5CDD505-2E9C-101B-9397-08002B2CF9AE}. After that, XML IFilter indexes the content of the XML elements defined in the <metadata><include> </include></metadata> section. This section makes it possible to output the values of XML elements under different Property Set GUIDs and Property IDs, and to index XML nodes as non-text properties, such as those of VT_FILETIME and VT_INT types.
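
A metadata mapping might be sketched as follows. The element paths and the property name "Published" are hypothetical. That <guid> takes the braced GUID form and that <type> takes the literal text VT_FILETIME are assumptions based on the names used in this README, so check them against the shipped XmlFilterConfig.xml.

```xml
<config>
   <content>
      <exclude></exclude>
      <include></include>
   </content>
   <metadata>
      <default>
         <guid>{D5CDD505-2E9C-101B-9397-08002B2CF9AE}</guid>
      </default>
      <exclude>
         <!-- Suppress the automatic node-name properties. -->
         <xpath>//*</xpath>
      </exclude>
      <include>
         <!-- No <property>: the element is output with the default settings. -->
         <mapping>
            <xpath>/book/author</xpath>
         </mapping>
         <!-- Explicit property; it uses <name>, so <id> is omitted. -->
         <mapping>
            <xpath>/book/published</xpath>
            <property>
               <guid>{D5CDD505-2E9C-101B-9397-08002B2CF9AE}</guid>
               <name>Published</name>
               <type>VT_FILETIME</type>
            </property>
         </mapping>
      </include>
   </metadata>
</config>
```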


How to Test XML IFilter Configuration File

XML IFilter uses the settings defined in the configuration file to parse XML files. If the format of the configuration file is invalid, XML IFilter will not operate properly. XML IFilter comes with the XmlConfigTest.exe utility for testing the configuration file. XmlConfigTest.exe is stored in the XML IFilter installation directory (C:\Program Files\IFilterShop\XmlFilter by default). It is a command-line application that accepts the full path to the configuration file as its single command-line argument.


Installation Instructions

The setup file is a self-extracting archive that must be downloaded and opened on the machine where you want to use XML IFilter.

  1. Stop all appropriate Search services.
  2. Uninstall any previous version of XML IFilter.
  3. Run the setup file and follow the on-screen instructions.
  4. Start all appropriate Search services.
  5. Re-index catalogs containing XML files.


Additional Setup Steps

Some Microsoft Search products require additional setup steps as described below:

SharePoint Portal Server 2003:

  1. Open the "Site Settings" web page.
  2. In the "Search Settings and Indexed Content" section, click "Configure search and indexing".
  3. Click "Include file types".
  4. Make sure that the ".xml" file type is included.

Office SharePoint Server 2007:

  1. Open the Shared Services Provider Admin Site.
  2. In the "Search" section, click "Search settings".
  3. Click "File type inclusions".
  4. Make sure that the ".xml" file type is included.

Windows SharePoint Services 3.0:

  1. Open the registry key "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\<WSS Server Name>\Gather\Search\Extensions\ExtensionList".
  2. Add the ".xml" extension to the list of indexable file types.
  3. Restart Windows SharePoint Services Search.

SharePoint Server 2010:

  1. In SharePoint Central Administration, go to the "General Application Settings" page.
  2. In the "Search" section, click "Farm-Wide Search Administration".
  3. Click the "Search Service Application" link.
  4. In the left-side menu, select "File Types".
  5. Make sure that the ".xml" file type is included.


How to Uninstall

If you ever need to uninstall XML IFilter, you can do so using any of the following methods:


Known Issues

XML files are not searchable in Windows Vista, Windows 7 or Windows Desktop Search

When integrated with Windows Desktop Search, XML IFilter may use a temporary directory to process .xml files. By default, it uses the system temporary directory. For Windows Desktop Search versions 3.x and later, XML IFilter must be set to work with a user-specified temporary directory. To change the temporary directory settings:

  1. Stop Windows Search service.
  2. Open the registry key "HKEY_LOCAL_MACHINE\SOFTWARE\IFilterShop\XmlFilter".
  3. Add a new String value named "TempPath" and enter the full path to the new temporary directory. If this value is missing or empty, or the path does not point to a valid directory, the system temporary directory is used. Make sure that the "Users" or "Authenticated Users" group has "Full Control" permissions on the custom temporary directory.
  4. Start Windows Search service.

When using a custom temporary directory, we recommend that you mark it as "not indexable" in all your indexing products. Otherwise, temporary files may be indexed, which pollutes the index and can prevent XML IFilter from removing the temporary files properly.


Additional Information

Acknowledgements

This product includes software developed by the Apache Software Foundation (http://www.apache.org).


Contact Information

WWW:
http://www.ifiltershop.com
E-mail:
support@ifiltershop.com