XML IFilter is a plug-in that allows Microsoft Search products and services to index XML files, enabling customers to search and organize their content. XML IFilter extracts the values of elements and attributes of an XML document and indexes them as document content and/or metadata. XML IFilter is easily configurable and supports various indexing scenarios, including selective indexing of XML nodes. See the "Information Retrieval" section for more information.
XML IFilter supports SharePoint Server, SQL Server Full-Text Search, Windows Search and all other products based on Microsoft Search technology.
XML IFilter supports the following Microsoft server operating systems:
XML IFilter supports the following Microsoft desktop operating systems:
XML IFilter supports the following Microsoft Search products:
When parsing an XML file, XML IFilter follows the rules set in its configuration file. A default configuration file (XmlFilterConfig.xml) is included in the XML IFilter installation and is stored in the XML IFilter installation directory (C:\Program Files\IFilterShop\XmlFilter by default). XML IFilter can also be set up to work with a user-specified configuration file. To change the configuration file name and/or location:
Out of the box, XML IFilter outputs the values of all XML elements as the content of the XML file and does not output any document metadata. XML IFilter can be configured to support various indexing scenarios, including selective indexing. The indexing scenario is set up through the XML IFilter configuration file. Please refer to the "Configuration File Format" section below.
The XML IFilter configuration file is an XML document with the following structure:
<config> -- Root element, MUST exist.
  <content> -- Defines xpaths that should or should not be indexed as document content, MUST exist. All XML elements that are not explicitly excluded in this section will be extracted as document content.
    <exclude> -- Defines xpaths that should not be indexed as document content, MUST exist. Can be empty or can include one or several <xpath> elements.
      <xpath> </xpath> -- Value of this node defines xpath that should not be indexed as document content, MAY exist.
      <xpath> </xpath>
      ...
    </exclude>
    <include> -- Defines xpaths that should be indexed as document content, MUST exist. Can be empty or can include one or several <xpath> elements.
      <xpath> </xpath> -- Value of this node defines xpath that should be indexed as document content, MAY exist.
      <xpath> </xpath>
      ...
    </include>
  </content>
  <metadata> -- Defines xpaths that should or should not be indexed as document metadata, MUST exist. All XML elements that are not explicitly excluded in this section will be extracted as document metadata with the default Property GUID and Property Name set to the node name.
    <default> -- MUST exist, can be empty.
      <guid> </guid> -- Defines Property GUID to be assigned by default to all document metadata. MAY exist, cannot be empty. If this node does not exist then Microsoft defined GUID for user defined metadata which is {D5CDD505-2E9C-101B-9397-08002B2CF9AE} will be used as a default GUID.
    </default>
    <exclude> -- Defines xpaths that should not be indexed as document metadata, MUST exist. Can be empty or can include one or several <xpath> elements.
      <xpath> </xpath> -- Value of this node defines xpath that should not be indexed as document metadata, MAY exist.
      <xpath> </xpath>
      ...
    </exclude>
    <include> -- Defines xpaths that should be indexed as document metadata, MUST exist. Can be empty or can include one or several <mapping> elements.
      <mapping> -- Defines xpath that should be indexed as document metadata, MAY exist, cannot be empty.
        <xpath> </xpath> -- Value of this node defines xpath that should be indexed as document metadata, MUST exist, cannot be empty.
        <property> -- Optional element that defines mapping between XML element and Indexing Service Property. MAY exist, cannot be empty. If this node does not exist then the element defined in corresponding <xpath> node will be output with the default settings.
          <guid> </guid> -- Defines Property GUID, MAY exist, cannot be empty.
          <name> </name> -- Defines Property Name, MAY exist, cannot be empty. Exclusive with <id> below.
          <id> </id> -- Defines Property ID, MAY exist, cannot be empty. Exclusive with <name> above.
          <type> </type> -- Defines Property type, MAY exist, cannot be empty. VT_LPWSTR type is used by default. XML IFilter can also output properties of VT_FILETIME and VT_INT types.
        </property>
        ...
      </mapping>
      ...
    </include>
  </metadata>
  <namespaces> -- Elements in this section define mappings between namespaces and their aliases, MAY exist, can be empty or can include one or several <namespace> elements.
    <namespace> -- Defines alias for XML schema, MAY exist, can be empty.
      <alias> </alias> -- MAY exist, cannot be empty. Missing <alias> element denotes the default namespace.
      <schema> </schema> -- MAY exist, cannot be empty.
    </namespace>
    ...
  </namespaces>
</config>
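As an illustration of the structure above, here is a minimal sketch of a configuration file. It contains only the elements marked MUST exist, all left empty, plus one optional namespace mapping. It is not the XmlFilterConfig.xml shipped with the product. The alias bk and the schema URI http://example.com/books are hypothetical placeholders.

```xml
<!-- Minimal sketch: required elements only, plus one optional namespace mapping. -->
<config>
  <content>
    <!-- Nothing excluded and nothing explicitly included as content -->
    <exclude></exclude>
    <include></include>
  </content>
  <metadata>
    <!-- Empty default: the Microsoft-defined GUID described above applies -->
    <default></default>
    <exclude></exclude>
    <include></include>
  </metadata>
  <namespaces>
    <namespace>
      <!-- Hypothetical alias and schema URI, for illustration only -->
      <alias>bk</alias>
      <schema>http://example.com/books</schema>
    </namespace>
  </namespaces>
</config>
```

Going by the rules above, nothing is excluded in either section, so a file like this would be expected to output all elements both as document content and as metadata under the default GUID. How an alias is then referenced inside an xpath value (for example as a bk: prefix) is an assumption based on common XPath usage and should be checked with XmlConfigTest.exe.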
When parsing an XML file, XML IFilter first extracts and outputs as document content the values of all XML elements and attributes, excluding the nodes defined in the <content><exclude> </exclude></content> section. Specifying <xpath>//*</xpath> there excludes all XML elements from indexing. After that, XML IFilter indexes the content of the XML elements defined in the <content><include> </include></content> section.
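The two steps described above can be combined for selective content indexing. The fragment below is a sketch only: the element paths /article/title and /article/body are hypothetical and stand in for whatever nodes a real document contains.

```xml
<content>
  <exclude>
    <!-- Step 1: remove every element from the default content output -->
    <xpath>//*</xpath>
  </exclude>
  <include>
    <!-- Step 2: add back only the nodes of interest (hypothetical paths) -->
    <xpath>/article/title</xpath>
    <xpath>/article/body</xpath>
  </include>
</content>
```

With a section like this, only the values of the two listed elements would be expected to reach the index as document content.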
Then XML IFilter extracts and outputs as document metadata the values of all XML elements, excluding the nodes defined in the <metadata><exclude> </exclude></metadata> section. In accordance with the Microsoft IFilter specification, XML IFilter defines each XML metadata item as a combination of a Property Set and a Property Name. During this step, XML IFilter assigns to all document properties the Property Set GUID defined in the <metadata><default><guid> </guid></default></metadata> section and uses node names as Property Names. If the default Property Set GUID is not defined in the configuration file, XML IFilter uses the Microsoft-defined GUID for user-defined metadata, which is {D5CDD505-2E9C-101B-9397-08002B2CF9AE}. After that, XML IFilter indexes the content of the XML elements defined in the <metadata><include> </include></metadata> section. This section makes it possible to output values of XML elements under different Property Set GUIDs and Property IDs. It also makes it possible to index XML nodes as non-text elements, such as properties of the VT_FILETIME and VT_INT types.
XML IFilter uses the settings defined in the configuration file to parse XML files. If the format of the configuration file is invalid, XML IFilter will not operate properly. The IFilter comes with the XmlConfigTest.exe utility for testing an XML IFilter configuration file. XmlConfigTest.exe is stored in the XML IFilter installation directory (C:\Program Files\IFilterShop\XmlFilter by default). It is a command-line application that accepts the full path to the configuration file as its single command-line argument.
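A metadata section following this description might look like the sketch below. Several details are assumptions rather than documented behavior: that //* in the metadata exclude list works the same way as it does for content, that the GUID is written with braces, and that the type is written using its VT_ name. The paths /article/author and /article/published and the property name PublishedDate are hypothetical.

```xml
<metadata>
  <default>
    <!-- Default Property Set GUID (same value as the built-in fallback) -->
    <guid>{D5CDD505-2E9C-101B-9397-08002B2CF9AE}</guid>
  </default>
  <exclude>
    <!-- Assumption: excludes all elements from the default metadata output -->
    <xpath>//*</xpath>
  </exclude>
  <include>
    <!-- No property element: output with the default settings -->
    <mapping>
      <xpath>/article/author</xpath>
    </mapping>
    <!-- Explicit mapping to a named, non-text property (hypothetical name) -->
    <mapping>
      <xpath>/article/published</xpath>
      <property>
        <guid>{D5CDD505-2E9C-101B-9397-08002B2CF9AE}</guid>
        <name>PublishedDate</name>
        <type>VT_FILETIME</type>
      </property>
    </mapping>
  </include>
</metadata>
```

The date format a node value must have in order to be emitted as VT_FILETIME is not stated in this document, so a mapping like the second one is best verified against sample files.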
The setup file is a self-extracting archive that must be downloaded and opened on the machine where you wish to use XML IFilter.
Some Microsoft Search products require additional setup steps as described below:
When integrated with Windows Search, XML IFilter uses a temporary directory to process XML files. Due to Windows Search security restrictions, IFilters are not able to use the default system temporary directory. Therefore, XML IFilter must be set to work with a user-specified temporary directory. To change the XML IFilter temporary directory settings:
If you need to uninstall XML IFilter, you can do so using any of the following methods:
A custom temporary directory has to be configured as described in the "Additional Setup Steps" section for Windows Search above.
This product includes software developed by the Apache Software Foundation (http://www.apache.org).
Version 3.0
Version 2.0
© IFilterShop LLC. All Rights Reserved