PDF+ IFilter is an enhanced IFilter for Adobe PDF files. It extends Adobe PDF IFilter to extract both text and XMP metadata from PDF files. It can also work without Adobe PDF IFilter, in which case only XMP metadata is indexed. PDF+ IFilter supports the Dublin Core, XMP Basic and Adobe PDF schemas as well as custom XMP schemas. It is easily extensible and can support other XMP core schemas such as Rights Management or Media Management. If your metadata needs are not covered by the core schemas, you can add custom schemas as extensions; please refer to the "Support for custom XMP schemas" section for more information. For better integration with Microsoft applications, PDF+ IFilter also outputs common Office document properties such as 'DocAuthor', 'DocKeywords' and others. See the "Office Document Properties" section below for more information.
PDF+ IFilter supports Indexing Service, SharePoint Portal Server, SQL Server Full-Text Search, Windows Search Service and all other products based on Microsoft Search technology.
PDF+ IFilter supports the following Microsoft server operating systems:
PDF+ IFilter supports the following Microsoft desktop operating systems:
PDF+ IFilter supports the following Microsoft Search products:
PDF+ IFilter extends Adobe PDF IFilter to extract text and XMP metadata from PDF files. It may also work without Adobe PDF IFilter, in which case only XMP metadata will be indexed.
The Dublin Core Schema provides a set of commonly used properties.
PDF+ IFilter extracts the following XMP Dublin Core metadata:
| XMP Dublin Core Metadata | Property Name | Property Type | Description |
| --- | --- | --- | --- |
| dc:contributor | contributor | VT_LPWSTR | Contributors to the resource (other than the authors) |
| dc:coverage | coverage | VT_LPWSTR | The extent or scope of the resource |
| dc:creator | creator | VT_LPWSTR | The authors of the resource (listed in order of precedence, if significant) |
| dc:date | date | VT_FILETIME | Date(s) that something interesting happened to the resource |
| dc:description | description | VT_LPWSTR | A textual description of the content of the resource |
| dc:format | format | VT_LPWSTR | The file format used when saving the resource |
| dc:identifier | identifier | VT_LPWSTR | Unique identifier of the resource |
| dc:language | language | VT_LPWSTR | Language of the document |
| dc:publisher | publisher | VT_LPWSTR | Publishers |
| dc:relation | relation | VT_LPWSTR | How the content relates to other resources |
| dc:rights | rights | VT_LPWSTR | Informal rights statement |
| dc:source | source | VT_LPWSTR | Unique identifier of the work from which this resource was derived |
| dc:subject | subject | VT_LPWSTR | An unordered array of descriptive phrases or keywords that specify the topic of the content of the resource |
| dc:title | title | VT_LPWSTR | The title of the document, or the name given to the resource |
| dc:type | type | VT_LPWSTR | A document type; for example, novel, poem, or working paper |
In accordance with the Microsoft IFilter specification, PDF+ IFilter identifies each metadata property by a combination of a Property Set GUID and a Property Name. All XMP Dublin Core metadata belong to the Property Set with GUID {DC099694-64F5-4371-9AA9-868846A5657E}.
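To illustrate this data model, the sketch below reads Dublin Core values from an XMP packet and pairs each value with the Property Set GUID and Property Name listed above. It is a minimal, hypothetical illustration built on Python's standard `xml.etree.ElementTree`, not PDF+ IFilter's actual implementation; the function name `dublin_core_properties` is made up, and the sketch assumes that array-valued properties (`rdf:Seq`, `rdf:Bag`, `rdf:Alt`) produce one value per `rdf:li` item.

```python
from __future__ import annotations

import uuid
import xml.etree.ElementTree as ET

# Namespace URIs from the RDF and XMP Dublin Core specifications.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Property Set GUID for XMP Dublin Core metadata, as documented above.
DC_PROPSET = uuid.UUID("DC099694-64F5-4371-9AA9-868846A5657E")


def dublin_core_properties(xmp_xml: str) -> list[tuple[uuid.UUID, str, str]]:
    """Return (Property Set GUID, Property Name, value) tuples, one per value."""
    prefix = "{" + DC_NS + "}"
    result = []
    root = ET.fromstring(xmp_xml)
    for desc in root.iter("{" + RDF_NS + "}Description"):
        # Simple properties may be written as attributes of rdf:Description.
        for attr, value in desc.attrib.items():
            if attr.startswith(prefix):
                result.append((DC_PROPSET, attr[len(prefix):], value.strip()))
        # Other properties are child elements; arrays wrap values in rdf:li.
        for prop in desc:
            if not prop.tag.startswith(prefix):
                continue
            name = prop.tag[len(prefix):]
            items = prop.findall("./*/{" + RDF_NS + "}li")
            values = [li.text or "" for li in items] if items else [prop.text or ""]
            result.extend((DC_PROPSET, name, v.strip()) for v in values)
    return result
```

Each tuple stands for one emitted property value, so a multi-valued property such as `dc:creator` appears once per instance.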
The XMP Basic Schema contains properties that provide basic descriptive information.
PDF+ IFilter extracts the following XMP Basic metadata:
| XMP Basic Metadata | Property Name | Property Type | Description |
| --- | --- | --- | --- |
| xap:Advisory | Advisory | VT_LPWSTR | An unordered array specifying properties that were edited outside the authoring application |
| xap:BaseURL | BaseURL | VT_LPWSTR | The base URL for relative URLs in the document content |
| xap:CreateDate | CreateDate | VT_FILETIME | The date and time the resource was originally created |
| xap:CreatorTool | CreatorTool | VT_LPWSTR | The name of the first known tool used to create the resource |
| xap:Identifier | Identifier | VT_LPWSTR | An unordered array of text strings that unambiguously identify the resource within a given context |
| xap:MetadataDate | MetadataDate | VT_FILETIME | The date and time that any metadata for this resource was last changed. It should be the same as or more recent than xap:ModifyDate |
| xap:ModifyDate | ModifyDate | VT_FILETIME | The date and time the resource was last modified |
| xap:Nickname | Nickname | VT_LPWSTR | A short informal name for the resource |
All XMP Basic metadata belong to the Property Set with GUID {BA64F93D-FBA6-4b75-8F7F-37FC8B493176}.
The Adobe PDF Schema specifies properties used with Adobe PDF files.
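Date properties such as `xap:CreateDate` are listed with type VT_FILETIME, while XMP stores dates as ISO 8601 text. A Windows FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC). The sketch below shows one way such a conversion could be done; it is an illustration rather than PDF+ IFilter's code, and the function name `xmp_date_to_filetime` is hypothetical. It assumes that values without a time zone are UTC and does not handle XMP's reduced-precision forms such as `2009` or `2009-05`.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def xmp_date_to_filetime(value: str) -> int:
    """Convert an XMP date or date-time string to a FILETIME tick count."""
    if value.endswith("Z"):
        # Older Python versions' fromisoformat() does not accept a trailing "Z".
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: a value without a time zone is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    microseconds = (dt - FILETIME_EPOCH) // timedelta(microseconds=1)
    return microseconds * 10  # 1 microsecond = 10 FILETIME ticks
```

For example, the Unix epoch `1970-01-01T00:00:00Z` maps to the well-known FILETIME value 116444736000000000.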
PDF+ IFilter extracts the following XMP Adobe PDF metadata:
| XMP Adobe PDF Metadata | Property Name | Property Type | Description |
| --- | --- | --- | --- |
| pdf:Keywords | Keywords | VT_LPWSTR | External Keywords |
| pdf:PDFVersion | PDFVersion | VT_LPWSTR | PDF file version |
| pdf:Producer | Producer | VT_LPWSTR | Name of the tool that created the PDF document |
All XMP Adobe PDF metadata belong to the Property Set with GUID {A2BAC514-218A-43E8-A3EF-7598A66B19BE}.
PDF+ IFilter can easily be configured to support additional XMP core schemas and custom XMP schemas. To make your custom XMP schema searchable:
| Registry value | Description | Example for PDFx Schema |
| --- | --- | --- |
| NameSpace | Namespace URI of the custom XMP schema | http://ns.adobe.com/pdfx/1.0/ |
| GUID | Property Set GUID that will be used by Indexing Service * | {2C443B1E-F1E2-404F-974D-E21FEF8E72AA} |
| FileName | Full path to the text file that maps custom XMP schema properties ** | C:\IFilterShop\PdfPlusFilter\PDFxSchema.txt |
* GUID shall be a newly generated GUID.
** The FileName value is optional. If it is missing, all properties within the schema will be indexed.
Each line in the text file referred to by the FileName value shall have the following structure:
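The registry values above amount to three settings per custom schema: a namespace URI, a freshly generated Property Set GUID and an optional mapping file. The hypothetical helper below only assembles those values; the registry key they have to be written under is not given in this section, so the sketch deliberately does not touch the registry.

```python
import uuid
from typing import Optional


def custom_schema_values(namespace: str, mapping_file: Optional[str] = None) -> dict:
    """Assemble the NameSpace, GUID and FileName values for one custom XMP schema."""
    values = {
        "NameSpace": namespace,
        # A new GUID on every call, formatted like the example value in the table.
        "GUID": "{" + str(uuid.uuid4()).upper() + "}",
    }
    # FileName is optional; leaving it out means every property is indexed.
    if mapping_file is not None:
        values["FileName"] = mapping_file
    return values


if __name__ == "__main__":
    print(custom_schema_values(
        "http://ns.adobe.com/pdfx/1.0/",
        r"C:\IFilterShop\PdfPlusFilter\PDFxSchema.txt",
    ))
```

Generating the GUID in code, rather than copying one from an example, helps satisfy the requirement that each schema gets a new GUID.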
<XMP Metadata>;<Property Name>;<Property Type>, where
CustomProp1;ProjectName
CustomProp2;ProjectNum;VT_INT
CustomProp3;ProjectStartDate;VT_FILETIME
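A parser for this file format could look like the sketch below. It is hypothetical: the function and type names are made up, and it assumes, based on the `CustomProp1;ProjectName` example line, that the property type may be omitted. Which type PDF+ IFilter applies in that case is not stated here, so the sketch records `None`. Skipping blank lines is also an assumption.

```python
from typing import List, NamedTuple, Optional


class PropertyMapping(NamedTuple):
    xmp_name: str                 # e.g. "CustomProp2"
    property_name: str            # e.g. "ProjectNum"
    property_type: Optional[str]  # e.g. "VT_INT"; None if the field is omitted


def parse_mapping(text: str) -> List[PropertyMapping]:
    """Parse '<XMP Metadata>;<Property Name>[;<Property Type>]' lines."""
    mappings = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = [f.strip() for f in line.split(";")]
        if len(fields) not in (2, 3) or not all(fields):
            raise ValueError(f"line {line_no}: malformed mapping {raw!r}")
        prop_type = fields[2] if len(fields) == 3 else None
        mappings.append(PropertyMapping(fields[0], fields[1], prop_type))
    return mappings
```

Rejecting malformed lines with the offending line number makes mistakes in a hand-edited mapping file easy to locate.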
A sidecar file is an alternative to storing metadata directly in the PDF file: the metadata is stored instead in a separate .xmp file with the same base name as the PDF file. Sidecars are typically used when the PDF file should not be edited directly.
PDF+ IFilter supports indexing of XMP sidecar files. When loaded for a PDF file, PDF+ IFilter first tries to locate an .xmp file with the same base name in the same folder as the original PDF file. If the .xmp file is found, PDF+ IFilter extracts XMP metadata from that file. Otherwise, PDF+ IFilter extracts XMP metadata from the PDF file itself.
PDF+ IFilter outputs the following standard Indexing Service properties as duplicates of certain XMP Dublin Core and XMP Basic properties when support for the Dublin Core and XMP Basic schemas is enabled.
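The lookup order described above can be sketched with Python's `pathlib`. This illustrates the documented behavior only; it is not PDF+ IFilter's code, and `metadata_source` is a hypothetical name.

```python
from pathlib import Path


def metadata_source(pdf_path: str) -> Path:
    """Return the file XMP metadata should be read from for a given PDF.

    A sidecar in the same folder with the same base name (report.pdf ->
    report.xmp) takes precedence; otherwise the PDF file itself is used.
    """
    pdf = Path(pdf_path)
    sidecar = pdf.with_suffix(".xmp")
    return sidecar if sidecar.is_file() else pdf
```

Because the sidecar wins whenever it exists, metadata embedded in the PDF is ignored for that file once a sidecar is added.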
| Property Friendly Name | Property Set GUID | Property ID | Description | XMP Metadata |
| --- | --- | --- | --- | --- |
| DocAuthor | {F29F85E0-4FF9-1068-AB91-08002B27B3D9} | 4 | Author of the document | dc:creator |
| DocCreatedTm | {F29F85E0-4FF9-1068-AB91-08002B27B3D9} | 12 | Time document was created | xap:CreateDate |
| DocKeywords | {F29F85E0-4FF9-1068-AB91-08002B27B3D9} | 5 | Keywords for the document | dc:subject |
| DocLastSavedTm | {F29F85E0-4FF9-1068-AB91-08002B27B3D9} | 13 | Time document was last saved | xap:ModifyDate |
| DocSubject | {F29F85E0-4FF9-1068-AB91-08002B27B3D9} | 3 | Subject of the document | dc:description |
| DocTitle | {F29F85E0-4FF9-1068-AB91-08002B27B3D9} | 2 | Title of the document | dc:title |
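The duplication rule in this table can be expressed as a small lookup. The sketch below is a hypothetical illustration, not PDF+ IFilter's code: it assumes the original XMP property is still emitted under its own Property Set, and yields only the extra Office-style copy for each mapped value.

```python
import uuid

# Property Set GUID shared by all Doc* properties in the table above.
SUMMARY_INFO = uuid.UUID("F29F85E0-4FF9-1068-AB91-08002B27B3D9")

# XMP property -> (friendly name, property ID), copied from the table above.
DUPLICATES = {
    "dc:creator": ("DocAuthor", 4),
    "xap:CreateDate": ("DocCreatedTm", 12),
    "dc:subject": ("DocKeywords", 5),
    "xap:ModifyDate": ("DocLastSavedTm", 13),
    "dc:description": ("DocSubject", 3),
    "dc:title": ("DocTitle", 2),
}


def office_duplicates(xmp_values):
    """Yield (Property Set GUID, property ID, value) for each mapped XMP value.

    xmp_values is an iterable of (qualified XMP name, value) pairs; names
    without an Office counterpart are skipped.
    """
    for name, value in xmp_values:
        if name in DUPLICATES:
            _, prop_id = DUPLICATES[name]
            yield SUMMARY_INFO, prop_id, value
```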
The setup file is a self-extracting archive that must be downloaded and run on the machine where you want to use PDF+ IFilter.
PDF+ IFilter uses a PDF text extractor to index the text of PDF documents. When installed standalone, PDF+ IFilter indexes only the XMP metadata embedded in PDF documents; a PDF text extractor must also be installed on the machine to enable PDF+ IFilter to index PDF text. By default, PDF+ IFilter integrates with Adobe PDF IFilter 9, but it can be configured to integrate with other PDF text extractors. Please follow the instructions below:
| PDF text extractor | GUID |
| --- | --- |
| Adobe PDF IFilter (ver. 5.0 or 6.0) | {4C904448-74A9-11d0-AF6E-00C04FD8DC02} |
| Adobe PDF IFilter (ver. 8.x or later) | {E8978DA6-047F-4E3D-9C78-CDBE46041603} |
By default, PDF+ IFilter outputs each instance of a multi-valued property as a separate property. Some products, such as SharePoint Portal Server 2003, can index only one instance of a given property. PDF+ IFilter can be configured to output all instances of a property as a single value. To enable this:
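The difference between the two output modes can be illustrated with a short sketch; it does not show how the mode is enabled. The function `emit_property` is hypothetical, and the `"; "` separator used in single-value mode is an assumption, since this section does not say how PDF+ IFilter joins the instances.

```python
def emit_property(name, values, single_value=False, separator="; "):
    """Return the (property name, value) pairs a filter would output.

    Default mode: one pair per instance. Single-value mode: all instances
    joined into one pair (the separator is an assumption).
    """
    values = [v for v in values if v]  # drop empty instances
    if not single_value:
        return [(name, v) for v in values]
    return [(name, separator.join(values))] if values else []
```

For example, a document with two `dc:creator` entries yields two `creator` values by default, but a single combined value in single-value mode.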
Some Microsoft Search products require additional setup steps as described below:
If you ever need to uninstall PDF+ IFilter, you can do so using any of the following methods:
PDF+ IFilter uses a PDF text extractor to index the text of PDF files. When installed standalone, PDF+ IFilter extracts XMP metadata only. To search both the metadata and the content of PDF files, you have to install Adobe PDF IFilter or another PDF text extractor in addition to PDF+ IFilter. Please refer to the "Text Extractor Setup" section for more information.
PLEASE NOTE (for Indexing Service only). When both IFilters are installed, Indexing Service relies on registration order to choose which one to use. Each time Indexing Service starts, it reads the list of DLLs in the "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex\DLLsToRegister" registry value and registers the DLLs in that order. To make sure Indexing Service uses PDF+ IFilter for PDF files, move the registration for the PDF+ IFilter DLL (PdfPlusFilter.dll) to the end of this list:
Adobe PDF IFilter 5.0 and 6.0 are apartment-threaded IFilters. Apartment-threaded IFilters can behave abnormally on some server platforms where the indexing process is multithreaded. Follow the steps below to make Adobe PDF IFilter work on these platforms.
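The reordering can be sketched in Python: `move_to_end` rearranges the list, and `reorder_registry` applies it to the `DLLsToRegister` value named above through the standard `winreg` module. Both are hypothetical helpers, not tools shipped with PDF+ IFilter. `reorder_registry` must run on Windows with administrative rights, backing up the registry first is advisable, and since Indexing Service reads the list when it starts, the new order takes effect at the next service start.

```python
import ntpath


def move_to_end(dlls, target="pdfplusfilter.dll"):
    """Return a copy of the DLL list with every entry for target moved last.

    Entries are full paths, so only the file name is compared, ignoring case.
    """
    def is_target(path):
        return ntpath.basename(path).lower() == target.lower()

    return [p for p in dlls if not is_target(p)] + [p for p in dlls if is_target(p)]


def reorder_registry():
    """Apply move_to_end() to DLLsToRegister (Windows only, needs admin rights)."""
    import winreg  # imported here so the module still loads on other platforms

    key_path = r"SYSTEM\CurrentControlSet\Control\ContentIndex"
    access = winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0, access) as key:
        dlls, value_type = winreg.QueryValueEx(key, "DLLsToRegister")
        if value_type != winreg.REG_MULTI_SZ:
            raise RuntimeError("DLLsToRegister is not a REG_MULTI_SZ value")
        winreg.SetValueEx(key, "DLLsToRegister", 0, value_type, move_to_end(dlls))
```

Keeping the list manipulation in a pure function makes it easy to check the new order before anything is written to the registry. The DLL paths in any test data are illustrative only.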
Adobe eXtensible Metadata Platform (XMP) enables many types of content to carry open-standard metadata. It works by embedding metadata packets into binary data files. XMP metadata can currently be embedded in various image files (GIF, PNG, JPEG, TIFF) and in document files such as PDF, PostScript, Adobe Illustrator and Adobe FrameMaker files. Metadata packets are specifically designed to preserve the integrity of the file, so that other applications are not affected. XMP metadata is very rich and suits a wide variety of tasks. More information about Adobe XMP can be found at http://www.adobe.com/products/xmp.
Dublin Core is an initiative to create digital library metadata for the Web. Dublin Core is made up of 15 metadata elements (metadata being data that describes data) that offer expanded cataloging information and improved document indexing for search engines. Two forms of Dublin Core exist: Simple Dublin Core and Qualified Dublin Core. Simple Dublin Core expresses elements as attribute-value pairs using only the 15 elements of the Dublin Core Metadata Element Set. Qualified Dublin Core increases the specificity of metadata by adding information about encoding schemes, enumerated lists of values, or other processing clues. While qualifiers enable more specific searches, they are also more complex and can pose challenges to interoperability. More information about Dublin Core can be found at http://www.dublincore.org.
PLEASE NOTE. Adobe XMP and PDF+ IFilter use Dublin Core version 1.1. The Dublin Core Metadata Initiative has since extended the set with more elements, making the previous specification obsolete. This should not affect the operation of Adobe XMP or PDF+ IFilter, because the current specification is fully backward compatible with version 1.1.
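Because packets are self-delimiting, a reader can locate them by scanning raw bytes for the `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions defined by the XMP specification, without understanding the host file format. The sketch below is a simplified illustration of that idea, not PDF+ IFilter's method, and `find_xmp_packets` is a hypothetical name. A packet stored inside a compressed stream will not be found by a plain byte scan like this one.

```python
import re
from typing import List

# An XMP packet starts with <?xpacket begin=...?> and ends with
# <?xpacket end="w"?> (writable) or <?xpacket end="r"?> (read-only).
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(?P<body>.*?)<\?xpacket end=['\"][rw]['\"]\s*\?>",
    re.DOTALL,
)


def find_xmp_packets(data: bytes) -> List[bytes]:
    """Return the content of every XMP packet embedded in the given bytes."""
    return [m.group("body") for m in PACKET_RE.finditer(data)]
```

The returned packet bodies are XML and can then be parsed with any XML library.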
Version 3.0
Version 2.2
Version 2.1
Version 2.0