
Google Search Appliance: External Metadata Indexing Guide 11
Scenario 5
Metadata: Inserted into the feed XML file.
Primary Document: Referenced by the URL in the feed XML file (web feed).
This scenario is similar to the previous scenario, except that the primary document is referenced by URL
only (instead of the contents of the primary document being fed to the search appliance). The feed file
therefore contains the <header> information and, for each <record> element, the URL of the record
and the <metadata> elements.
1. Create the feed XML file and define the header information, including the data source name, as
shown in the following example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
<header>
<datasource>sample2</datasource>
<feedtype>metadata-and-url</feedtype>
</header>
Note that the <feedtype> element is metadata-and-url. This tells the web or file system crawler
to pick up the URLs of the primary documents and index them accordingly.
2. Create a <record> element for each primary document. In the <metadata> element, insert one or
more <meta> elements, as shown in the following example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
<header>
<datasource>sample2</datasource>
<feedtype>metadata-and-url</feedtype>
</header>
<group>
<record url="http://www.corp.enterprise.com/hello02"
mimetype="text/plain" last-modified="Tue, 17 Feb 2009 12:45:26 GMT">
<metadata>
<meta name="author" content="Stevens"/>
<meta name="project" content="hello02"/>
<meta name="department" content="HR"/>
</metadata>
</record>
</group>
</gsafeed>
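The feed file described in steps 1 and 2 can also be generated by a script rather than written by hand.
The following is a minimal sketch in Python, using only the standard library. The function name
build_feed and the shape of the records argument are hypothetical and chosen for illustration; the
element names, attributes, and sample values are taken from the example above.

```python
import xml.etree.ElementTree as ET

# Declaration and DOCTYPE lines copied from the example feed above.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_feed(datasource, records):
    """Return a metadata-and-url feed as a string (hypothetical helper).

    records is a list of dicts with the keys "url", "mimetype",
    "last_modified", and "meta" (a dict of meta name -> content).
    """
    root = ET.Element("gsafeed")

    # Step 1: header with the data source name and the feed type.
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"

    # Step 2: one <record> per primary document, referenced by URL only.
    group = ET.SubElement(root, "group")
    for rec in records:
        record = ET.SubElement(group, "record", {
            "url": rec["url"],
            "mimetype": rec["mimetype"],
            "last-modified": rec["last_modified"],
        })
        metadata = ET.SubElement(record, "metadata")
        for name, content in rec["meta"].items():
            ET.SubElement(metadata, "meta", {"name": name, "content": content})

    body = ET.tostring(root, encoding="unicode")
    return "\n".join([XML_DECLARATION, DOCTYPE, body])


feed_xml = build_feed("sample2", [{
    "url": "http://www.corp.enterprise.com/hello02",
    "mimetype": "text/plain",
    "last_modified": "Tue, 17 Feb 2009 12:45:26 GMT",
    "meta": {"author": "Stevens", "project": "hello02", "department": "HR"},
}])
print(feed_xml)
```

Building the tree with ElementTree, instead of concatenating strings, means that special characters in
metadata values (for example & or quotation marks) are escaped automatically.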
External Metadata Sent in an HTTP Header
At crawl time, the search appliance can accept external metadata, along with documents, through the
X-GSA-External-Metadata HTTP response header. This is useful for indexing metadata for non-HTML
documents, where it is not possible to include metadata in the document itself. The metadata supplied
at crawl time replaces any and all metadata that may have been indexed earlier.
To use this method of indexing external metadata, the web service that stores the content needs to be
designed to generate the optional X-GSA-External-Metadata HTTP header. The header includes a
comma-separated list of encoded values, as specified in RFC 2616
(http://www.w3.org/Protocols/rfc2616/rfc2616.html), Section 4.2:
X-GSA-External-Metadata: value_1, value_2,...
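As a rough illustration of how a web service might assemble this header value, the following Python
sketch joins encoded items with commas. It rests on an assumption that is not stated in the text
above: that each value has the form name=value, with the name and the value percent-encoded. Confirm
the exact encoding rules in the appliance documentation before relying on it. The function name
external_metadata_header is hypothetical.

```python
from urllib.parse import quote


def external_metadata_header(pairs):
    """Build a value for the X-GSA-External-Metadata header (hypothetical helper).

    ASSUMPTION: each item is name=value, with both parts percent-encoded.
    The items are joined into a comma-separated list.
    """
    items = []
    for name, value in pairs:
        # safe="" also encodes "/", so no reserved character is left unencoded.
        items.append(f"{quote(name, safe='')}={quote(value, safe='')}")
    return ", ".join(items)


header_value = external_metadata_header([
    ("author", "Stevens"),
    ("department", "HR & Payroll"),
])
print("X-GSA-External-Metadata: " + header_value)
# prints: X-GSA-External-Metadata: author=Stevens, department=HR%20%26%20Payroll
```

Percent-encoding each part keeps commas, equals signs, and non-ASCII characters inside a metadata value
from being confused with the separators of the list.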