Apache Jackrabbit : ExcerptProvider

An ExcerptProvider retrieves text excerpts for a node in the query result and marks up the words in the text that match the query terms.

This feature is Jackrabbit-specific (introduced in version 1.3) and does not work with other JCR implementations.

By default, highlighting of the words that match the query is disabled, because this feature requires additional information to be written to the search index. To enable it, add the following parameter inside the SearchIndex element of your workspace.xml or repository.xml file:

   <param name="supportHighlighting" value="true"/>

An additional parameter controls the format of the created excerpt. Its value must be the name of a class that implements the org.apache.jackrabbit.core.query.lucene.ExcerptProvider interface.

In Jackrabbit 1.3 the default is set to org.apache.jackrabbit.core.query.lucene.DefaultXMLExcerpt and will be changed to org.apache.jackrabbit.core.query.lucene.DefaultHTMLExcerpt in Jackrabbit 1.4. The configuration parameter for this setting is:

   <param name="excerptProviderClass" value="org.apache.jackrabbit.core.query.lucene.DefaultXMLExcerpt"/>

DefaultXMLExcerpt

This excerpt provider creates an XML fragment of the following form:

<excerpt>
    <fragment>
        <highlight>Jackrabbit</highlight> implements both the mandatory
        XPath and optional SQL <highlight>query</highlight> syntax.
    </fragment>
    <fragment>
        Before parsing the XPath <highlight>query</highlight> in
        <highlight>Jackrabbit</highlight>, the statement is surrounded
    </fragment>
</excerpt>
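On the client side the excerpt arrives as a string value containing this markup. If an application needs the matched words themselves, it can parse the fragment with the JDK's DOM parser. The class and method names below are hypothetical, and the sketch assumes the excerpt string is well-formed XML of the form shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: lists the highlighted words in a DefaultXMLExcerpt string. */
class XmlExcerptTerms {

    static List<String> highlightedTerms(String excerptXml) {
        try {
            // Parse the <excerpt> markup into a DOM document
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(excerptXml)));
            // Collect the text of every <highlight> element, in document order
            NodeList highlights = doc.getElementsByTagName("highlight");
            List<String> terms = new ArrayList<>();
            for (int i = 0; i < highlights.getLength(); i++) {
                terms.add(highlights.item(i).getTextContent());
            }
            return terms;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("excerpt is not well-formed XML", e);
        }
    }
}
```

For the example above this yields the list Jackrabbit, query, Query, Jackrabbit (one entry per highlight element).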

DefaultHTMLExcerpt

This excerpt provider creates an HTML fragment of the following form:

<div>
    <span>
        <strong>Jackrabbit</strong> implements both the mandatory XPath
        and optional SQL <strong>query</strong> syntax.
    </span>
    <span>
        Before parsing the XPath <strong>query</strong> in
        <strong>Jackrabbit</strong>, the statement is surrounded
    </span>
</div>
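The two formats differ only in element names. An application that stays with DefaultXMLExcerpt but wants the HTML form could rename the elements itself; setting excerptProviderClass to DefaultHTMLExcerpt is the configuration-based alternative. The class below is a hypothetical sketch that assumes the excerpt contains only the excerpt, fragment and highlight elements, without attributes, as in the examples.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites a DefaultXMLExcerpt string into the DefaultHTMLExcerpt form. */
class ExcerptXmlToHtml {

    // XML element name -> HTML element name, following the two examples
    private static final Map<String, String> TAGS =
            Map.of("excerpt", "div", "fragment", "span", "highlight", "strong");

    // Matches an opening or closing tag without attributes, e.g. <fragment> or </highlight>
    private static final Pattern TAG = Pattern.compile("<(/?)(excerpt|fragment|highlight)>");

    static String toHtml(String xmlExcerpt) {
        Matcher m = TAG.matcher(xmlExcerpt);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // keep the optional slash, swap the element name
            String replacement = "<" + m.group(1) + TAGS.get(m.group(2)) + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // text between tags is copied unchanged
        return out.toString();
    }
}
```

Text content, including any escaped entities, is copied as-is, so it remains valid inside the HTML elements.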

How to use it

If you are using XPath, you must use the rep:excerpt() function in the last location step, just as you would select properties:

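import javax.jcr.Value;
import javax.jcr.query.*;

// session is an existing javax.jcr.Session
QueryManager qm = session.getWorkspace().getQueryManager();
// select the Title property and an excerpt in the last location step
Query q = qm.createQuery("//*[jcr:contains(., 'jackrabbit')]/(@Title|rep:excerpt(.))", Query.XPATH);
QueryResult result = q.execute();
for (RowIterator it = result.getRows(); it.hasNext(); ) {
    Row r = it.nextRow();
    Value title = r.getValue("Title");
    // the excerpt column is addressed as rep:excerpt(.)
    Value excerpt = r.getValue("rep:excerpt(.)");
    // ... use title.getString() and excerpt.getString() here
}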

The above code searches for nodes that contain the word jackrabbit and then gets the value of the Title property and an excerpt for each result node.
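If the search term comes from user input, it has to be placed safely inside the quoted XPath string literal. A minimal sketch with hypothetical names, assuming that a single quote inside a single-quoted XPath literal is escaped by doubling it; the full-text syntax of jcr:contains itself (operators such as OR or a leading minus) is deliberately not escaped here.

```java
/** Hypothetical helper that builds the XPath excerpt query shown above for a given term. */
class ExcerptQueries {

    // Wrap a value in single quotes, doubling any embedded single quote (assumed escaping rule)
    static String quoteXPathLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Same statement as the example: nodes containing the term, selecting @Title and an excerpt
    static String xpathExcerptQuery(String term) {
        return "//*[jcr:contains(., " + quoteXPathLiteral(term) + ")]/(@Title|rep:excerpt(.))";
    }
}
```

Usage: Query q = qm.createQuery(ExcerptQueries.xpathExcerptQuery(userInput), Query.XPATH); where userInput is whatever the application received.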

Starting with Jackrabbit 1.4 it is also possible to use a relative path in the call to Row.getValue(), while the query statement remains the same. See JCR-860 for more information. Also starting with Jackrabbit 1.4, the relative path may point to a string property; the returned value is then an excerpt based on the string value of that property.
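Assuming, as in JCR-860, that the relative path takes the place of the dot in rep:excerpt(.), only the column name passed to Row.getValue() changes. The helper below is hypothetical, and the jcr:content path in the usage line (a child node of an nt:file result) is an illustrative assumption rather than something stated on this page.

```java
/** Hypothetical helper: builds the column name for Row.getValue() when requesting an excerpt. */
class ExcerptColumns {

    // "." is the result node itself; any other relative path points at a child node or,
    // from Jackrabbit 1.4 on, at a string property whose value the excerpt is based on
    static String excerptColumn(String relativePath) {
        if (relativePath == null || relativePath.isEmpty()) {
            throw new IllegalArgumentException("relative path must not be empty");
        }
        return "rep:excerpt(" + relativePath + ")";
    }
}
```

Usage: Value contentExcerpt = r.getValue(ExcerptColumns.excerptColumn("jcr:content"));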

Both available excerpt providers create up to 3 fragments of about 150 characters each.
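To make the two limits concrete, here is a deliberately simplified, hypothetical sketch of windowed fragment selection for a single term: at most maxFragments non-overlapping windows of at most maxFragmentSize characters, each starting shortly before a match. It is not how Jackrabbit builds excerpts (the real providers rely on the additional information written to the index when supportHighlighting is enabled), and it adds no highlight markup. With the limits of 3 and 150, the sketch's output is capped at 3 × 150 = 450 characters of source text.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of the fragment limits; NOT Jackrabbit's excerpt algorithm. */
class ToyFragments {

    // Returns at most maxFragments windows of at most maxFragmentSize characters,
    // each starting shortly before a case-insensitive occurrence of term
    static List<String> fragments(String text, String term, int maxFragments, int maxFragmentSize) {
        if (term.isEmpty() || maxFragmentSize <= 0) {
            throw new IllegalArgumentException("term must be non-empty and maxFragmentSize positive");
        }
        List<String> result = new ArrayList<>();
        int searchFrom = 0;
        while (result.size() < maxFragments) {
            int hit = indexOfIgnoreCase(text, term, searchFrom);
            if (hit < 0) {
                break; // no further matches
            }
            // show some leading context, but never overlap the previous fragment
            int start = Math.max(searchFrom, hit - maxFragmentSize / 4);
            int end = Math.min(text.length(), start + maxFragmentSize);
            result.add(text.substring(start, end));
            searchFrom = end; // continue after this fragment
        }
        return result;
    }

    private static int indexOfIgnoreCase(String text, String term, int from) {
        for (int i = from; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

Usage: ToyFragments.fragments(text, "jackrabbit", 3, 150) returns no more than three windows, however many matches the text contains.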

In SQL the function is called excerpt(), without the rep prefix, but the corresponding column of the rows in the RowIterator is nonetheless labeled rep:excerpt(.):

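import javax.jcr.Value;
import javax.jcr.query.*;

// session is an existing javax.jcr.Session
QueryManager qm = session.getWorkspace().getQueryManager();
// in SQL the function is excerpt(), without the rep: prefix
Query q = qm.createQuery("select excerpt(.) from nt:resource where contains(., 'jackrabbit')", Query.SQL);
QueryResult result = q.execute();
for (RowIterator it = result.getRows(); it.hasNext(); ) {
    Row r = it.nextRow();
    // ...but the column is still labeled rep:excerpt(.)
    Value excerpt = r.getValue("rep:excerpt(.)");
}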
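If the search term comes from user input, it should not be concatenated into the SQL statement unescaped. A minimal sketch with hypothetical names, assuming that a single quote inside a JCR SQL string literal is escaped by doubling it. It also keeps the rep:excerpt(.) column label in one place, so the mismatch between the SQL function name and the column name is not scattered through application code.

```java
/** Hypothetical helper for the SQL excerpt query shown above. */
class SqlExcerptQueries {

    // Column name to pass to Row.getValue(); it keeps the rep: prefix even for SQL queries
    static final String EXCERPT_COLUMN = "rep:excerpt(.)";

    // Wrap a value in single quotes, doubling any embedded single quote (assumed escaping rule)
    static String quoteSqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Same statement as the example, with the term supplied by the caller
    static String sqlExcerptQuery(String term) {
        return "select excerpt(.) from nt:resource where contains(., " + quoteSqlLiteral(term) + ")";
    }
}
```

Usage: Query q = qm.createQuery(SqlExcerptQueries.sqlExcerptQuery(userInput), Query.SQL); then read each row with r.getValue(SqlExcerptQueries.EXCERPT_COLUMN).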