Search implementation
Jackrabbit implements both the mandatory XPath and optional SQL query syntax. Its design follows the goal of the JSR-170 specification that all the mandatory query features can be expressed either in XPath or in SQL. Thus, the actual implementation of the query engine is independent of the query syntax used, though Jackrabbit's query internals are closer to XPath than SQL, because of the hierarchical structure of a JCR.
The major parts of the query implementation are:
- XPath Parser
- SQL Parser
- Abstract Query Tree
- Query engine
- Utilities
XPath Parser
The XPath query parser is based on the W3C XQuery grammar definition, which
at the time of writing is not yet final but is available as a draft.
Jackrabbit uses the XQuery grammar rather than the XPath grammar because
JSR-170 specifies an ‘order by’ clause for the XPath query syntax. This
‘order by’ clause is borrowed from the XQuery FLWOR expression syntax.
Before Jackrabbit parses an XPath query, the statement is surrounded
with dummy code to form a valid XQuery FLWOR expression, which is then passed
to the XQuery parser. The actual parser is a class generated by JavaCC,
which uses the grammar that can be found in src/grammar/xpath. The parsed
XPath statement is then translated into an Abstract Query Tree. See class:
org.apache.jackrabbit.core.query.xpath.XPathQueryBuilder
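The wrapping step described above can be pictured with a minimal sketch. The class name FlworWrapper and the exact dummy prefix and suffix are assumptions made for illustration; the text Jackrabbit actually uses is defined in XPathQueryBuilder and may differ.

```java
// Hypothetical helper, not part of Jackrabbit's API: surrounds a JCR XPath
// statement with dummy code so that it reads as an XQuery FLWOR expression.
final class FlworWrapper {
    // Assumed dummy code; the real wording lives in XPathQueryBuilder.
    private static final String PREFIX = "for $v in ";
    private static final String SUFFIX = " return $v";

    static String wrap(String xpathStatement) {
        // The 'order by' clause of the statement ends up between the
        // 'for' clause and the 'return' clause, as FLWOR expects.
        return PREFIX + xpathStatement.trim() + SUFFIX;
    }

    public static void main(String[] args) {
        // prints: for $v in //element(*, my:type) order by @my:title return $v
        System.out.println(wrap("//element(*, my:type) order by @my:title"));
    }
}
```

With this wrapping, a plain XPath statement that carries an ‘order by’ clause becomes input that a parser generated from the XQuery grammar can accept.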
SQL Parser
The SQL query parser is generated from a grammar definition located in
src/grammar/sql. After parsing, the Abstract Syntax Tree is translated into
the Jackrabbit internal Abstract Query Tree. See class:
org.apache.jackrabbit.core.query.sql.JCRSQLQueryBuilder
Abstract Query Tree
The Abstract Query Tree (AQT) is the common query description format that
allows Jackrabbit to implement a query engine which is (to a certain
extent) independent of the query syntax used (XPath or SQL). The AQT
consists of the classes that are derived from:
org.apache.jackrabbit.core.query.QueryNode
Please note that the AQT is internal to Jackrabbit and is not exposed to clients using the JCR API.
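To illustrate the idea of a syntax-independent query description, here is a heavily simplified sketch. All class names in it (Constraint, PropertyEquals, And, SimpleQuery) are hypothetical and are not Jackrabbit's QueryNode classes; the sketch only shows how one tree can be rendered in either syntax.

```java
// Hypothetical, simplified stand-in for a syntax-independent query tree.
// Quote escaping and most query features are deliberately left out.
interface Constraint {
    String toXPath();
    String toSql();
}

// Leaf node: a property must equal a string literal.
final class PropertyEquals implements Constraint {
    private final String property;
    private final String value;

    PropertyEquals(String property, String value) {
        this.property = property;
        this.value = value;
    }

    public String toXPath() { return "@" + property + " = '" + value + "'"; }
    public String toSql() { return property + " = '" + value + "'"; }
}

// Inner node: both child constraints must hold.
final class And implements Constraint {
    private final Constraint left;
    private final Constraint right;

    And(Constraint left, Constraint right) {
        this.left = left;
        this.right = right;
    }

    public String toXPath() { return left.toXPath() + " and " + right.toXPath(); }
    public String toSql() { return left.toSql() + " AND " + right.toSql(); }
}

// Root node: selects all nodes of a node type that match the constraint.
final class SimpleQuery {
    private final String nodeType;
    private final Constraint constraint;

    SimpleQuery(String nodeType, Constraint constraint) {
        this.nodeType = nodeType;
        this.constraint = constraint;
    }

    String toXPath() {
        return "//element(*, " + nodeType + ")[" + constraint.toXPath() + "]";
    }

    String toSql() {
        return "SELECT * FROM " + nodeType + " WHERE " + constraint.toSql();
    }
}
```

For example, new SimpleQuery("my:type", new PropertyEquals("my:title", "JSR 170")) renders as //element(*, my:type)[@my:title = 'JSR 170'] in XPath and as SELECT * FROM my:type WHERE my:title = 'JSR 170' in SQL. A query engine that works on the tree never needs to know which syntax the statement arrived in.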
Query Engine
Now this is where the meat is. The actual implementation of the query
engine is configurable. One needs to implement the interface
org.apache.jackrabbit.core.query.QueryHandler. Jackrabbit comes with an
implementation that uses a Lucene index:
org.apache.jackrabbit.core.query.lucene.SearchIndex. This index is
independent of the persistence manager in use. However, it is also possible
to write a QueryHandler implementation which is aware of the underlying
storage (e.g. a database) and executes the query on the ‘native’ storage.
The class org.apache.jackrabbit.core.query.lucene.LuceneQueryBuilder
translates the Abstract Query Tree into a query that can be executed
against the Lucene index. Jackrabbit implements a couple of extensions to
the standard Lucene classes, primarily to improve performance in an
environment with incremental indexing like Jackrabbit. Instead of a single
index, Jackrabbit uses generations of indexes to circumvent costly
IndexReader / IndexWriter creation. See:
org.apache.jackrabbit.core.query.lucene.MultiIndex. The most recent
generation of the search index is held completely in memory. See:
org.apache.jackrabbit.core.query.lucene.VolatileIndex. This is comparable
to garbage collection in Java, where generations are used to move living
objects from the young into the old generation over time. Queries are then
executed on a MultiReader that spans all the indexes. Every now and then
(depending on the configuration parameters in workspace.xml), indexes are
merged and nodes marked as deleted in the index are removed, similar to
how Lucene merges its internal segments.
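The generational scheme can be illustrated with a toy model. This is a sketch under stated assumptions, not Jackrabbit's MultiIndex: it uses plain maps instead of Lucene, and the class name GenerationalIndex and the parameters volatileLimit and mergeThreshold are hypothetical stand-ins for the real configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy generational index (hypothetical names, no Lucene involved).
final class GenerationalIndex {
    /** One generation: node id -> indexed text, plus ids marked as deleted. */
    private static final class Generation {
        final Map<String, String> docs = new HashMap<>();
        final Set<String> deleted = new HashSet<>();
    }

    private final int volatileLimit;   // flush the in-memory generation at this size
    private final int mergeThreshold;  // merge once this many older generations exist
    private Generation volatileGen = new Generation();
    private final List<Generation> olderGens = new ArrayList<>();

    GenerationalIndex(int volatileLimit, int mergeThreshold) {
        this.volatileLimit = volatileLimit;
        this.mergeThreshold = mergeThreshold;
    }

    void add(String nodeId, String text) {
        remove(nodeId); // an update replaces any earlier entry
        volatileGen.docs.put(nodeId, text);
        if (volatileGen.docs.size() >= volatileLimit) {
            // The youngest generation grows old: start a fresh in-memory one.
            olderGens.add(volatileGen);
            volatileGen = new Generation();
            if (olderGens.size() >= mergeThreshold) {
                merge();
            }
        }
    }

    void remove(String nodeId) {
        volatileGen.docs.remove(nodeId);
        // Older generations are not rewritten; the node is only marked.
        for (Generation g : olderGens) {
            if (g.docs.containsKey(nodeId)) {
                g.deleted.add(nodeId);
            }
        }
    }

    // Merges all older generations into one and drops deleted nodes.
    private void merge() {
        Generation merged = new Generation();
        for (Generation g : olderGens) {
            for (Map.Entry<String, String> e : g.docs.entrySet()) {
                if (!g.deleted.contains(e.getKey())) {
                    merged.docs.put(e.getKey(), e.getValue());
                }
            }
        }
        olderGens.clear();
        olderGens.add(merged);
    }

    // A query spans every generation, much like a reader over all indexes.
    Set<String> search(String term) {
        List<Generation> all = new ArrayList<>(olderGens);
        all.add(volatileGen);
        Set<String> hits = new TreeSet<>();
        for (Generation g : all) {
            for (Map.Entry<String, String> e : g.docs.entrySet()) {
                if (!g.deleted.contains(e.getKey()) && e.getValue().contains(term)) {
                    hits.add(e.getKey());
                }
            }
        }
        return hits;
    }

    int generationCount() {
        return olderGens.size() + 1; // older generations plus the volatile one
    }
}
```

The point of the design is that frequent small updates only touch the cheap in-memory generation, while the more expensive rewrite of older generations is deferred to an occasional merge that also purges deleted nodes.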
Utilities
The class org.apache.jackrabbit.core.query.QueryParser allows you to
translate a query statement into an Abstract Query Tree and vice versa.
It's a nice tool to see what an XPath query looks like in SQL, or the other
way round.
The class org.apache.jackrabbit.core.query.PropertyTypeRegistry provides
fast access to the type information based on property names. The Jackrabbit
QueryHandler implementation uses this class to coerce value literals into
other value types.
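The coercion idea can be sketched as follows. This is an illustration only: the class LiteralCoercion, its Type enum and its methods are hypothetical and do not mirror the API of PropertyTypeRegistry, and the sketch assumes exactly one type per property name, which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of name-based type lookup and literal coercion.
final class LiteralCoercion {
    enum Type { STRING, LONG, DOUBLE, BOOLEAN }

    // Property name -> declared type (simplified to one type per name).
    private final Map<String, Type> typesByProperty = new HashMap<>();

    void register(String propertyName, Type type) {
        typesByProperty.put(propertyName, type);
    }

    // Converts a string literal from a query into the type that the
    // property is registered with; unknown properties stay strings.
    Object coerce(String propertyName, String literal) {
        Type type = typesByProperty.getOrDefault(propertyName, Type.STRING);
        switch (type) {
            case LONG:
                return Long.valueOf(literal);
            case DOUBLE:
                return Double.valueOf(literal);
            case BOOLEAN:
                return Boolean.valueOf(literal);
            default:
                return literal;
        }
    }
}
```

With my:size registered as LONG, the literal '42' in a constraint on my:size is compared as the number 42 rather than as a string, while the same literal on an unregistered property is left untouched.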