Search implementation

Jackrabbit implements both the mandatory XPath query syntax and the optional SQL query syntax. Its design follows the goal of the JSR-170 specification that all mandatory query features can be expressed in either XPath or SQL. The query engine itself is therefore independent of the query syntax used, though Jackrabbit's query internals are closer to XPath than to SQL because of the hierarchical structure of a JCR.

The major parts of the query implementation are:

XPath Parser
SQL Parser
Abstract Query Tree
Query Engine
Utilities

XPath Parser

The XPath query parser is based on the W3C XQuery grammar definition, which is not yet final but can be downloaded as a draft. Jackrabbit uses the XQuery grammar rather than the XPath grammar because JSR-170 specifies an ‘order by’ clause for the XPath query syntax, and this clause is borrowed from the XQuery FLWOR expression syntax. Before an XPath query is parsed, Jackrabbit surrounds the statement with dummy code to form a valid XQuery FLWOR expression and then passes the result to the XQuery parser. The actual parser is a class generated by JavaCC from the grammar in src/grammar/xpath. The parsed XPath statement is then translated into an Abstract Query Tree. See class: org.apache.jackrabbit.core.query.xpath.XPathQueryBuilder
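The wrapping step described above can be pictured with a minimal sketch. The class name FlworWrapper and the exact wrapper text are assumptions made for illustration; the real logic lives in XPathQueryBuilder and may differ in detail.

```java
// Sketch only: wraps a JCR XPath statement so that an XQuery parser accepts it.
// The class name and the exact wrapper text are assumptions for illustration.
final class FlworWrapper {

    private FlworWrapper() {
    }

    /** Surrounds the statement with a minimal FLWOR "for ... return" frame. */
    static String wrap(String xpathStatement) {
        // An "order by" clause at the end of the statement lands between the
        // "for" clause and the "return" clause, where FLWOR syntax expects it.
        return "for $v in " + xpathStatement.trim() + " return $v";
    }

    public static void main(String[] args) {
        System.out.println(wrap("//element(*, nt:file) order by @jcr:created"));
    }
}
```

The point of the frame is that the ‘order by’ clause, which plain XPath does not know, becomes an ordinary part of a FLWOR expression.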

SQL Parser

The SQL query parser is generated from a grammar definition located in src/grammar/sql. After parsing, the Abstract Syntax Tree is translated into the Jackrabbit internal Abstract Query Tree. See class: org.apache.jackrabbit.core.query.sql.JCRSQLQueryBuilder

Abstract Query Tree

The Abstract Query Tree (AQT) is the common query description format that allows Jackrabbit to implement a query engine which is (to a certain extent) independent of the query syntax used (XPath or SQL). The AQT consists of the classes that are derived from: org.apache.jackrabbit.core.query.QueryNode

Please note that the AQT is internal to Jackrabbit and is not exposed to clients using the JCR API!
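To illustrate why a common tree makes the engine independent of the query syntax, here is a heavily simplified sketch. All types in it (MiniQuery, MiniConstraint, the two renderers) are hypothetical and are not the Jackrabbit QueryNode classes. The tree is built by hand and then rendered once in an XPath-like form and once in an SQL-like form, with no escaping of literals.

```java
// Hypothetical miniature of an abstract query tree. These types are
// illustrative only and are not Jackrabbit's QueryNode classes.
interface MiniConstraint {
    <R> R accept(MiniVisitor<R> visitor);
}

interface MiniVisitor<R> {
    R visitEquals(PropertyEquals node);
    R visitAnd(AndConstraint node);
}

final class PropertyEquals implements MiniConstraint {
    final String property;
    final String literal;

    PropertyEquals(String property, String literal) {
        this.property = property;
        this.literal = literal;
    }

    @Override
    public <R> R accept(MiniVisitor<R> visitor) {
        return visitor.visitEquals(this);
    }
}

final class AndConstraint implements MiniConstraint {
    final MiniConstraint left;
    final MiniConstraint right;

    AndConstraint(MiniConstraint left, MiniConstraint right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(MiniVisitor<R> visitor) {
        return visitor.visitAnd(this);
    }
}

/** Root of the tree: a node type plus one constraint on its properties. */
final class MiniQuery {
    final String nodeType;
    final MiniConstraint constraint;

    MiniQuery(String nodeType, MiniConstraint constraint) {
        this.nodeType = nodeType;
        this.constraint = constraint;
    }
}

/** Renders the tree in an XPath-like form (literals are not escaped). */
final class XPathRenderer implements MiniVisitor<String> {
    static String render(MiniQuery query) {
        return "//element(*, " + query.nodeType + ")["
                + query.constraint.accept(new XPathRenderer()) + "]";
    }

    @Override
    public String visitEquals(PropertyEquals node) {
        return "@" + node.property + " = '" + node.literal + "'";
    }

    @Override
    public String visitAnd(AndConstraint node) {
        return node.left.accept(this) + " and " + node.right.accept(this);
    }
}

/** Renders the same tree in an SQL-like form (literals are not escaped). */
final class SqlRenderer implements MiniVisitor<String> {
    static String render(MiniQuery query) {
        return "SELECT * FROM " + query.nodeType + " WHERE "
                + query.constraint.accept(new SqlRenderer());
    }

    @Override
    public String visitEquals(PropertyEquals node) {
        return node.property + " = '" + node.literal + "'";
    }

    @Override
    public String visitAnd(AndConstraint node) {
        return node.left.accept(this) + " AND " + node.right.accept(this);
    }
}

final class MiniQueryDemo {
    public static void main(String[] args) {
        MiniQuery query = new MiniQuery("my:type", new AndConstraint(
                new PropertyEquals("my:title", "JSR 170"),
                new PropertyEquals("my:lang", "en")));
        System.out.println(XPathRenderer.render(query));
        System.out.println(SqlRenderer.render(query));
    }
}
```

A component that consumes the tree, such as a query engine, only has to implement the visitor once and never sees which syntax the statement was originally written in.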

Query Engine

Now this is where the meat is. The actual implementation of the query engine is configurable: to provide one, implement the interface org.apache.jackrabbit.core.query.QueryHandler. Jackrabbit comes with an implementation that uses a Lucene index: org.apache.jackrabbit.core.query.lucene.SearchIndex. This index is independent of the persistence manager in use. However, it is also possible to write a QueryHandler implementation that is aware of the underlying storage (e.g. a database) and executes the query on the ‘native’ storage.

The class org.apache.jackrabbit.core.query.lucene.LuceneQueryBuilder translates the Abstract Query Tree into a query that can be executed against the Lucene index. Jackrabbit implements a couple of extensions to the standard Lucene classes, primarily to improve performance in an environment with incremental indexing like Jackrabbit. Instead of a single index, Jackrabbit uses generations of indexes to circumvent costly IndexReader / IndexWriter creation. See: org.apache.jackrabbit.core.query.lucene.MultiIndex. The most recent generation of the search index is held completely in memory. See: org.apache.jackrabbit.core.query.lucene.VolatileIndex. The approach is comparable to garbage collection in Java, where generations are used to move living objects from the young into the old generation over time. Queries are then executed on a MultiReader that spans all the indexes. Every now and then (depending on the configuration parameters in workspace.xml) indexes are merged and nodes marked as deleted in the index are removed. This is similar to how Lucene merges its internal segments.
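The generational idea can be sketched with plain Java collections. This toy model does not use Lucene and does not mirror the internals of MultiIndex or VolatileIndex. The class name, both thresholds and the use of node ids as "documents" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of a generational index. All names and thresholds are
// hypothetical; this neither uses Lucene nor mirrors MultiIndex internals.
final class GenerationalIndex {
    private final int volatileLimit;   // flush the in-memory generation at this size
    private final int mergeThreshold;  // merge once this many older generations exist
    private Set<String> volatileGen = new LinkedHashSet<>();
    private final List<Set<String>> olderGens = new ArrayList<>();
    private final Set<String> deleted = new HashSet<>();

    GenerationalIndex(int volatileLimit, int mergeThreshold) {
        this.volatileLimit = volatileLimit;
        this.mergeThreshold = mergeThreshold;
    }

    /** New entries always go to the youngest, in-memory generation. */
    void add(String nodeId) {
        deleted.remove(nodeId);
        volatileGen.add(nodeId);
        if (volatileGen.size() >= volatileLimit) {
            flush();
        }
    }

    /** Entries in older generations are only marked; merging removes them. */
    void delete(String nodeId) {
        volatileGen.remove(nodeId);
        for (Set<String> generation : olderGens) {
            if (generation.contains(nodeId)) {
                deleted.add(nodeId);
                break;
            }
        }
    }

    /** A search spans every generation and skips entries marked as deleted. */
    Set<String> search() {
        Set<String> result = new LinkedHashSet<>();
        for (Set<String> generation : olderGens) {
            result.addAll(generation);
        }
        result.addAll(volatileGen);
        result.removeAll(deleted);
        return result;
    }

    /** Number of older generations plus the in-memory one. */
    int generationCount() {
        return olderGens.size() + 1;
    }

    // Turns the in-memory generation into an older one and starts a new one.
    private void flush() {
        olderGens.add(volatileGen);
        volatileGen = new LinkedHashSet<>();
        if (olderGens.size() >= mergeThreshold) {
            merge();
        }
    }

    // Collapses all older generations into one and drops deleted entries.
    private void merge() {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> generation : olderGens) {
            merged.addAll(generation);
        }
        merged.removeAll(deleted);
        olderGens.clear();
        olderGens.add(merged);
        deleted.clear();
    }
}
```

In this model, writes touch only the small in-memory set. The cost of rewriting older data is paid occasionally in merge(), which is also the only place where entries marked as deleted physically disappear.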

Utilities

The class org.apache.jackrabbit.core.query.QueryParser allows you to translate a query statement into an Abstract Query Tree and vice versa. It's a nice tool to see what a query in XPath looks like in SQL, or the other way round.

The class org.apache.jackrabbit.core.query.PropertyTypeRegistry provides fast access to the type information based on property names. The Jackrabbit QueryHandler implementation uses this class to coerce value literals into other value types.
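The coercion idea can be sketched as a lookup from property name to type, followed by a conversion of the literal. The enum SimpleType, the class SimpleTypeRegistry and its methods are hypothetical stand-ins and do not reflect the actual PropertyTypeRegistry API. The sketch also assumes that each property name maps to exactly one type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a property type lookup. The enum, the class and
// its methods are assumptions and not the PropertyTypeRegistry API.
enum SimpleType { STRING, LONG, DOUBLE, BOOLEAN }

final class SimpleTypeRegistry {
    private final Map<String, SimpleType> typesByProperty = new HashMap<>();

    void register(String propertyName, SimpleType type) {
        typesByProperty.put(propertyName, type);
    }

    /** Converts a query literal to the Java type registered for the property. */
    Object coerce(String propertyName, String literal) {
        // Unknown properties fall back to STRING, so the literal stays as-is.
        SimpleType type = typesByProperty.getOrDefault(propertyName, SimpleType.STRING);
        switch (type) {
            case LONG:
                return Long.valueOf(literal);
            case DOUBLE:
                return Double.valueOf(literal);
            case BOOLEAN:
                return Boolean.valueOf(literal);
            default:
                return literal;
        }
    }
}
```

With such a lookup, a literal written as '42' in a query can be compared numerically when the property is known to hold a long value, and as a plain string otherwise.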