Apache Jackrabbit : Performance

Q. My XPath query is too slow.

Quotes from the mailing list regarding XPath query performance can be found here: http://markmail.org/message/uew5xeyuzdb7v6bv

Performance of XPath queries is much better with the 1.5 snapshot.

Q. I have too many child nodes and performance goes down.

The current internal Jackrabbit design is optimized for small to medium-sized child node sets, i.e. up to ~10k child nodes per node. Really large child node sets negatively affect write performance.

Please note that this is not a general issue of JCR but is specific to Jackrabbit's current internal persistence strategy. It applies regardless of whether you use a normal persistence manager or a "bundle" persistence manager, although the latter is recommended; see PersistenceManagerFAQ. Each node contains the references to all its child nodes. This is a design decision inside Jackrabbit to improve speed when a node has few child nodes.

To improve performance, introduce some extra levels into your content model. This also helps humans explore the repository when using a browser tool. Typical solutions are to use categories derived from the context of your data, or date folders such as "2009/01/09".
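As an illustration of the date folder approach, the following is a minimal sketch that derives a bucketed path from a date. The class name DateBuckets, the method bucketedPath and the "articles" root are hypothetical names chosen for this example; they are not part of Jackrabbit or the JCR API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper (not part of Jackrabbit): builds date-based folder paths. */
final class DateBuckets {
    // One folder level each for year, month and day, e.g. "2009/01/09".
    private static final DateTimeFormatter FOLDERS =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    private DateBuckets() {
    }

    /**
     * Returns a relative path of the form root/yyyy/MM/dd/nodeName, so that
     * no single parent node accumulates all children.
     */
    static String bucketedPath(String root, LocalDate date, String nodeName) {
        return root + "/" + FOLDERS.format(date) + "/" + nodeName;
    }
}
```

For example, bucketedPath("articles", LocalDate.of(2009, 1, 9), "my-post") yields "articles/2009/01/09/my-post". In a repository, each path segment would correspond to an intermediate node that has to exist before the leaf node is added.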

As of Jackrabbit 2.2, there are some utility classes in the org.apache.jackrabbit.flat package of the jcr-commons module for automatically arranging nodes in a B-tree-like manner while maintaining a flat view. See JCR-2688.

Q. I have many references to a single node and performance goes down.

The current Jackrabbit design is not optimized for many nodes referencing a single node, because for easy back-referencing in the JCR API all those references are stored in the target node. Please note that many people don't recommend references in a content model anyway - see for example DavidsModel, rule #5.

Q. How can I improve performance with DavEx remoting (jcr2spi / spi2davex)?

On the current trunk there are three parameters that can be used to tune performance for jcr2spi/spi2davex: the size of the item info cache, the size of the item cache, and the depth of batch read operations.

Some Background:

The item cache contains JCR items (i.e. nodes and properties). The item info cache contains item infos. An item info is an entity representing a node or a property on the SPI layer. The jcr2spi module receives item infos from an SPI implementation (e.g. spi2davex) and uses them to build up a hierarchy of JCR items.

When an item is requested from the JCR API, jcr2spi first checks whether the item is in the item cache. If so, that item is returned. If not, the request is passed down to the SPI. Before actually calling the SPI, however, the item info cache is checked. If this cache contains the requested item info, the relevant part of the JCR hierarchy is built and the corresponding JCR item is placed into the item cache. Only when the item info cache does not contain the requested item info is a call made to the SPI.

Here the batch read depth comes into play. Since calls to the SPI cause some latency (e.g. network round trips), the SPI may return additional item infos along with the one actually requested. The batch read depth parameter specifies the depth down to which item infos of the children of the requested item info are returned.
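The lookup order described above can be sketched as a small simulation. This is a simplified model and not the jcr2spi implementation: TwoLevelLookup and FakeSpi are hypothetical classes, items and item infos are represented as plain strings, paths are assumed to be absolute and non-root, and both caches are unbounded maps, whereas the real caches have a configurable size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the lookup order; not the actual jcr2spi code. */
final class TwoLevelLookup {

    /** Hypothetical stand-in for a remote SPI that counts round trips. */
    static final class FakeSpi {
        private final List<String> allPaths;
        int roundTrips = 0;

        FakeSpi(List<String> allPaths) {
            this.allPaths = allPaths;
        }

        /** Returns the requested path plus descendants up to batchReadDepth levels below it. */
        List<String> getItemInfos(String path, int batchReadDepth) {
            roundTrips++;
            List<String> batch = new ArrayList<>();
            int baseDepth = depth(path);
            for (String p : allPaths) {
                boolean self = p.equals(path);
                boolean descendant = p.startsWith(path + "/");
                if (self || (descendant && depth(p) - baseDepth <= batchReadDepth)) {
                    batch.add(p);
                }
            }
            return batch;
        }

        private static int depth(String path) {
            return path.split("/").length;
        }
    }

    private final Map<String, String> itemCache = new HashMap<>();
    private final Map<String, String> itemInfoCache = new HashMap<>();
    private final FakeSpi spi;
    private final int batchReadDepth;

    TwoLevelLookup(FakeSpi spi, int batchReadDepth) {
        this.spi = spi;
        this.batchReadDepth = batchReadDepth;
    }

    /** Returns the item for the given path, or null if the path does not exist. */
    String getItem(String path) {
        // 1. Item cache hit: return the already built item.
        String item = itemCache.get(path);
        if (item != null) {
            return item;
        }
        // 2. Item info cache miss: one SPI call fetches a whole batch.
        if (!itemInfoCache.containsKey(path)) {
            for (String p : spi.getItemInfos(path, batchReadDepth)) {
                itemInfoCache.put(p, "info:" + p);
            }
        }
        if (!itemInfoCache.containsKey(path)) {
            return null;
        }
        // 3. Build the item from its item info and remember it.
        item = "item:" + path;
        itemCache.put(path, item);
        return item;
    }
}
```

With the paths /a, /a/b and /a/b/c and a batch read depth of 1, requesting /a causes one round trip that also delivers the item info for /a/b, so a subsequent request for /a/b needs no further SPI call. Requesting /a/b/c then causes a second round trip, because it lies below the batch read depth of the first call.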

Overall, the size of the item info cache and the batch read depth should be used to optimize for the requirements of the back-end (i.e. network and server). In general, the item info cache should be large enough to easily hold all items from multiple batches. The batch read depth should be a trade-off between network latency and item info cache overhead. Finally, the item cache should be used to optimize for the requirements of the front-end (i.e. the JCR API client). It should be able to hold the items in the current working set of the API consumer.

Some pointers:

Batch reading:

org.apache.jackrabbit.spi.RepositoryService#getItemInfos()
org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_BATCHREAD_CONFIG

Item info cache size:

org.apache.jackrabbit.spi2davex.Spi2davexRepositoryServiceFactory#PARAM_ITEMINFO_CACHE_SIZE

Item cache size:

org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory#PARAM_ITEM_CACHE_SIZE

Related JIRA issues:

JCR-2497: Improve jcr2spi read performance
JCR-2498: Implement caching mechanism for ItemInfo batches
JCR-2461: Item retrieval inefficient after refresh
JCR-2499: Add simple benchmarking tools for jcr2spi read perform