Best Practices when Using Jackrabbit Oak
Session Management
Session refresh behavior
Oak is based on the MVCC model where each session starts with a snapshot
view of the repository. Concurrent changes from other sessions are not
visible to a session until it gets refreshed. A session can be refreshed
either explicitly by calling the refresh() method or implicitly by
direct-to-workspace methods or by the auto-refresh mode. Observation
event delivery also causes a session to be refreshed.
By default the auto-refresh mode automatically refreshes all sessions that have been idle for more than one second, and it's also possible to explicitly set the auto-refresh parameters. A typical approach would be for long-lived admin sessions to set the auto-refresh mode to keep the session always up to date with latest changes from the repository.
Pattern: One session for one request/operation
One of the key patterns targeted by Oak is a web application that serves HTTP requests. The recommended way to handle such cases is to use a separate session for each HTTP request, and never to refresh that session.
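The one-session-per-request pattern depends on the session being released reliably when the request ends. The following is a minimal sketch of one way to enforce that with try-with-resources. `Scoped` is a hypothetical helper, not part of JCR or Oak, and the `repository` and `credentials` names in the trailing comment are assumed to be supplied by the application.

```java
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of JCR or Oak): ties the lifetime of a
 * resource to a try-with-resources block by running a release action on close.
 */
final class Scoped<T> implements AutoCloseable {

    private final T resource;
    private final Consumer<? super T> release;

    Scoped(T resource, Consumer<? super T> release) {
        this.resource = Objects.requireNonNull(resource, "resource");
        this.release = Objects.requireNonNull(release, "release");
    }

    /** Returns the wrapped resource for use inside the try block. */
    T get() {
        return resource;
    }

    /** Runs the release action, for example Session::logout. */
    @Override
    public void close() {
        release.accept(resource);
    }
}

// Possible wiring with the JCR API, one session per HTTP request
// (repository and credentials are assumptions of this sketch):
//
//   try (Scoped<Session> scoped =
//            new Scoped<>(repository.login(credentials), Session::logout)) {
//       Session session = scoped.get();
//       // read or modify content for this request only, then persist
//       session.save();
//   }
```

Because the session lives only for the duration of one request, it is never refreshed and never shared with another thread.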
Anti pattern: concurrent session access
Oak is designed to be virtually lock free as long as sessions are not shared across threads. Don't access the same session instance concurrently from multiple threads. If you do, Oak will protect its internal data structures from becoming corrupted but will not make any guarantees beyond that. In particular, clients that violate this rule might suffer from lock contention or deadlocks.
If Oak detects concurrent write access to a session it will log a warning.
For concurrent read access the warning will only be logged if DEBUG level
is enabled for org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.
In this case the stack trace of the other session involved will also be
logged. For efficiency reasons the stack trace will not be logged if
DEBUG level is not enabled.
Content Modelling
Large number of direct child nodes
Oak scales to a large number of direct child nodes of a node as long as those
are not orderable. For orderable child nodes Oak keeps the order in an
internal property, which will lead to a performance degradation when the list
grows too large. For such scenarios Oak provides the oak:Unstructured node
type, which is equivalent to nt:unstructured except that it is not orderable.
Large Multi Value Property
Nodes with a large multi-value property (MVP) do not scale well. Depending on the NodeStore, such a property might also hit a size limit. For example, with DocumentMK the MVP is stored in the backing document, which on MongoDB has a 16 MB limit.
More efficient alternatives to large MVPs include:
- store the list of values in a binary property
- use a PropertySequence available in jackrabbit-commons (JCR-2688)
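As an illustration of the first alternative, the sketch below packs a list of string values into a single byte array that could then be stored as one binary property. `ValueListCodec` and its encoding (a count followed by length-prefixed strings) are assumptions made for this example and are not an Oak or Jackrabbit format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: packs a list of strings into one byte array and back. */
final class ValueListCodec {

    private ValueListCodec() {}

    /** Writes the element count, then each value in modified UTF-8. */
    static byte[] encode(List<String> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.size());
            for (String value : values) {
                // writeUTF limits each value to 65535 encoded bytes
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back a list written by encode. */
    static List<String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<String> values = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                values.add(in.readUTF());
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the JCR API the resulting bytes could be turned into a binary value via session.getValueFactory().createBinary(new ByteArrayInputStream(bytes)) and set on a node with setProperty. The trade-off is that individual values are no longer visible to queries or indexes.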
Inlining large binaries
Most BlobStore implementations provide an option to inline small binary
content as part of the node property itself. For example, FileDataStore
supports the minRecordLength property. If that is set to, say, 4096, then
any binary smaller than 4 KB is stored as part of the node data itself and
not in the BlobStore.
It is recommended not to set this to a very high value because, depending on the implementation, it might hit a limit and cause the commit to fail. For example, the SegmentNodeStore enforces a limit of 8k for any inlined binary value. A high value also leads to repository growth, because inlined binaries do not benefit from the deduplication that is applied by default to binaries stored in the BlobStore.
Creating files
The default way provided by JCR 1.0 to model a file structure with nt:file
is to add a jcr:content child of type nt:resource, which makes that content
referenceable.
If the file has no need to be referenceable it is recommended to use the
node type oak:Resource instead and add the mixin type mix:referenceable
only upon demand (see OAK-4567).
Hierarchy Operations
Tree traversal
As explained in Understanding the node state model, Oak stores content in a tree hierarchy. For that reason, when navigating to parent or child nodes it is preferable to use the JCR Node API instead of the Session API, although the two are functionally equivalent. The Session API uses an absolute path, so all ancestor nodes have to be traversed before the target node is reached. Traversing each ancestor includes building its node state and associating it with a TreePermission (see Permission Evaluation in Detail), which is not needed when using the Node API with relative paths.
Node c = session.getNode("/a/b/c");
Node d = null;
// get the child node
d = session.getNode("/a/b/c/d");
d = c.getNode("d"); // preferred way to fetch the child node
// get the parent node
c = session.getNode("/a/b/c");
c = d.getParent(); // preferred way to fetch the parent node
Security
Misc
Don't use Thread.interrupt()
Thread.interrupt() can severely impact or even stop the repository. The
reason for this is that Oak internally uses various classes from the nio
package that implement InterruptibleChannel, which are asynchronously
closed when a thread is interrupted while blocked on IO on the channel.
See OAK-2609.
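The effect can be reproduced with the JDK alone, without Oak. The sketch below opens a FileChannel on a temporary file, sets the interrupt status of the current thread and then attempts a read. As documented for InterruptibleChannel, the read fails with ClosedByInterruptException and the channel is left closed. The class name InterruptClosesChannel is made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Demonstrates that interrupting a thread closes the channel it performs IO on. */
final class InterruptClosesChannel {

    private InterruptClosesChannel() {}

    /** Returns true if the interrupted read left the channel closed. */
    static boolean channelClosedByInterrupt() {
        try {
            Path file = Files.createTempFile("interrupt-demo", ".bin");
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Set the interrupt status before the IO operation starts.
                Thread.currentThread().interrupt();
                try {
                    channel.read(ByteBuffer.allocate(16));
                    return false; // the read was not expected to succeed
                } catch (ClosedByInterruptException expected) {
                    // The channel is now closed for every thread that uses it.
                    return !channel.isOpen();
                } finally {
                    // Clear the interrupt status so later IO is unaffected.
                    Thread.interrupted();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a repository the closed channel may belong to a shared store file rather than a temporary file, which is why a single interrupt can affect all sessions.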
Avoid or minimize conflicts
To reduce the possibility of having errors like OakState0001: Unresolved conflicts in ...:
- Make sure you always release the session by calling session.logout(). If possible, avoid long-running sessions. If they are required (e.g. for observation) make sure to always call session.refresh(false) before applying changes or session.refresh(true) before saving the changes.
- Enable the DEBUG level for the org.apache.jackrabbit.oak.plugins.commit.MergingNodeStateDiff and org.apache.jackrabbit.oak.plugins.commit.ConflictValidator loggers if you want more information on the circumstances of a conflict that happened at a given point in time.
- Write your own conflict handler and add it when configuring your Oak or WhiteBoard instances. Do this only if you know what you are doing (i.e. you know how to resolve the conflict in each one of the possible situations). By default, the AnnotatingConflictHandler instance will discard your changes and your commit will fail. If persisting changes fails with a conflict and you cannot lose them, refactor your code such that you can retry after having called session.refresh(false). Check the source code of JcrLastModifiedConflictHandler for an example of a conflict handler.
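The retry advice in the last item can be sketched as a small loop. `ConflictRetry` and its `Step` interface are hypothetical helpers written for this example, not Oak API. The JCR wiring in the trailing comment assumes that a conflict surfaces from session.save() as javax.jcr.InvalidItemStateException, which should be verified against the Oak version in use.

```java
import java.util.function.Predicate;

/** Hypothetical helper (not part of Oak): retries a unit of work after a conflict. */
final class ConflictRetry {

    /** A step that may throw, for example session.save() or session.refresh(false). */
    interface Step {
        void run() throws Exception;
    }

    private ConflictRetry() {}

    /**
     * Runs applyAndSave until it succeeds. After a failure that isConflict
     * accepts, discardChanges is run and the work is attempted again, up to
     * maxAttempts times in total. Returns the number of attempts used.
     */
    static int run(Step applyAndSave, Step discardChanges,
                   Predicate<Exception> isConflict, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                applyAndSave.run();
                return attempt;
            } catch (Exception e) {
                // Give up on the last attempt or on anything that is not a conflict.
                if (attempt == maxAttempts || !isConflict.test(e)) {
                    throw e;
                }
                discardChanges.run();
            }
        }
    }
}

// Possible wiring with the JCR API (the exception type is an assumption):
//
//   ConflictRetry.run(
//       () -> { node.setProperty("count", 1L); session.save(); },
//       () -> session.refresh(false),
//       e -> e instanceof javax.jcr.InvalidItemStateException,
//       3);
```

The changes are reapplied inside the first step on every attempt because session.refresh(false) discards all pending changes of the session.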