Jackrabbit Statistics

Apache Jackrabbit 2.3.2 introduced a new mechanism (JCR-2936) for gathering low-level statistics at the repository level.

Repository Statistics

Overview

The newly introduced Repository Statistics service uses a set of time series to provide three types of statistical information:

COUNTER(s) - generally refer to quantities, such as the number of operations.
DURATION(s) - generally refer to event durations, measured in nanoseconds unless specified otherwise.
AVERAGE(s) - computed by dividing a DURATION by the corresponding COUNTER.
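As a minimal sketch of the AVERAGE definition, the hypothetical helper below divides a DURATION value by the matching COUNTER value for the same time slot. It assumes both inputs are plain `long` values (nanoseconds and counts), and it returns 0 for slots without any operations to avoid dividing by zero. That convention is an assumption for illustration, not documented Jackrabbit behavior.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating AVERAGE = DURATION / COUNTER; not Jackrabbit code.
class AverageSketch {

    // Average duration per operation in ns; 0 when no operations were counted (assumed convention).
    static long averageNanos(long durationNanos, long count) {
        return count == 0 ? 0 : durationNanos / count;
    }

    // Per-slot averages for two aligned series, e.g. the per-second arrays of a DURATION and a COUNTER.
    static long[] averageSeries(long[] durations, long[] counts) {
        if (durations.length != counts.length) {
            throw new IllegalArgumentException("series must have the same number of slots");
        }
        long[] averages = new long[durations.length];
        for (int i = 0; i < durations.length; i++) {
            averages[i] = averageNanos(durations[i], counts[i]);
        }
        return averages;
    }

    public static void main(String[] args) {
        // 4 writes that took 2,000,000 ns in total -> 500,000 ns (0.5 ms) each
        long avg = averageNanos(2_000_000L, 4L);
        System.out.println(avg + " ns = " + TimeUnit.NANOSECONDS.toMicros(avg) + " us"); // 500000 ns = 500 us
    }
}
```

Integer division truncates, which is usually acceptable for nanosecond-scale averages.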

Sequence types: depending on how a time series aggregates data points, there are two types of sequences:

Single - each data point represents only the recorded value
Incremental - each data point represents the value of the previous data point + the new recorded value
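The difference between the two sequence types can be sketched with a hypothetical recorder that appends one data point per recorded value. It only illustrates the two definitions above and is not Jackrabbit's internal code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the Single and Incremental sequence types; not Jackrabbit code.
class SequenceSketch {

    enum SequenceType { SINGLE, INCREMENTAL }

    private final SequenceType type;
    private final List<Long> points = new ArrayList<>();

    SequenceSketch(SequenceType type) {
        this.type = type;
    }

    void record(long value) {
        if (type == SequenceType.SINGLE || points.isEmpty()) {
            // Single: the data point is just the recorded value
            // (the first incremental point has no predecessor, so it is also just the value)
            points.add(value);
        } else {
            // Incremental: previous data point + newly recorded value
            points.add(points.get(points.size() - 1) + value);
        }
    }

    List<Long> points() {
        return points;
    }

    public static void main(String[] args) {
        SequenceSketch single = new SequenceSketch(SequenceType.SINGLE);
        SequenceSketch incremental = new SequenceSketch(SequenceType.INCREMENTAL);
        for (long v : new long[] {1, 0, 3}) {
            single.record(v);
            incremental.record(v);
        }
        System.out.println(single.points());      // [1, 0, 3]
        System.out.println(incremental.points()); // [1, 1, 4]
    }
}
```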

The Repository Statistics service is always enabled.

Provided Statistical Information

		PersistenceManager (not implemented in Oak)
Name	Sequence Type	Description
BUNDLE_READ_COUNTER	Single	Counts the number of bundle read operations
BUNDLE_WRITE_COUNTER	Single	Counts the number of bundle write operations
BUNDLE_WRITE_DURATION	Single	Tracks the duration (ns) of bundle write operations
BUNDLE_WRITE_AVERAGE	Incremental	Computes the bundle write average duration (ns): BUNDLE_WRITE_DURATION / BUNDLE_WRITE_COUNTER
BUNDLE_CACHE_ACCESS_COUNTER	Single	Bundle Cache: access counter
BUNDLE_CACHE_SIZE_COUNTER	Single	Bundle Cache: size counter
BUNDLE_CACHE_MISS_COUNTER	Single	Bundle Cache: cache miss count
BUNDLE_CACHE_MISS_DURATION	Single	Bundle Cache: cache miss duration (ns)
BUNDLE_CACHE_MISS_AVERAGE	Incremental	Bundle Cache: cache miss average (ns): BUNDLE_CACHE_MISS_DURATION / BUNDLE_CACHE_MISS_COUNTER
BUNDLE_COUNTER	Single	Not Implemented
BUNDLE_WS_SIZE_COUNTER	Single	Not Implemented
	* ns = nanoseconds
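The bundle cache counters can be combined into a miss ratio. The sketch below assumes, without confirmation from this page, that BUNDLE_CACHE_ACCESS_COUNTER counts every cache access (hits and misses alike), and that both inputs are aligned `long[]` arrays such as those returned by `TimeSeries#getValuePerSecond()`. The class and method names are hypothetical.

```java
// Hypothetical helper: bundle cache miss ratio over the window covered by two aligned series.
// ASSUMPTION: the access counter includes misses, so total misses <= total accesses.
class BundleCacheRatio {

    static long sum(long[] series) {
        long total = 0;
        for (long v : series) {
            total += v;
        }
        return total;
    }

    // Fraction of cache accesses that were misses; 0.0 when there were no accesses at all.
    static double missRatio(long[] accessCounter, long[] missCounter) {
        long accesses = sum(accessCounter);
        return accesses == 0 ? 0.0 : (double) sum(missCounter) / accesses;
    }

    public static void main(String[] args) {
        long[] accesses = {10, 0, 30};
        long[] misses = {1, 0, 3};
        System.out.println(missRatio(accesses, misses)); // 0.1
    }
}
```

In a running Jackrabbit 2 repository the two arrays would come from `getTimeSeries(Type.BUNDLE_CACHE_ACCESS_COUNTER).getValuePerSecond()` and the matching miss counter; these statistics are not available in Oak.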

		Session
Name	Sequence Type	Description
SESSION_READ_COUNTER	Single	Counts the number of session read operations
SESSION_READ_DURATION	Single	Tracks the duration (ns) of session read operations
SESSION_READ_AVERAGE	Incremental	Computes the average duration (ns) of session read operations
SESSION_WRITE_COUNTER	Single	Counts the number of session write operations
SESSION_WRITE_DURATION	Single	Tracks the duration (ns) of session write operations
SESSION_WRITE_AVERAGE	Incremental	Computes the average duration (ns) of session write operations
SESSION_LOGIN_COUNTER	Single	Counts the number of session logins (newly created sessions)
SESSION_COUNT	Single	Counts the number of active sessions
	* ns = nanoseconds

Classification of Session-related operations:

Read operations: #getItem(), #getNode(), #getProperty(), #itemExists(), #nodeExists(), #propertyExists(), #refresh(), #removeItem()
Write operations: #move(), #save()

	Query
Name	Sequence Type	Description
QUERY_COUNT	Single	Counts the number of queries run
QUERY_DURATION	Single	Tracks the duration (ms) of queries
QUERY_AVERAGE	Single	Computes the average duration (ms) of the queries
	* ms = milliseconds

	Observation (not implemented in Jackrabbit 2)
Name	Sequence Type	Description
OBSERVATION_EVENT_COUNTER	Single	Counts the number of observation Event instances delivered
OBSERVATION_EVENT_DURATION	Single	Tracks the time (ns) spent processing observation events
OBSERVATION_EVENT_AVERAGE	Incremental	Computes the average time (ns) spent processing observation events
	* ns = nanoseconds

TimeSeries

TimeSeries is an interface for a time series of measured values per second, minute, hour and week. The type of the value is arbitrary: it could be cache hits or misses, disk reads or writes, created sessions, completed transactions, or pretty much anything else of interest.

It is available since Apache Jackrabbit 2.3.2.

A brief walkthrough

#getValuePerSecond() returns the measured value per second over the last minute
#getValuePerMinute() returns the measured value per minute over the last hour
#getValuePerHour() returns the measured value per hour over the last week
#getValuePerWeek() returns the measured value per week over the last three years

All data series are chronological and have as many data point slots as there are units in the time period they represent: valuePerSecond has 60 slots, as there are 60 seconds in a minute; valuePerMinute also has 60, as there are 60 minutes in an hour; valuePerHour has 168 (7x24) slots; and valuePerWeek has 156 (3x52) slots.
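The slot counts can be checked with simple arithmetic. The sketch below (names are illustrative) multiplies each slot count by the length of one slot to get the time window each series spans: one minute, one hour, one week, and three 52-week years.

```java
// Worked arithmetic: slots per series and the time span (in seconds) each series covers.
class SlotArithmetic {

    // Seconds covered by a series with `slots` slots of `secondsPerSlot` seconds each.
    static long spanSeconds(int slots, long secondsPerSlot) {
        return slots * secondsPerSlot;
    }

    public static void main(String[] args) {
        int secondSlots = 60;      // seconds in a minute
        int minuteSlots = 60;      // minutes in an hour
        int hourSlots = 7 * 24;    // hours in a week: 168
        int weekSlots = 3 * 52;    // weeks in three 52-week years: 156

        System.out.println(spanSeconds(secondSlots, 1));             // 60
        System.out.println(spanSeconds(minuteSlots, 60));            // 3600
        System.out.println(spanSeconds(hourSlots, 3_600));           // 604800
        System.out.println(spanSeconds(weekSlots, 7L * 24 * 3_600)); // 94348800
    }
}
```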

Each data series is aggregated into the next, coarser one (following the order listed above) once the time period it represents has passed. For example, after each minute the valuePerSecond data points are summed and added as a single data point to the valuePerMinute series, and so on.
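The roll-up can be sketched as a set of fixed-size buffers, where completing a period sums the finer-grained slots into one coarser slot. This is a simplified, single-threaded illustration under assumptions: the hypothetical `tickSecond()` method is called manually instead of by a timer, values are summed as described above, and the newest value sits in the last slot. Jackrabbit's actual recorder may differ, for example in how it treats incremental series.

```java
import java.util.Arrays;

// Simplified, single-threaded sketch of the second -> minute -> hour -> week roll-up.
// Hypothetical: a real recorder would be driven by a timer and be thread-safe.
class RollupSketch {

    private final long[] perSecond = new long[60];
    private final long[] perMinute = new long[60];
    private final long[] perHour = new long[7 * 24];
    private final long[] perWeek = new long[3 * 52];
    private int seconds, minutes, hours; // completed periods at each level
    private long current;                // value accumulated during the current second

    void record(long value) {
        current += value;
    }

    // Closes the current second and rolls up into coarser series whenever a period completes.
    void tickSecond() {
        push(perSecond, current);
        current = 0;
        if (++seconds % 60 == 0) {
            push(perMinute, sum(perSecond));          // the last 60 seconds form the completed minute
            if (++minutes % 60 == 0) {
                push(perHour, sum(perMinute));        // the last 60 minutes form the completed hour
                if (++hours % (7 * 24) == 0) {
                    push(perWeek, sum(perHour));      // the last 168 hours form the completed week
                }
            }
        }
    }

    // Shifts the series left by one slot and stores the newest value in the last slot.
    private static void push(long[] series, long value) {
        System.arraycopy(series, 1, series, 0, series.length - 1);
        series[series.length - 1] = value;
    }

    private static long sum(long[] series) {
        return Arrays.stream(series).sum();
    }

    long[] getValuePerSecond() { return perSecond.clone(); }
    long[] getValuePerMinute() { return perMinute.clone(); }
    long[] getValuePerHour() { return perHour.clone(); }
    long[] getValuePerWeek() { return perWeek.clone(); }

    public static void main(String[] args) {
        RollupSketch ts = new RollupSketch();
        for (int s = 0; s < 120; s++) { // two minutes, one event per second
            ts.record(1);
            ts.tickSecond();
        }
        long[] minutes = ts.getValuePerMinute();
        System.out.println(minutes[58] + " " + minutes[59]); // 60 60
    }
}
```

Shifting arrays keeps the sketch short; a production recorder would more likely use a circular index to avoid copying.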

Example:

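  import java.util.Arrays;
  import org.apache.jackrabbit.api.stats.RepositoryStatistics;
  import org.apache.jackrabbit.api.stats.TimeSeries;
  import org.apache.jackrabbit.core.RepositoryContext;
  RepositoryContext context = // get the RepositoryContext
  RepositoryStatistics repositoryStatistics = context.getRepositoryStatistics();
  TimeSeries loginCounter = repositoryStatistics.getTimeSeries(RepositoryStatistics.Type.SESSION_LOGIN_COUNTER);
  System.out.println(Arrays.toString(loginCounter.getValuePerSecond())); // one value per second over the last minute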

And the output is:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3]

SESSION_LOGIN_COUNTER is a simple (non-incremental) counter, so each data point represents the absolute value of the counter: we can see 3 logins in the last second, none in the previous one, and just 1 in the one before that.
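This reading treats the last array element as the most recent second. A hypothetical helper that makes the indexing explicit, applied to an array shaped like the output above, could look like this:

```java
import java.util.Arrays;

// Hypothetical helpers for a per-second array whose last slot is the most recent second.
class PerSecondReader {

    // Value recorded `secondsAgo` seconds before the most recent slot (0 = most recent second).
    static long secondsAgo(long[] perSecond, int secondsAgo) {
        return perSecond[perSecond.length - 1 - secondsAgo];
    }

    // Total over the whole window (the last minute, for getValuePerSecond()).
    static long total(long[] perSecond) {
        return Arrays.stream(perSecond).sum();
    }

    public static void main(String[] args) {
        long[] logins = new long[60]; // 60 slots ending in 1, 0, 3, as in the output above
        logins[57] = 1;
        logins[59] = 3;
        System.out.println(secondsAgo(logins, 0) + " " + secondsAgo(logins, 1) + " "
                + secondsAgo(logins, 2)); // 3 0 1
        System.out.println(total(logins)); // 4
    }
}
```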

Query Statistics

The QueryStat service provides query-related performance logs:

#getSlowQueries() provides a list of the slowest queries. The queue size can be specified via the #setSlowQueriesQueueSize() method. The default queue size value is 15.
#getPopularQueries() provides a list of the most frequently run queries. The queue size can be specified via the #setPopularQueriesQueueSize() method. The default queue size value is 15.
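Both lists behave like bounded "top N" collections. The sketch below keeps the N slowest executions in a min-heap and ranks statements by how often they ran. The class, its methods, and the use of the statement string as the key are assumptions for illustration and do not reproduce the QueryStat API; only the default size of 15 comes from this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

// Hypothetical sketch of bounded slow/popular query lists; not the QueryStat implementation.
class QueryStatSketch {

    static final class Execution {
        final String statement;
        final long durationMillis;

        Execution(String statement, long durationMillis) {
            this.statement = statement;
            this.durationMillis = durationMillis;
        }
    }

    private int slowQueriesQueueSize = 15;    // default size mentioned above
    private int popularQueriesQueueSize = 15; // default size mentioned above
    // Min-heap on duration: the fastest retained execution sits at the head and is evicted first.
    private final PriorityQueue<Execution> slow =
            new PriorityQueue<>(Comparator.comparingLong((Execution e) -> e.durationMillis));
    private final Map<String, Long> runCounts = new HashMap<>();

    void setSlowQueriesQueueSize(int size) {
        slowQueriesQueueSize = size;
        trimSlow();
    }

    void setPopularQueriesQueueSize(int size) {
        popularQueriesQueueSize = size;
    }

    void logQuery(String statement, long durationMillis) {
        runCounts.merge(statement, 1L, Long::sum);
        slow.add(new Execution(statement, durationMillis));
        trimSlow();
    }

    private void trimSlow() {
        while (slow.size() > slowQueriesQueueSize) {
            slow.poll(); // drop the fastest retained execution
        }
    }

    // Slowest executions first.
    List<Execution> getSlowQueries() {
        List<Execution> result = new ArrayList<>(slow);
        result.sort(Comparator.comparingLong((Execution e) -> e.durationMillis).reversed());
        return result;
    }

    // Most frequently run statements first, limited to the popular-queries queue size.
    List<String> getPopularQueries() {
        return runCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(popularQueriesQueueSize)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        QueryStatSketch stats = new QueryStatSketch();
        stats.setSlowQueriesQueueSize(2);
        stats.logQuery("//element(*, nt:file)", 120);
        stats.logQuery("SELECT * FROM [nt:base]", 900);
        stats.logQuery("//element(*, nt:file)", 40);
        stats.logQuery("SELECT * FROM [nt:unstructured]", 300);
        for (Execution e : stats.getSlowQueries()) {
            System.out.println(e.durationMillis + " ms " + e.statement); // 900 ms ..., then 300 ms ...
        }
        System.out.println(stats.getPopularQueries().get(0)); // //element(*, nt:file)
    }
}
```

A min-heap keeps insertion at O(log N) while always evicting the least interesting (fastest) entry once the queue is full.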

The Query Statistics service is disabled by default.

Future work

One of the ideas for future improvements is to turn the statistics code into its own dedicated component. Progress on this can be followed in JCR-3130.
