Apache Jackrabbit : Statistics

Jackrabbit Statistics

Apache Jackrabbit 2.3.2 introduced, via JCR-2936, a new mechanism for gathering low-level statistics at the repository level.

Repository Statistics

Overview

The Repository Statistics service uses a set of time series to provide three types of statistical information:

  • COUNTER(s): generally represent a quantity, such as the number of operations.
  • DURATION(s): generally represent the duration of events, measured in nanoseconds unless specified otherwise.
  • AVERAGE(s): computed by dividing a DURATION by the corresponding COUNTER.
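As a rough illustration of how an AVERAGE relates to the other two types, the sketch below divides an accumulated duration by the matching count. The class and method names are hypothetical and not part of the Jackrabbit API, and returning 0 when nothing was counted is an assumption made here to avoid a division by zero.

```java
// Hypothetical helper, not part of Jackrabbit: derives an AVERAGE value
// from a DURATION total and the matching COUNTER.
final class AverageSketch {

    // Returns durationNanos / count, or 0 when no events were counted
    // (the zero fallback is an assumption of this sketch).
    static long average(long durationNanos, long count) {
        return count == 0 ? 0 : durationNanos / count;
    }

    public static void main(String[] args) {
        // 4 bundle writes that took 10,000 ns in total
        System.out.println(average(10_000L, 4L)); // prints 2500
    }
}
```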

Sequence types: depending on how a time series aggregates its data points, there are two types of sequences:

  • Single - each data point represents only the recorded value
  • Incremental - each data point represents the value of the previous data point + the new recorded value
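The difference between the two sequence types can be sketched as follows. This is only an illustration of the definitions above, using hypothetical names; it is not how Jackrabbit stores its data internally.

```java
import java.util.Arrays;

// Hypothetical illustration of the two sequence types.
final class SequenceTypeSketch {

    // Single: each data point is just the recorded value.
    static long[] single(long[] recorded) {
        return recorded.clone();
    }

    // Incremental: each data point is the previous data point
    // plus the newly recorded value.
    static long[] incremental(long[] recorded) {
        long[] points = new long[recorded.length];
        long running = 0;
        for (int i = 0; i < recorded.length; i++) {
            running += recorded[i];
            points[i] = running;
        }
        return points;
    }

    public static void main(String[] args) {
        long[] recorded = {1, 0, 3};
        System.out.println(Arrays.toString(single(recorded)));      // [1, 0, 3]
        System.out.println(Arrays.toString(incremental(recorded))); // [1, 1, 4]
    }
}
```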

The Repository Statistics service is always enabled.

Provided Statistical Information

 

 

PersistenceManager (not implemented in Oak)

Name                         Sequence Type  Description
---------------------------  -------------  -----------
BUNDLE_READ_COUNTER          Single         Counts the number of bundle read operations
BUNDLE_WRITE_COUNTER         Single         Counts the number of bundle write operations
BUNDLE_WRITE_DURATION        Single         Tracks the duration (ns) of bundle write operations
BUNDLE_WRITE_AVERAGE         Incremental    Computes the bundle write average duration (ns): BUNDLE_WRITE_DURATION / BUNDLE_WRITE_COUNTER
BUNDLE_CACHE_ACCESS_COUNTER  Single         Bundle Cache: access counter
BUNDLE_CACHE_SIZE_COUNTER    Single         Bundle Cache: size counter
BUNDLE_CACHE_MISS_COUNTER    Single         Bundle Cache: cache miss count
BUNDLE_CACHE_MISS_DURATION   Single         Bundle Cache: cache miss duration (ns)
BUNDLE_CACHE_MISS_AVERAGE    Incremental    Bundle Cache: cache miss average (ns): BUNDLE_CACHE_MISS_DURATION / BUNDLE_CACHE_MISS_COUNTER
BUNDLE_COUNTER               Single         Not implemented
BUNDLE_WS_SIZE_COUNTER       Single         Not implemented

 

* ns = nanoseconds

 

 

 

Session

Name                    Sequence Type  Description
----------------------  -------------  -----------
SESSION_READ_COUNTER    Single         Counts the number of session read operations
SESSION_READ_DURATION   Single         Tracks the duration (ns) of session read operations
SESSION_READ_AVERAGE    Incremental    Computes the average duration (ns) of session read operations
SESSION_WRITE_COUNTER   Single         Counts the number of session write operations
SESSION_WRITE_DURATION  Single         Tracks the duration (ns) of session write operations
SESSION_WRITE_AVERAGE   Incremental    Computes the average duration (ns) of session write operations
SESSION_LOGIN_COUNTER   Single         Counts the number of session logins (newly created sessions)
SESSION_COUNT           Single         Counts the number of active sessions

 

* ns = nanoseconds

 

Classification of Session related operations:

 

Query

 

Name            Sequence Type  Description
--------------  -------------  -----------
QUERY_COUNT     Single         Counts the number of queries executed
QUERY_DURATION  Single         Tracks the duration (ms) of queries
QUERY_AVERAGE   Single         Computes the average duration (ms) of queries

 

* ms = milliseconds

 

 

Observation (not implemented in Jackrabbit 2)

 

Name                        Sequence Type  Description
--------------------------  -------------  -----------
OBSERVATION_EVENT_COUNTER   Single         Counts the number of observation Event instances delivered
OBSERVATION_EVENT_DURATION  Single         Tracks the time (ns) spent processing observation events
OBSERVATION_EVENT_AVERAGE   Incremental    Computes the average time (ns) spent processing observation events

 

* ns = nanoseconds

 

TimeSeries

TimeSeries is an interface for a time series of measured values per second, minute, hour and week. The type of the value is arbitrary: it could be cache hits or misses, disk reads or writes, created sessions, completed transactions, or anything else of interest.

It is available since Apache Jackrabbit 2.3.2.

A brief walkthrough

  • #getValuePerSecond() returns the measured value per second over the last minute
  • #getValuePerMinute() returns the measured value per minute over the last hour
  • #getValuePerHour() returns the measured value per hour over the last week
  • #getValuePerWeek() returns the measured value per week over the last three years

All the data series are chronological and have as many data point slots as there are units in the time period they cover: valuePerSecond has 60 slots (60 seconds in a minute), valuePerMinute has 60 slots (60 minutes in an hour), valuePerHour has 168 slots (7x24) and valuePerWeek has 156 slots (3x52).

Each data series is aggregated into the next coarser one (following the hierarchy above) once the time period it represents has passed. For example, after each minute the valuePerSecond data points are summed and added as a single data point to the valuePerMinute series, and so on.
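The roll-up from seconds to minutes described above can be sketched as follows. This is a simplified, hypothetical model covering only the two finest series and only summed values; the class and its methods are not Jackrabbit's implementation, and placing the newest data point in the last slot is an assumption based on the chronological ordering.

```java
import java.util.Arrays;

// Hypothetical model of the per-second to per-minute roll-up.
final class RollupSketch {
    private final long[] perSecond = new long[60];
    private final long[] perMinute = new long[60];
    private int secondsElapsed = 0;

    // Drops the oldest slot and appends the new value at the end,
    // keeping the series chronological.
    private static void shiftAndAppend(long[] series, long value) {
        System.arraycopy(series, 1, series, 0, series.length - 1);
        series[series.length - 1] = value;
    }

    // Records the value measured during one second.
    void recordSecond(long value) {
        shiftAndAppend(perSecond, value);
        secondsElapsed++;
        if (secondsElapsed % 60 == 0) {
            // A full minute has passed: sum the 60 per-second points
            // into a single per-minute data point.
            shiftAndAppend(perMinute, Arrays.stream(perSecond).sum());
        }
    }

    long[] getValuePerSecond() { return perSecond.clone(); }
    long[] getValuePerMinute() { return perMinute.clone(); }

    public static void main(String[] args) {
        RollupSketch series = new RollupSketch();
        for (int i = 0; i < 60; i++) {
            series.recordSecond(2);
        }
        long[] minutes = series.getValuePerMinute();
        System.out.println(minutes[minutes.length - 1]); // prints 120
    }
}
```

The same pattern would repeat one level up: 60 per-minute points roll into one per-hour point, and 168 per-hour points into one per-week point.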

Example:

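  // Requires java.util.Arrays, plus RepositoryStatistics, RepositoryStatistics.Type
  // and TimeSeries from org.apache.jackrabbit.api.stats.
  RepositoryContext context = ...; // obtain the RepositoryContext of the running repository
  RepositoryStatistics repositoryStatistics = context.getRepositoryStatistics();
  // Fetch the time series that records session logins.
  TimeSeries loginCounter = repositoryStatistics.getTimeSeries(Type.SESSION_LOGIN_COUNTER);
  // Print the 60 per-second data points covering the last minute.
  System.out.println(Arrays.toString(loginCounter.getValuePerSecond()));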

And the output is:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3]

SESSION_LOGIN_COUNTER is a Single (non-incremental) sequence, so each data point holds only the value recorded during that second. Reading from the end of the array, there were 3 logins in the most recent second, none in the second before it, and 1 in the second before that.
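Because the series is chronological with the most recent second last, totals over a recent window can be derived by summing the tail of the array. The helper below is a hypothetical sketch that works on a plain long[] such as the one printed above; it is not part of the TimeSeries interface.

```java
// Hypothetical helper for reading a per-second series of a Single sequence.
final class LoginWindowSketch {

    // Sums the newest `seconds` data points; the most recent second
    // is the last element of the array.
    static long totalOverLast(long[] perSecond, int seconds) {
        long total = 0;
        int start = Math.max(0, perSecond.length - seconds);
        for (int i = start; i < perSecond.length; i++) {
            total += perSecond[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Same shape as the sample output: 1 login three seconds ago, 3 just now.
        long[] perSecond = new long[60];
        perSecond[57] = 1;
        perSecond[59] = 3;
        System.out.println(totalOverLast(perSecond, 1)); // prints 3
        System.out.println(totalOverLast(perSecond, 3)); // prints 4
    }
}
```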

Query Statistics

The QueryStat service provides query-related performance logs:

  • #getSlowQueries() returns a list of the slowest queries. The queue size can be set via #setSlowQueriesQueueSize(); the default is 15.
  • #getPopularQueries() returns a list of the most frequently run queries. The queue size can be set via #setPopularQueriesQueueSize(); the default is 15.
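To illustrate what a fixed-size queue of slowest queries implies, the sketch below keeps only the N slowest entries it has seen. It is a hypothetical, standalone model using a JDK PriorityQueue; the class, its methods and the sample query strings are assumptions for illustration and do not reflect how QueryStat is actually implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical bounded log of the slowest queries.
final class SlowQueriesSketch {

    // One executed query and how long it took.
    static final class Entry {
        final String statement;
        final long durationMs;

        Entry(String statement, long durationMs) {
            this.statement = statement;
            this.durationMs = durationMs;
        }
    }

    private final int capacity;
    // Min-heap on duration: the head is the fastest of the retained entries.
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.durationMs));

    SlowQueriesSketch(int capacity) {
        this.capacity = capacity;
    }

    void record(String statement, long durationMs) {
        queue.add(new Entry(statement, durationMs));
        if (queue.size() > capacity) {
            queue.poll(); // evict the fastest entry to stay within capacity
        }
    }

    // Returns the retained statements, slowest first.
    List<String> slowest() {
        List<Entry> entries = new ArrayList<>(queue);
        entries.sort(Comparator.comparingLong((Entry e) -> e.durationMs).reversed());
        List<String> statements = new ArrayList<>();
        for (Entry e : entries) {
            statements.add(e.statement);
        }
        return statements;
    }

    public static void main(String[] args) {
        SlowQueriesSketch log = new SlowQueriesSketch(2);
        log.record("//element(*, nt:file)", 40);
        log.record("SELECT * FROM [nt:base]", 900);
        log.record("//*[@jcr:title]", 120);
        System.out.println(log.slowest());
    }
}
```

A larger queue retains more history at the cost of memory, which is presumably why the size is configurable.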

The Query Statistics service is disabled by default.

Future work

One idea for future improvement is to turn the statistics code into its own dedicated component. Progress on this can be followed in JCR-3130.