Composite Blob Store

NOTE: This component is currently a proposed feature.

Overview

The composite blob store is a multi-source blob store: a logical blob store consisting of at least two delegate blob stores. The union of all the data in all the delegate blob stores is presented to a user of the composite blob store as a single logical "view" of the stored data.

Motivation

A composite blob store offers flexibility in storage deployments that can address a number of user scenarios. Some of these scenarios are already identified as existing customer use cases for binary management in Oak; see UC2, UC9, and UC14 in JCR Binary Usecase. These and other possible use cases are outlined in the "Use Cases" section below. Some of them may require additional features beyond the scope currently defined for the composite blob store.

Speaking generally, some problems that might be addressable by a composite blob store include:

- Enabling or simplifying scenarios in which development and production systems share a blob store.
- Storing blobs closer to the users that use them most, in geo-distributed environments.
- Allowing the selection and use of different storage classes for different blobs based on frequency of access (including auto-archival).
- Offering greater control within Oak to manage geo-redundancy and high availability, including on an object-by-object basis.
- Choosing different types of storage for different types of binary data in Oak based on configurable criteria.

Technical Details

To use the composite blob store, delegate blob stores are configured as data store factories. The configuration of each data store factory must specify the following:

- Any standard configuration required to configure this data store
- A role (any string value) identifying this delegate
- Any configuration pertaining to this data store's role as a delegate

For example, if configuring an S3DataStore as a delegate, a user would:

- Configure standard S3DataStore values, like the access key, secret key, and bucket name
- Define a role for this data store, e.g. "role=S3DS_1"
- Add other configuration, if any, to configure this data store as a delegate (like whether it is a readOnly store)

After configuring the delegates, configure the composite using the PID "org.apache.jackrabbit.oak.plugins.blob.datastore.CompositeDataStore". A single configuration entry is required: a list of the roles this composite manages. For example, suppose there are two delegate data stores, the first configured with "role=S3DS_1" and the second with "role=S3DS_2". To use both delegates, the composite data store configuration would include the line "roles=S3DS_1,S3DS_2".
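
The relationship between the per-delegate "role" entries and the composite "roles" entry can be illustrated with a minimal sketch. It models the configuration as plain Python dictionaries, which is an assumption made purely for illustration and not the actual configuration format; the function names and the "bucket" placeholder values are hypothetical.

```python
def parse_roles(roles_entry):
    """Split a "roles" value such as "S3DS_1,S3DS_2" into a list of role names."""
    return [role.strip() for role in roles_entry.split(",") if role.strip()]


def resolve_delegates(composite_config, delegate_configs):
    """Return the delegate configs in the order the composite lists their roles."""
    by_role = {cfg["role"]: cfg for cfg in delegate_configs}
    resolved = []
    for role in parse_roles(composite_config["roles"]):
        if role not in by_role:
            raise ValueError(f"no delegate configured with role {role!r}")
        resolved.append(by_role[role])
    return resolved


# Hypothetical delegate configurations; only "role" comes from the text above.
delegates = [
    {"role": "S3DS_1", "bucket": "example-bucket-1"},
    {"role": "S3DS_2", "bucket": "example-bucket-2"},
]
composite = {"roles": "S3DS_1,S3DS_2"}

resolved = resolve_delegates(composite, delegates)
print([cfg["role"] for cfg in resolved])
```

In this sketch a role listed by the composite that no delegate defines is treated as a configuration error; how the real component would react is not specified here.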

Delegate Traversal

"Delegate traversal" refers to the logic used to go through the delegates on a read or write request and determine which delegate should serve that request. The traversal algorithm is extensible. The default implementation is called the Intelligent Delegate Traversal Strategy. It is "intelligent" because it attempts to interpret the provided configuration and assign priorities based on that interpretation. An alternative could be, for example, a Simple Delegate Traversal Strategy, which simply attempts to use the delegates in the raw order in which they are specified in the configuration, with no further logic applied. The Intelligent Delegate Traversal Strategy is expected to produce the least surprising results for end users.

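The idea of a pluggable traversal strategy can be sketched as a function that orders the delegates before they are tried. The sketch below is an assumption-laden illustration, not the proposed implementation: delegates are modeled as dictionaries, all function names are hypothetical, and the "intelligent" ordering applies only one rule described on this page (writable delegates before read-only ones).

```python
def simple_traversal(delegates):
    """Use the delegates in the raw order given by the configuration."""
    return list(delegates)


def intelligent_traversal(delegates):
    """One plausible interpretation: writable delegates before read-only ones.

    sorted() is stable, so the configured order is kept within each group.
    """
    return sorted(delegates, key=lambda d: d["read_only"])


def first_match(delegates, blob_id, strategy):
    """Return the role of the first delegate, in strategy order, holding blob_id."""
    for delegate in strategy(delegates):
        if blob_id in delegate["blobs"]:
            return delegate["role"]
    return None


configured = [
    {"role": "PROD", "read_only": True, "blobs": {"blob-1"}},
    {"role": "STAGING", "read_only": False, "blobs": {"blob-1"}},
]

print(first_match(configured, "blob-1", simple_traversal))       # PROD
print(first_match(configured, "blob-1", intelligent_traversal))  # STAGING
```

Passing the strategy as a callable keeps the lookup code unchanged when the ordering rule is swapped.
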
Intelligent Delegate Traversal Strategy

Writes

Read-Only Delegates

A delegate may be specified as a read-only delegate, in which case it will not accept any write requests. If it would otherwise have been chosen for a write request, the request will defer to the next delegate in the traversal that matches the request, if any.

Delegate Write Preference

The write algorithm is fairly simple: excluding read-only delegates, iterate through the delegates, select the first that can accept the write, and perform the write. Priority is given to delegates that already have a matching blob ID. The result of the delegate write is returned as the result of the composite blob store write.
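
The write preference described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Delegate class is a hypothetical in-memory stand-in, every writable delegate is assumed able to accept any write, and the function names are invented for this example.

```python
class Delegate:
    """Hypothetical in-memory stand-in for a delegate blob store."""

    def __init__(self, role, read_only=False):
        self.role = role
        self.read_only = read_only
        self.blobs = {}  # blob ID -> bytes

    def write(self, blob_id, data):
        self.blobs[blob_id] = data
        return self.role


def select_write_delegate(delegates, blob_id):
    """Pick the delegate for a write, or None if no delegate is writable."""
    writable = [d for d in delegates if not d.read_only]
    # Priority goes to a writable delegate that already has this blob ID.
    for delegate in writable:
        if blob_id in delegate.blobs:
            return delegate
    # Otherwise use the first writable delegate in traversal order.
    return writable[0] if writable else None


def composite_write(delegates, blob_id, data):
    """Perform the write on the selected delegate and return its result."""
    target = select_write_delegate(delegates, blob_id)
    if target is None:
        raise RuntimeError("no writable delegate can accept the write")
    return target.write(blob_id, data)


production = Delegate("PROD", read_only=True)
staging_a = Delegate("STAGING_A")
staging_b = Delegate("STAGING_B")
staging_b.blobs["blob-1"] = b"old"
delegates = [production, staging_a, staging_b]

print(composite_write(delegates, "blob-1", b"new"))   # STAGING_B
print(composite_write(delegates, "blob-2", b"data"))  # STAGING_A
```

The read-only delegate is never written to, even though it is first in the configured order.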

Reads

The composite blob store fulfills read requests by deferring the read to delegate blob stores.

Delegate Read Preference

The read algorithm is also fairly simple:

1. Excluding delegates that are read-only, iterate through delegates and select the first that can satisfy the read request.
2. If no delegate can satisfy the read request, iterate through read-only delegates and select the first that can satisfy the read request.
3. Return the result of the delegate read, or an appropriate "not found" error message if no delegate has a match.

The response to a read request is the result of the first successful read from a delegate, so the highest-priority result is always selected.

Read-only delegates take lower precedence than writable delegates, because writable delegates may contain more up-to-date information, which would be preferred.
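
The read steps above can be sketched in a few lines. The sketch assumes delegates are modeled as dictionaries holding their blobs in memory; the BlobNotFoundError class and the composite_read function are hypothetical names introduced for illustration only.

```python
class BlobNotFoundError(KeyError):
    """Raised when no delegate holds the requested blob ID."""


def composite_read(delegates, blob_id):
    """Return the blob from the first matching delegate, writable ones first."""
    writable = [d for d in delegates if not d["read_only"]]
    read_only = [d for d in delegates if d["read_only"]]
    # Writable delegates are tried first, then read-only delegates.
    for delegate in writable + read_only:
        if blob_id in delegate["blobs"]:
            return delegate["blobs"][blob_id]
    raise BlobNotFoundError(blob_id)


delegates = [
    {"role": "PROD", "read_only": True,
     "blobs": {"a": b"prod-a", "b": b"prod-b"}},
    {"role": "STAGING", "read_only": False,
     "blobs": {"a": b"staging-a"}},
]

print(composite_read(delegates, "a"))  # b'staging-a'
print(composite_read(delegates, "b"))  # b'prod-b'
```

Blob "a" exists in both delegates, and the copy from the writable delegate wins even though the read-only delegate is listed first.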

Read-Only Delegates

The composite blob store supports the notion of a read-only delegate blob store. One or more of the delegate blob stores can be configured in read-only mode, meaning that they can be used to satisfy read requests but not write requests. An example use case is one where two content repositories are used, one for a production environment and one for a staging environment. The staging repository can be configured with a composite blob store that accesses the production storage location in read-only mode, so tests can execute in staging using production data without modifying production data or the production store.

Reads issued to a read-only delegate would be processed as normal. Read-only delegates are not considered for write requests, causing the composite blob store to move on to the next delegate to attempt to fulfill the write request.

Note that configuring all delegates of a composite blob store as read-only would make the blob store useless for storing blobs, so this configuration should not be allowed: at least one delegate blob store must be writable.
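
A startup check for this constraint might look like the sketch below. It is an assumption that the check would be expressed this way; the validate_composite function and the dictionary model of a delegate are hypothetical. The check for at least two delegates reflects the definition given in the Overview.

```python
def validate_composite(delegates):
    """Raise ValueError if the delegates cannot form a usable composite."""
    if len(delegates) < 2:
        raise ValueError("a composite blob store needs at least two delegates")
    if all(d["read_only"] for d in delegates):
        raise ValueError("at least one delegate must not be read-only")


# A valid setup: one read-only delegate and one writable delegate.
validate_composite([
    {"role": "PROD", "read_only": True},
    {"role": "STAGING", "read_only": False},
])
print("configuration accepted")
```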

Blob ID / Delegate Mapping

In order to avoid issuing read requests to delegates that do not contain the blob ID in question, the composite blob store must maintain a mapping of each blob ID to the delegate containing it. Bloom filters should be used for this purpose. This mapping must be created at startup and maintained as the system runs, and should be rebuilt every time data store garbage collection runs (among other things, the filter may need to be resized for the current number of blob IDs present).
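
One way to picture this is a Bloom filter per delegate, consulted before any read is issued. The sketch below is illustrative only: keeping one filter per delegate, the filter size, the number of hash functions, and all class and function names are assumptions. A Bloom filter can report false positives but never false negatives, so a delegate it excludes can safely be skipped. Because entries cannot be removed from a plain Bloom filter, the sketch rebuilds the filters from scratch, as would happen after garbage collection.

```python
import hashlib


class BloomFilter:
    """Small illustrative Bloom filter backed by a bytearray."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filters(blob_ids_by_role):
    """Build (or rebuild) one filter per delegate role from its blob IDs."""
    filters = {}
    for role, blob_ids in blob_ids_by_role.items():
        bloom = BloomFilter()
        for blob_id in blob_ids:
            bloom.add(blob_id)
        filters[role] = bloom
    return filters


def candidate_delegates(filters, blob_id):
    """Return the roles whose filter says the blob ID may be present."""
    return [role for role, bloom in filters.items()
            if bloom.might_contain(blob_id)]


filters = build_filters({"S3DS_1": ["blob-a", "blob-b"], "S3DS_2": ["blob-c"]})
print("S3DS_1" in candidate_delegates(filters, "blob-a"))  # True
```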

Use Cases

There are many possible use cases for the composite blob store. In order to manage the implementation of the capability, functionality is being added and supported one use case at a time. When this capability is released, the first supported use case will be the Staging Environment (listed below). Other use cases are listed here for reference but are not currently supported.

Staging Environment

The composite blob store can be used to address a production/staging deployment use case, where one Oak repository is the production repository and another is the staging repository. The production repository accesses a single blob store. The staging repository uses a composite blob store to access a staging blob store as well as the production blob store in read-only mode. Thus staging can serve blobs out of either blob store but can only modify blobs on the staging blob store.

+-----------------+        +-----------------+
| Production Env  |        |   Staging Env   |
| +-------------+ |        | +-------------+ |
| |     Oak     | |    +-----+     Oak     | |
| +------+------+ |    |   | +------+------+ |
|        |        |  Read- |        |        |
|        |        |  Only  |        |        |
| +------V------+ |    |   | +------V------+ |
| | S3DataStore <------+   | | S3DataStore | |
| +-------------+ |        | +-------------+ |
|                 |        |                 |
+-----------------+        +-----------------+

Hierarchical Blob Store

The composite blob store directly addresses JCR Binary Usecase UC14 to store data in one of a number of blob stores based on a hierarchy.

In the example below, blobs are initially stored in the FileDataStore and, once they are more than 30 days old, are moved to the S3DataStore. They can be read from either location. Note that moving blobs from one data store to the other falls under the category of curation, which is outside the current scope.

+-------+
|       |  <30 Days Old  +---------------+
|       +----------------> FileDataStore |
|       |                +---------------+
|  Oak  |
|       |
|       |  >=30 Days Old  +-------------+
|       +-----------------> S3DataStore |
|       |                 +-------------+
+-------+
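
The age rule in the diagram can be sketched as a small lookup that decides which store a blob is expected to be in. This is only an illustration: the function names are hypothetical, the blob's creation time is assumed to be known, and the actual movement of blobs between stores (curation) is not modeled.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER = timedelta(days=30)


def expected_store(created_at, now):
    """Return the store a blob of this age is expected to be in."""
    age = now - created_at
    return "FileDataStore" if age < ARCHIVE_AFTER else "S3DataStore"


def read_order(created_at, now):
    """Try the expected store first, then fall back to the other one."""
    first = expected_store(created_at, now)
    second = "S3DataStore" if first == "FileDataStore" else "FileDataStore"
    return [first, second]


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
print(expected_store(datetime(2024, 2, 20, tzinfo=timezone.utc), now))  # FileDataStore
print(expected_store(datetime(2024, 1, 1, tzinfo=timezone.utc), now))   # S3DataStore
```

Since blobs can be read from either location, read_order keeps the other store as a fallback in case a blob has not yet been moved.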

S3DataStore Clustering

The composite blob store could be used to address JCR Binary Usecase UC9, where two Oak nodes in a cluster may both have a record of a blob in the node store but one node may temporarily be unable to access the blob in the case of async upload. This could be addressed by using a composite blob store where the first-level blob store is a FileDataStore on an NFS mount and the second-level blob store is an S3DataStore without a cache. The composite blob store on each node will look for any asset in both the FileDataStore and the S3DataStore, thus avoiding a split-brain scenario.

+-----------------------------+
| Node 1                      |
| +-----+                     |
| |     |                     |
| |     +-------------------------------+
| | Oak |                     |         |
| |     |   +---------------+ |         |
| |     +-->+ FileDataStore | |  +------V------+
| |     |   +-------^-------+ |  | S3DataStore |
| +-----+           |         |  +------+------+
+-------------------|---------+         |
                    |            +------V------+
                   NFS           |  S3 Bucket  | 
                    |            +------^------+
+-------------------|---------+         |
| Node 2            |         |  +------+------+
| +-----+           |         |  | S3DataStore |
| |     |   +-------V-------+ |  +------^------+
| |     +---> FileDataStore | |         |
| | Oak |   +---------------+ |         |
| |     |                     |         |
| |     +-------------------------------+
| |     |                     |
| +-----+                     |
+-----------------------------+

Replication Across Storage Regions

The composite blob store can address JCR Binary Usecase UC2 by storing blobs close to users.

(NOTE: This use case would rely upon Composite Blob Store Storage Filters which are not in scope for this release.)

In this example, imagine a company with a main office in the United States and branch offices in Tokyo and Paris. A single Oak repository is configured with a composite blob store that has three delegates. The default delegate blob store is in the AWS "us-west-2" region (Oregon, United States), presumably near the main office. Two other delegate blob stores are configured: one in the AWS "ap-northeast-1" region (Tokyo) and one in the AWS "eu-west-2" region (London), which serves the Paris office. When a blob is stored in Oak with the "officeLocation" property set, that property determines where the blob is stored. Blobs stored with "officeLocation" set to any value other than "Tokyo" or "Paris", or without the property set at all, are stored in the default delegate blob store.

                         +-----+
                         | Oak |
                         +--+--+
                            |
           +----------------+----------------+
           |                |                |
  officeLocation=Tokyo   default    officeLocation=Paris
           |                |                |
    +------V------+  +------V------+  +------V------+
    | S3DataStore |  | S3DataStore |  | S3DataStore |
    |   (Tokyo)   |  |   (Oregon)  |  |   (London)  |
    +-------------+  +-------------+  +-------------+
