Data Store

Overview

The data store is optionally used to store large binary values. Normally all node and property data is stored in a persistence manager, but for large binaries such as files special treatment can improve performance and reduce disk usage.

The main features of the data store are:

  • Space saving: only one copy per unique object is kept
  • Fast copy: only the identifier is copied
  • Storing and reading does not block others
  • Multiple repositories can use the same data store
  • Objects in the data store are immutable
  • Garbage collection is used to purge unused objects
  • Hot backup is supported
  • Clustering: all cluster nodes use the same data store

Requirements

The data store requires Jackrabbit 1.4 or later; it is not available in previous releases.

A Bundle persistence manager is required, or any other persistence manager that supports the data store. The SimpleDbPersistenceManager and its subclasses do not support the data store; with them, large objects are still saved multiple times.

The file system must support files as large as the largest object you want to store. Note that the file size limit of FAT32 is 2 GB.

How to Configure the File Data Store

To use the file based data store, add the following to your repository.xml, just before the closing </Repository> tag:

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore"/>

File Data Store

The file data store stores each binary in a file, named after the hash of the content. When reading, the data is streamed directly from the file (no local or temporary copy is created). The file data store does not use a local cache; content is read directly from the files as needed. New content is first stored in a temporary file, and later renamed / moved to the right place.

Because the data store is append-only, the FileDataStore is guaranteed to be consistent after a crash (unlike the BundleFsPersistenceManager). It is usually faster than the DbDataStore, and the preferred choice unless you have strict operational reasons to put everything into a database.
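To illustrate, here is a minimal sketch of that mechanism, not the actual implementation: Jackrabbit uses SHA-1 as the content hash, but the flat directory layout and names below are simplified assumptions.

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

// store a stream under its content hash and return the identifier
static String store(File dir, InputStream in) throws Exception {
    // write to a temporary file first, computing the hash on the way
    File temp = File.createTempFile("upload", null, dir);
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    try (DigestInputStream din = new DigestInputStream(in, md);
            FileOutputStream out = new FileOutputStream(temp)) {
        byte[] buf = new byte[8192];
        for (int n; (n = din.read(buf)) != -1;) {
            out.write(buf, 0, n);
        }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
        hex.append(String.format("%02x", b));
    }
    // move the temp file to its final, content-addressed name
    File target = new File(dir, hex.toString());
    if (target.exists()) {
        temp.delete();      // same content already stored: keep one copy
    } else {
        temp.renameTo(target);
    }
    return hex.toString();
}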

Configuration

This is a full configuration using the default values:

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="${rep.home}/repository/datastore"/>
        <param name="minRecordLength" value="100"/>
    </DataStore>

All configuration options are optional:

  • path: The name of the directory where this data store keeps the files. The default is ${rep.home}/repository/datastore (as in the example above).
  • minRecordLength: The minimum object length. The default is 100 bytes; smaller objects are stored inline (not in the data store). A low value means more objects are kept in the data store (which may result in a smaller repository if the same object is used in many places). A high value means fewer objects are stored in the data store (which may result in better performance, because fewer data store accesses are required). There is a limit on minRecordLength: the maximum value is around 32000, because Java doesn't support strings longer than 64 KB in writeUTF.

Database Data Store

The database data store stores data in a relational database. All content is stored in one table, the unique key of the table is the hash code of the content.

When reading, the data may be first copied to a temporary file on the server, or streamed directly from the database (depending on the copyWhenReading setting). New content is first stored in the table under a unique temporary identifier, and later the key is updated to the hash of the content.

When adding a record, by default the stream is first copied to a temporary file. If you get the exception "Can not insert new record java.io.IOException: No space left on device", your temporary directory is too small. The temp file is needed because most databases need to know the stream size when adding a record, and the JCR API doesn't provide a way to pass it in. The mechanism used to add a record depends on the property "storeStream" in the resource org/apache/jackrabbit/core/data/db/<databaseType>.properties. Implemented mechanisms are: "tempFile" (default; create a temp file before adding a record), "-1" (use the length -1 when adding the record; currently only supported by the H2 database), and "max" (use the length Integer.MAX_VALUE).
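A rough sketch of that two-step flow over JDBC (the DATASTORE table and its columns are assumptions for illustration, not Jackrabbit's exact schema):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

// store a record under a temporary id, then rename it to the content hash
static void storeRecord(Connection con, File tempFile, String contentHash)
        throws Exception {
    String tempId = "TEMP-" + UUID.randomUUID();
    try (InputStream in = new FileInputStream(tempFile);
            PreparedStatement insert = con.prepareStatement(
                "INSERT INTO DATASTORE (ID, LENGTH, LAST_MODIFIED, DATA)"
                + " VALUES (?, ?, ?, ?)")) {
        insert.setString(1, tempId);
        insert.setLong(2, tempFile.length());   // stream size, known from the temp file
        insert.setLong(3, System.currentTimeMillis());
        insert.setBinaryStream(4, in, tempFile.length());
        insert.executeUpdate();
    }
    // once the data is fully stored, the key becomes the content hash
    try (PreparedStatement update = con.prepareStatement(
            "UPDATE DATASTORE SET ID = ? WHERE ID = ?")) {
        update.setString(1, contentHash);       // e.g. the SHA-1 of the content
        update.setString(2, tempId);
        update.executeUpdate();
    }
}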

Configuration

Here is a possible configuration using the database data store:

     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
        <param name="url" value="jdbc:postgresql:test"/>
        <param name="user" value="sa"/>
        <param name="password" value="sa"/>
        <param name="databaseType" value="postgresql"/>
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="minRecordLength" value="1024"/>
        <param name="maxConnections" value="3"/>
        <param name="copyWhenReading" value="true"/>
        <param name="tablePrefix" value=""/>
        <param name="schemaObjectPrefix" value=""/>
    </DataStore>

The configuration options are:

  • url: The database URL (required).
  • user: The database user name (required).
  • password: The database password (required).
  • databaseType: The database type. If not set, the sub-protocol of the JDBC database URL is used. It must match the resource file [databaseType].properties. Example: mysql. Currently supported are: db2, derby, h2, mssql, mysql, oracle, postgresql, sqlserver.
  • driver: The JDBC driver class name. By default the default driver of the configured database type is used.
  • maxConnections: The maximum number of concurrent connections in the pool. At least 3 connections are required if the garbage collection process is used.
  • copyWhenReading: The copy setting, enabled by default. If enabled, the stream is copied to a temporary file when reading, so that reads can be concurrent. If disabled, reads are serialized.
  • tablePrefix: The table name prefix. The default is empty. Can be used to select a non-default schema or catalog. The table name is constructed like this: ${tablePrefix}${schemaObjectPrefix}${tableName} (see the example after this list). Before Jackrabbit version 2.0, this setting is case sensitive (it needs to be lowercase for PostgreSQL and MySQL, and uppercase for other databases).
  • schemaObjectPrefix: The schema object prefix. The default is empty. Available since version 1.6. Before Jackrabbit version 2.0, this setting is case sensitive (it needs to be lowercase for PostgreSQL and MySQL, and uppercase for other databases).
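For example, assuming the default table name DATASTORE, the configuration

        <param name="tablePrefix" value="JR_"/>
        <param name="schemaObjectPrefix" value="DS_"/>

would make the data store use the table JR_DS_DATASTORE (the prefix values here are made up for illustration).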

Limitations

MySQL does not support sending very large binaries from the JDBC driver to the database. See also: http://bugs.mysql.com/bug.php?id=10859 and http://forums.mysql.com/read.php?10,258333,260405#msg-260405 .

Temporary Files

By default, the database data store creates temporary files starting with "dbRecord" in the temp directory (the directory the system property "java.io.tmpdir" points to). To disable this behavior, set the configuration property "copyWhenReading" to "false" (see above).
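If the default temp directory is too small, you can also point the JVM at a different directory when starting the application, for example:

java -Djava.io.tmpdir=/path/with/enough/space ...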

FAQ

Q: Can I disable the data store? A: Only if there are no elements in the data store. If it is not empty, you need to copy the data to a new repository.

Q: When I use the database data store I get the message: 'Table or view does not exist'. A: Maybe the data store table already exists in another schema. When starting the repository, the database data store checks whether the table already exists (using a database metadata call) and creates it if not. If the table exists but is in another schema, the table is not created, yet accessing it may still fail (if the other schema is not in the schema search path for this user).

Q: What happens when multiple users download large files from the store? A: DbDataStore: it depends on whether the copyWhenReading option is enabled (see above). FileDataStore: the same file is opened multiple times; since it will be in the operating system / file system block cache, that is not a problem.

Q: Can I change the file content in the data store? A: No. The whole point of the data store is that the content is immutable.

Clustering: clustering is supported if you use a shared file system, such as a SAN or NFS (Windows file sharing works as well). You need to set the data store path of all cluster nodes to the same location.
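For example, each cluster node's repository.xml would point at the same shared location (the path below is just an example):

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="/mnt/shared/datastore"/>
    </DataStore>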

Blob Store: when the data store is enabled, the blob store is not used. The data store solves the same problems as the blob store (and more). The blob store is therefore deprecated, but it will remain supported for quite some time.

Transaction: transactional semantics are guaranteed.

There is only one data store per repository (not one per Workspace).

Backup: backing up the data store is very easy: just copy all files. They are never modified, only renamed from temporary to live files, and they are deleted only when no longer used (and only by the garbage collector). Backups can be incremental. Backup at runtime (hot backup) is supported.

The main advantages of the data store over the blob store are: unlike the blob store, the data store keeps only one copy per object, even if it is used multiple times. The data store detects if the same object is already stored and will only store a link to the existing object. The data store can be shared across multiple workspaces, and even across multiple repositories if required. Data store operations (read and write) don't block other users because they are done outside the persistence manager. Multiple data store operations can be done at the same time.

Migration: currently there is no special mechanism to migrate data from a blob store to a data store. You will have to convert the whole repository, see also BackupAndMigration.

How Does It Work

When adding a binary object, Jackrabbit checks its size. If it is larger than minRecordLength, it is added to the data store; otherwise it is kept in memory. This happens very early (if possible, already when Property.setValue(stream) is called). Only the unique data identifier is stored in the persistence manager (except for in-memory objects, where the data itself is stored). When updating a value, the old value is kept (potentially becoming garbage) and the new value is added; there is no update operation.

The current implementation still stores temporary files in some situations, for example in the RMI client. Those cases will be changed to use the data store directly where it makes sense.

Very small objects (where it does not make sense to create a file) are stored in the persistence manager (in-place).

Objects in the data store are only removed when they are no longer reachable (objects referenced in the cache or in memory are not collected). There is no 'update' operation, only 'add new entry'. Data is added before the transaction is committed. Additions are globally atomic, so cluster nodes can share the same data store. Even different repositories can share the same store, as long as garbage collection is done correctly.
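Because entries are keyed by the content hash, storing identical content twice yields one shared record. A quick way to observe this through the API (a sketch; session is an open JCR session and data an arbitrary byte array larger than minRecordLength):

import java.io.ByteArrayInputStream;
import javax.jcr.Binary;
import javax.jcr.Value;
import org.apache.jackrabbit.api.JackrabbitValue;
...
// store the same bytes as two separate binary values
Binary b1 = session.getValueFactory().createBinary(new ByteArrayInputStream(data));
Binary b2 = session.getValueFactory().createBinary(new ByteArrayInputStream(data));
Value v1 = session.getValueFactory().createValue(b1);
Value v2 = session.getValueFactory().createValue(b2);
String id1 = ((JackrabbitValue) v1).getContentIdentity();
String id2 = ((JackrabbitValue) v2).getContentIdentity();
// id1.equals(id2) is true: both values reference the same data store entry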

Overview:

(See the attached diagram: DataStoreOverview.png)

Objects are usually stored in the data store early, even before the transaction is committed. Only the identifier is stored in the persistence manager. The blob store is no longer used (except for backward compatibility). When using the RMI client, large objects are not stored directly in the data store; instead they are first transferred to the server.

API

The regular JCR API is used to read and write entries in the data store.

Retrieve the Identifier

The identifier can help you locate the file in the data store backend; however, creating a value instance from an identifier is currently not supported. To get the identifier of a binary value, use JackrabbitValue.getContentIdentity(). Example:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.jcr.Binary;
import javax.jcr.Value;
import org.apache.jackrabbit.api.JackrabbitValue;
...
InputStream in = new ByteArrayInputStream(data);
Binary b = session.getValueFactory().createBinary(in);
Value value = session.getValueFactory().createValue(b);
if (value instanceof JackrabbitValue) {
    JackrabbitValue jv = (JackrabbitValue) value;
    String id = jv.getContentIdentity();
}

Data Store Garbage Collection

The data store never deletes entries except when running data store garbage collection. Similar to Java heap garbage collection, data store garbage collection will first mark all used entries, and later remove unused items.

Data store garbage collection does not delete entries if the identifier is still in the Java heap memory. To delete as many unreferenced entries as possible, call System.gc() a few times before running the data store garbage collection. Please note System.gc() does not guarantee all objects are garbage collected.
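For example, a best-effort nudge before starting the scan (as the paragraph above notes, this does not guarantee anything):

// ask the JVM (repeatedly, best effort) to collect unreferenced values
for (int i = 0; i < 5; i++) {
    System.gc();
}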

Running Data Store Garbage Collection (Jackrabbit 1.x)

Running garbage collection is currently a manual process. You can run it in a separate thread, concurrently with your application:

import org.apache.jackrabbit.core.SessionImpl;
import org.apache.jackrabbit.core.data.GarbageCollector;
...
SessionImpl si = (SessionImpl) session;
GarbageCollector gc = si.createDataStoreGarbageCollector();

// optional (if you want to implement a progress bar / output;
// 'this' must implement ScanEventListener):
gc.setScanEventListener(this);

// mark: scan the repository for entries that are still referenced
gc.scan();
gc.stopScan();

// sweep: delete entries that are no longer used
gc.deleteUnused();

The process above applies to a standalone repository. When clustered, the garbage collection can be run from any cluster node.

If multiple distinct repositories use the same data store, the process is a bit different: First, call gc.scan() on the first repository, then on the second and so on. At the end, call gc.deleteUnused() on the first repository:

// mark: scan each repository that uses the shared data store
gc1.scan();
gc2.scan();
gc3.scan();
gc1.stopScan();
// sweep: delete unused entries (from the first repository only)
gc1.deleteUnused();
gc2.stopScan();
gc3.stopScan();

An alternative is:

  1. Write down the current time = X
  2. Run gc.scan() on each repository
  3. Manually delete files with a last modified date older than X (see the sketch below)
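A sketch of step 3 for the FileDataStore, recursing into the data store directory and deleting files whose last modified date is older than x (the method is made up for illustration; make sure all scans have finished first):

import java.io.File;

// step 3: delete files not touched since time x
static void deleteOlderThan(File dir, long x) {
    File[] list = dir.listFiles();
    if (list == null) {
        return;
    }
    for (File f : list) {
        if (f.isDirectory()) {
            deleteOlderThan(f, x);
        } else if (f.lastModified() < x) {
            f.delete();
        }
    }
}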

Running Data Store Garbage Collection (Jackrabbit 2.x)

Running garbage collection is currently a manual process. You can run it in a separate thread, concurrently with your application:

import java.util.Properties;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.api.JackrabbitRepository;
import org.apache.jackrabbit.api.JackrabbitRepositoryFactory;
import org.apache.jackrabbit.api.management.DataStoreGarbageCollector;
import org.apache.jackrabbit.api.management.RepositoryManager;
import org.apache.jackrabbit.core.RepositoryFactoryImpl;
...
// DIR is the repository home directory
JackrabbitRepositoryFactory rf = new RepositoryFactoryImpl();
Properties prop = new Properties();
prop.setProperty("org.apache.jackrabbit.repository.home", DIR);
prop.setProperty("org.apache.jackrabbit.repository.conf", DIR + "/repository.xml");
JackrabbitRepository rep = (JackrabbitRepository) rf.getRepository(prop);
RepositoryManager rm = rf.getRepositoryManager(rep);

// need to login to start the repository
Session session = rep.login(new SimpleCredentials("", "".toCharArray()));

DataStoreGarbageCollector gc = rm.createDataStoreGarbageCollector();
try {
    gc.mark();
    gc.sweep();
} finally {
    gc.close();
}

session.logout();
rm.stop();

The process above applies to a standalone repository. When clustered, the garbage collection can be run from any cluster node.

If multiple distinct repositories use the same data store, the process is a bit different: First, call gc.mark() on the first repository, then on the second and so on. At the end, call gc.sweep() on the first repository:

// mark: run the mark phase on each repository sharing the data store
gc1.mark();
gc2.mark();
gc3.mark();
// sweep: delete unused entries (from the first repository), then close all
gc1.sweep();
gc1.close();
gc2.close();
gc3.close();

An alternative is:

  1. Write down the current time = X
  2. Run gc.mark() on each repository
  3. Manually delete files with last modified date older than X

How to Write a New Data Store Implementation

New implementations are welcome! An S3 data store (http://en.wikipedia.org/wiki/Amazon_S3) would be cool. A caching data store would be great as well (items that are used a lot are stored in a fast file system, others in a slower one).

Future Ideas

Theoretically the data store could be split across different directories / hard drives. Currently this can be done manually, by moving directories to different disks and creating soft links. Content that is accessed often could be moved to a faster disk, and less used data could eventually be moved to a slower / cheaper disk. That would be an extension of the 'memory hierarchy' (see also http://en.wikipedia.org/wiki/Memory_hierarchy). Of course this wouldn't limit the space used per workspace, but it would improve system performance if done right. Maybe we need to do that anyway in the near future to better support solid state disks.

Other feature requests:

  • A replicating data store
  • Currently the FileDataStore creates a lot of directories (and files). If possible, the number of directories (and maybe files) should be reduced to improve performance.
  • Fulltext search and meta data extraction could be done when storing the object (only once per object) and stored next to the object.
  • Clients should first send the checksum and size of large objects when they store something (import, adding or updating data); in many cases the actual data would not need to be sent.
  • Speed up garbage collection. One idea is to use 'back references' for larger objects: each larger object would know the set of nodes that reference it. This would be an 'append only' set, that is, at runtime links are only added, not removed; only the garbage collection process removes links. The garbage collection would first update links for large objects (this process could stop at the first link that still exists). That way large objects can be removed quickly if they are no longer used. Afterwards, objects with a low use count should be scanned. This algorithm wouldn't necessarily speed up the total garbage collection time, but it would free up space more quickly.
  • Auto-compressing data store (file, db): if a specific file's content type and size make large disk space savings likely when compressed, the data store could auto-compress it (zip, gzip, bz2, etc.). The file data store is more likely to get this feature than the DB one, and it should not impact retrieval or other normal usage. (user added 7/17/2009)

Attachments:

DataStoreOverview.png (image/png)