Apache Jackrabbit : BackupTool

NOTE: The content on this page is obsolete. See the backup and migration support included in Jackrabbit 1.6.

Version 1 of the backup tool can back up a repository and restore it to a blank repository. It reuses existing storage and restore mechanisms as much as possible.

Currently we manage the following resources:

Repository (repository.xml)
Node Type
Namespaces
All workspaces (config and content)
Node version histories
Backup configuration

Please feel free to comment, either through the ML or by contacting Nicolas Toper directly through this contact form. We plan to work on a second version soon and are currently gathering feedback and new use cases.

Design Goals

Back up both data and configuration options.
Make it easy for a sysadmin to back up and restore content.
Aim for generic operations: adding new functionality to Jackrabbit should not require updating the backup application code.
Aim for modularity. This is the first release of the backup tool, and it will certainly evolve.
Disk space is not an issue for now. (It can be addressed in another release.)
Performance is not an issue for now. (It can be addressed in another release.)

Prerequisites & Misc.

All operations are sequential. No multi-threading is currently involved.
Repository must be stopped for backup/restore and dedicated to the backup/restore operations.
The Backup tool source code is available via Subversion at
https://svn.apache.org/repos/asf/jackrabbit/trunk/contrib/backup/
and anonymous access is available at
http://svn.apache.org/repos/asf/jackrabbit/trunk/contrib/backup/
or with ViewVC at
http://svn.apache.org/viewvc/jackrabbit/trunk/contrib/backup/

Backing Up A Repository

To launch a backup, please run the following command:

LaunchBackup --zip myzip.zip --conf backup.xml --login nico --password mlypass backup repository.xml repository/

where zip is the name of the file to generate, conf is the name of the XML configuration file (backup.xml here; if you don't know how to write one, please use the one included), and login and password are the required userID and password.

Restoring

To restore a repository, first prepare a blank repository (available through create repository), then run the following command:

LaunchBackup --zip ./myzip.zip --conf backup.xml --login nico --password p restore repository.xml repository/

where zip is the name of the backup file, conf is the name of the XML configuration file (backup.xml here; if you don't know how to write one, please use the one included), and login and password are the required userID and password.

repository.xml and repository/ point to the repository.xml file and the repository home to restore, respectively.

NB This also gives you an easy way to migrate from one repository to another and to change the PersistenceManager along the way.
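
The two command lines above share the same shape and differ only in the mode keyword (backup or restore). The following minimal sketch assembles that argument list in Java. LaunchBackupArgs and its build method are hypothetical names introduced for illustration; this page does not say how LaunchBackup is placed on the classpath or what its Java API looks like, so the sketch only produces the arguments shown in the commands.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the backup tool): builds the CLI arguments shown above. */
final class LaunchBackupArgs {

    /** Returns the arguments in the same order as the documented command lines. */
    static List<String> build(String mode, String zip, String conf, String login,
                              String password, String repositoryXml, String repositoryHome) {
        // The page documents exactly two modes.
        if (!"backup".equals(mode) && !"restore".equals(mode)) {
            throw new IllegalArgumentException("mode must be 'backup' or 'restore': " + mode);
        }
        return Arrays.asList(
                "--zip", zip,
                "--conf", conf,
                "--login", login,
                "--password", password,
                mode, repositoryXml, repositoryHome);
    }

    public static void main(String[] args) {
        List<String> restore = build("restore", "./myzip.zip", "backup.xml",
                "nico", "p", "repository.xml", "repository/");
        // Prints the restore command line in the form used on this page.
        System.out.println("LaunchBackup " + String.join(" ", restore));
    }
}
```

A list built this way could be passed to whatever launcher you use (a cron wrapper script or a process builder), with the actual invocation of LaunchBackup left to your installation.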

Architecture

We tried as much as possible to achieve a symmetry between backup and restore. All classes can be used for backup and restore operations.

The backup tool is organized around the following main classes:

Launch utility (LaunchBackup) The launch utility lets you launch a backup from a cron job or the CLI. You can also integrate the backup into your application by instantiating this class.
<Resource>Backup A collection of classes extending the abstract class Backup. Each class is responsible for backing up and restoring a specific resource. For instance, NodeTypeBackup backs up and restores all node types. To support another resource (for instance, Lucene indexes), extend Backup and implement its two abstract methods, backup and restore. If you do so, please contribute your classes back.
Manager Manages the instantiation and handling of all <Resource>Backup classes (please see the NB). It learns which Backup subclasses to call from an XML configuration file, so you can easily create your own customized backup.
IOsystem The I/O system is handled through an interface (BackupIOHandler) and its implementation (ZipBackupIOHandler). This lets us improve the I/O system without affecting other parts of the code.
NB Restoring the backup configuration and the repository is a special case: both are mandatory and must be in place before the other restore operations can run. LaunchBackup therefore calls those two classes directly so that the restore operation can continue.
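
To illustrate how these pieces could fit together, here is a minimal, self-contained sketch. It does not reproduce the tool's real code: the actual signatures of Backup, BackupIOHandler and the manager are not shown on this page, so SketchBackup, SketchIOHandler, InMemoryIOHandler, NoteBackup and SketchManager are hypothetical stand-ins. It assumes that each resource class has a no-argument constructor and that the manager instantiates classes by name through reflection, which is one plausible reading of "configured through an XML file".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for BackupIOHandler: stores and retrieves named entries. */
interface SketchIOHandler {
    void write(String entryName, byte[] data);
    byte[] read(String entryName);
}

/** In-memory stand-in for ZipBackupIOHandler, used to keep the sketch self-contained. */
final class InMemoryIOHandler implements SketchIOHandler {
    private final Map<String, byte[]> entries = new LinkedHashMap<>();

    public void write(String entryName, byte[] data) {
        entries.put(entryName, data.clone());
    }

    public byte[] read(String entryName) {
        byte[] data = entries.get(entryName);
        if (data == null) {
            throw new IllegalStateException("No such entry: " + entryName);
        }
        return data.clone();
    }
}

/** Hypothetical stand-in for the abstract class Backup and its two abstract methods. */
abstract class SketchBackup {
    abstract void backup(SketchIOHandler io);
    abstract void restore(SketchIOHandler io);
}

/** Example resource: a single string stands in for real repository state. */
final class NoteBackup extends SketchBackup {
    String note = "";

    void backup(SketchIOHandler io) {
        io.write("note.txt", note.getBytes(StandardCharsets.UTF_8));
    }

    void restore(SketchIOHandler io) {
        note = new String(io.read("note.txt"), StandardCharsets.UTF_8);
    }
}

/** Hypothetical manager: turns configured class names into backup handlers. */
final class SketchManager {
    static List<SketchBackup> instantiate(List<String> classNames) {
        List<SketchBackup> handlers = new ArrayList<>();
        for (String name : classNames) {
            try {
                // Assumption: every resource class has a no-argument constructor.
                handlers.add((SketchBackup) Class.forName(name)
                        .getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot instantiate " + name, e);
            }
        }
        return handlers;
    }
}
```

Because the same class implements both backup and restore against the same I/O interface, a value written by one instance can be read back by another, which is the symmetry the design aims for.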

Configuration File


<Backup>
  <WorkingFolder path="tmp/" />
  <Resources>
    <!-- The repository and the config file are backed up automatically -->
    <Resource savingClass="org.apache.jackrabbit.backup.NodeTypeBackup" />
    <Resource savingClass="org.apache.jackrabbit.backup.NamespaceBackup" />
    <Resource savingClass="org.apache.jackrabbit.backup.NodeVersionHistoriesBackup" />
    <Resource savingClass="org.apache.jackrabbit.backup.AllWorkspacesBackup" />
  </Resources>
</Backup>
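
As a rough illustration of how such a file can be read, the sketch below uses the standard JDK DOM parser to extract the working folder and the configured savingClass names. BackupConfigReader is a hypothetical helper written for this page, not the tool's own configuration code, whose implementation is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of the backup tool): reads values from a backup.xml string. */
final class BackupConfigReader {

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid backup configuration", e);
        }
    }

    /** Returns the savingClass attribute of every Resource element, in document order. */
    static List<String> readSavingClasses(String xml) {
        NodeList resources = parse(xml).getElementsByTagName("Resource");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < resources.getLength(); i++) {
            classes.add(((Element) resources.item(i)).getAttribute("savingClass"));
        }
        return classes;
    }

    /** Returns the path attribute of the first WorkingFolder element, or null if there is none. */
    static String readWorkingFolder(String xml) {
        NodeList folders = parse(xml).getElementsByTagName("WorkingFolder");
        return folders.getLength() == 0 ? null : ((Element) folders.item(0)).getAttribute("path");
    }

    public static void main(String[] args) {
        String xml = "<Backup><WorkingFolder path=\"tmp/\" /><Resources>"
                + "<Resource savingClass=\"org.apache.jackrabbit.backup.NodeTypeBackup\" />"
                + "<Resource savingClass=\"org.apache.jackrabbit.backup.NamespaceBackup\" />"
                + "</Resources></Backup>";
        System.out.println(readWorkingFolder(xml));
        System.out.println(readSavingClasses(xml));
    }
}
```

The order of the Resource elements is preserved, which matters if resources have to be processed in the order listed in the file (an assumption; the page does not state whether order is significant).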

Backup And Restore Operations

The configuration files are saved as plain files.

The workspaces (and node version histories) are exported to a dedicated workspace (SavingWorkspace) using ObjectPM or XmlPM. We then zip the directory, copy it, and destroy the workspace.

Other resources (custom node types and namespaces) are serialized and saved using Jackrabbit's internal XML node type serialization format (NodeTypeWriter and NodeTypeReader, for instance).

We then zip everything in the working folder and move it as a stream to RepositoryImpl.
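
The final zipping step can be sketched with the JDK's java.util.zip classes. This is only an illustration of the idea, not the code of ZipBackupIOHandler: WorkingFolderZipper and zipFolder are hypothetical names, and the hand-off of the resulting stream to RepositoryImpl is left out because that API is not described on this page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of the backup tool): zips a working folder to a stream. */
final class WorkingFolderZipper {

    /**
     * Writes every regular file under the folder to the given stream as a zip archive.
     * Entry names are paths relative to the folder, with '/' as separator.
     * The caller keeps ownership of the stream and is responsible for closing it.
     */
    static void zipFolder(Path folder, OutputStream out) {
        ZipOutputStream zip = new ZipOutputStream(out);
        try (Stream<Path> walk = Files.walk(folder)) {
            // Sort the files so that the archive layout is deterministic.
            List<Path> files = walk.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
            for (Path file : files) {
                String name = folder.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
            // Finish the archive without closing the caller's stream.
            zip.finish();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing to an OutputStream rather than to a file keeps the sketch close to the stream-based hand-off described above; for example, zipFolder(Paths.get("tmp/"), new FileOutputStream("myzip.zip")) would write the archive to disk.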

Evolution

Here are some ideas for future evolution. Please feel free to comment here or on the ML. We plan to implement them soon.

Remove the need for the working folder. Use only streams.
Add asynchronous I/O (I/O is synchronous for now, since there are only a few resources to back up).
Add a remote client using either a dedicated RMI connection or the JCR one.
Add support later for a restore operation while the repository is still in operation by rewriting the local restore operation and its client.
Hot backup (see the post on the ML on this subject)
Incremental backup (using rsync?)
Backup Lucene Index (see post on Lucene ML about saving indexes)
Backup a large repository to span multiple CD/DVD's. Incremental backups on additional CD/DVD's based on what has not been backed-up yet.
A 'persistence manager' to remotely access backups on CD/DVD (example: remotely on a jukebox of CD/DVD's).
Tag (active) metadata so that it is aware of backups on CD/DVD (i.e. how many copies are out there, CD/DVD label/name/namespace/sometypeofuniqueID).
Add support for partial restore at the workspace level (add the resources WorkspaceBackup and WorkspaceConfigBackup, plus a parameter for the workspace name):
<!--
<Resource class="org.apache.jackrabbit.backup.WorkspaceBackup">
  <param name="default" />
</Resource>
<Resource class="org.apache.jackrabbit.backup.WorkspaceConfigBackup">
  <param name="default" />
</Resource>
-->

Support different uuidBehavior values (see <param name="uuidBehavior" value="0"/>)
Partial restore

Please contact Nicolas Toper (through the ML or this contact form) with any question, suggestion, or idea about this project.