Apache Jackrabbit : Conflict handling through rebasing branches

This is a slightly edited and somewhat amended version of my original proposal posted to oak-dev@.

There are various places in Oak where conflicting updates to the content tree can occur: committing changes against a non root revision, merging changes from a branch into trunk and synchronisation of cluster nodes.

The Microkernel API currently doesn't specify a contract for merging conflicting changes but leaves it mostly to the underlying implementation. As discussed before this is not satisfactory and I'd like to tighten the contract.

As announced earlier, I spent some time implementing various approaches for rebasing branches in Oak. Apart from the other problems this solves and which have been discussed at length, this also lends itself nicely for a central, clean and efficient way of handling conflicts.

The core idea is to try to resolve all conflicts in a branch through rebasing it on top of the current trunk. After successfully rebasing a branch, merging it to trunk is as simple as fast forwarding the head of the trunk to the head of the branch.

Here is a naïve sample implementation of the relevant methods in pseudo code. Note how commit is implemented in terms of branch and merge. It has not to be implemented that way but rather the observable behaviour should be like this.

/* Rebase branch on top of the head of the trunk. Returns false if a conflict occurs, true otherwise. */
boolean rebase(branch) {
    // See https://github.com/mduerig/jackrabbit-oak/commits/OAK-536
    // for possible implementations
}

int merge(branch) {
    atomic {
        if (branch.baseRevision == trunk.headRevision) {
            trunk.headRevision = branch.headRevision
            return trunk.headRevision;
        }
        else {
            throw new MicroKernelException("conflict");
        }
    }
}

int commit(baseRevision, jsop) {
    branch = createBranchFrom(baseRevision)
    branch.apply(jsop)
    if (!rebase(branch)) {
        throw new MicroKernelException("conflict");
    }

    // If merge fails at this point we could also retry the commit 
    // a certain number of times instead of failing right away.
    return merge(branch)
}

Note the very simple merge() implementation: it only covers the fast forward case where the base of the branch to be merged is the head of the trunk and fails all other merges. merge() has a very narrow atomic section, which is basically a conditional update. Such operations are exposed by most operating systems / storage systems / platforms and should thus be easy to implement.

The overall idea of implementing merge in this way is that callers need to take care of properly rebasing branches and resolving conflicts through the provided rebase() method. The rebase() method itself will detect a specific set of conflicts and annotates them with special conflict markers. This enables conflict resolution to be handled higher up the stack where we have a better understanding of the actual semantics of the conflict. Also note how all conflict handling takes place on private branches thus keeping the trunk free of contentions.

In the current implementation if rebase() results in a conflict, conflicting nodes are annotated with a conflict marker denoting the type of the conflict and the value(s) before the rebase operation. The conflict marker is an internal node with the name :conflict and is added to the node whose properties or child nodes are in conflict.

addExistingProperty

A property has been added that has a different value than a property with the same name that has been added in trunk.

removeRemovedProperty

A property has been removed while a property of the same name has been removed in trunk.

removeChangedProperty

A property has been removed while a property of the same name has been changed in trunk.

changeRemovedProperty

A property has been changed while a property of the same name has been removed in trunk.

changeChangedProperty

A property has been changed while a property of the same name has been changed to a different value in trunk.

addExistingNode

A node has been added that is different from a node of them same name that has been added to the trunk.

removeRemovedNode

A node has been removed while a node of the same name has been removed in trunk.

removeChangedNode

A node has been removed while a node of the same name has been changed in trunk.

changeRemovedNode

A node has been changed while a node of the same name has been removed in trunk.

In this context a node is regarded as changed if a property way added, a property was removed, a property was set to a different value, a child node was added, a child node was removed or a child node was changed.

On conflict the conflict marker node carries the conflicting value of the branch while the rebased value in the branch itself will be set to the conflicting value of the trunk. In the case of conflicting properties, the conflicting value is the property value from the branch. In the case of conflicting node, the conflicting value is the node from the branch.