Workspace Filter

One of the most important meta files of a vault checkout or a content package is the filter.xml, which is located in the META-INF/vault directory. The filter.xml is used to load and initialize the WorkspaceFilter. The workspace filter defines which parts of the JCR repository are imported or exported during the respective operations through vlt or package management.

General Structure

The filter.xml consists of a set of filter elements, each with a mandatory root attribute and an optional list of include and exclude child elements.

Example:

<workspaceFilter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://jackrabbit.apache.org/filevault/xsd/workspacefilter-1.0.xsd" version="1.0">
    <filter root="/apps/project1" />
    <filter root="/etc/project1">
        <exclude pattern=".*\.gif" />
        <include pattern="/etc/project1/static(/.*)?" />
    </filter>
    <filter root="/etc/map" mode="merge" />
    <filter root="/apps/old-project-location" type="cleanup" />
</workspaceFilter>

Filter Elements

The filter elements are independent of each other and define include and exclude patterns for subtrees. The root of a subtree is defined by the root attribute, which must be an absolute path in JCR 2.0 Path Standard Form. The filter element can have an optional mode attribute which specifies the import mode used when importing content. The following values are possible:

replace: This is the normal behavior. Existing content is replaced completely by the imported content, i.e. is overridden or deleted accordingly.
merge: Existing content is not modified, i.e. only new content is added and none is deleted or modified. Deprecated, as not handled consistently, use merge_properties instead.
merge_properties: Existing content is not modified, i.e. only new content is added and none is deleted or modified.
update: Existing content is updated, new content is added and none is deleted. Deprecated, as not handled consistently, use update_properties instead.
update_properties: Existing content is updated, new content is added and none is deleted.

For a more detailed description of the import modes, see the import mode documentation. Note that all values must be given in lowercase letters (despite the underlying Java enum type using uppercase letters).
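
As a minimal sketch of how the non-deprecated modes might be declared, consider the following filter. The root paths are hypothetical and chosen only for illustration; note the lowercase attribute values.

```xml
<workspaceFilter version="1.0">
    <!-- no mode attribute: the normal "replace" behavior applies -->
    <filter root="/apps/project1" />
    <!-- existing content is left alone, only new content is added -->
    <filter root="/etc/map" mode="merge_properties" />
    <!-- existing content is updated, new content is added, nothing is deleted -->
    <filter root="/content/project1/settings" mode="update_properties" />
</workspaceFilter>
```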

In addition, the attribute type can be used to influence the auto-detection of the package type (if it is not explicitly specified in the properties.xml). The only value supported so far is cleanup, which means that the filter rule is ignored for the auto-detection of the package type (JCRVLT-220) and also for the validation of orphaned filter entries by the jackrabbit-filter validator. This is intended for nodes which are supposed to be removed during package installation (i.e. nodes which are not contained in any serialization files/folders).

Include and Exclude Elements

The include and exclude elements can be added as optional children of the filter element to allow more fine-grained filtering of the subtree during import and export. They have a mandatory pattern attribute whose value is a regular expression. The regular expression is matched against the full actual or potential JCR node path in JCR 2.0 Path Standard Form, so it must start either with / (absolute regex) or with a wildcard (relative regex).

Order

The order of the include and exclude elements is important. Each path is tested against all patterns in sequential order, and the type of the last matching element determines whether the path is included. One caveat is that the type of the first pattern defines the default behavior for paths that match no pattern, so that the filter is more natural to write: if the first pattern is an include, the default is exclude, and vice versa.

The following example only includes the nodes in /tmp that end with .gif.

<filter root="/tmp">
    <include pattern=".*\.gif"/>
</filter>

The following example includes all nodes in /tmp except those that end with .gif.

<filter root="/tmp">
    <exclude pattern=".*\.gif"/>
</filter>
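
To illustrate the last-match rule with mixed elements, here is a sketch using hypothetical paths below /tmp. The outcomes in the comments are derived only from the ordering rules described above.

```xml
<filter root="/tmp">
    <!-- first pattern is an include, so paths matching nothing are excluded -->
    <include pattern="/tmp/images(/.*)?"/>
    <!-- later match wins: /tmp/images/banner.gif is excluded again -->
    <exclude pattern=".*\.gif"/>
    <!-- last match wins: /tmp/images/logo.gif is included after all -->
    <include pattern="/tmp/images/logo\.gif"/>
</filter>
```

With this filter, /tmp/images/photo.png matches only the first include and is included, while /tmp/other matches no pattern and falls back to the default (exclude).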

Property Filtering

Since FileVault 3.1.28 (JCRVLT-120) filtering is possible not only at node level: individual properties below a node can also be included or excluded by setting the attribute matchProperties to true on the include/exclude element.

<filter root="/tmp">
    <exclude pattern="/tmp/property1" matchProperties="true"/>
</filter>

The pattern is then matched against property paths instead of node paths. If the attribute matchProperties is not set or is false, all properties directly below the given node paths are included or excluded along with the node. Otherwise the pattern is compared with the full property path whenever properties are read or written, which makes it possible to include or exclude only specific properties below an included node.

XML Schema

One can leverage the XML schema provided at https://jackrabbit.apache.org/filevault/xsd/workspacefilter-1.0.xsd to validate a filter.xml of a content package. This schema also provides some documentation on the elements and attributes, so in most IDEs some help is exposed on hovering those.

Referencing the XML schema from within the filter.xml works like this:

<workspaceFilter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://jackrabbit.apache.org/filevault/xsd/workspacefilter-1.0.xsd" version="1.0">

Note that the XML schema is not bound to a namespace, so make sure to reference it via xsi:noNamespaceSchemaLocation only.

Usage for Export

When exporting content into the filesystem or a content package, the workspace filter defines which nodes are serialized. Note that only the nodes that match the filter are actually traversed, which can lead to unexpected results.

For example:

<filter root="/tmp">
    <include pattern="/tmp/a(/.*)?"/>
    <include pattern="/tmp/b/c(/.*)?"/>
</filter>

The above will include the /tmp/a subtree, but not the /tmp/b/c subtree, since /tmp/b does not match the filter and is therefore not traversed.
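
One way to avoid this, sketched here under the assumption that both subtrees should be exported in full, is to give each subtree its own filter root instead of relying on patterns below a shared root:

```xml
<!-- each root is the start of its own traversal, so /tmp/b does not need to match any pattern -->
<filter root="/tmp/a"/>
<filter root="/tmp/b/c"/>
```

With this layout, /tmp and /tmp/b are no longer covered by any filter rule, so on import they are handled as uncovered ancestor nodes.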

There is one exception to this traversal rule: if all the patterns are relative (i.e. don't start with a slash), then the algorithm is:

start at the filter root
traverse all child nodes recursively
if the path of the child node matches the regexp, include it in the export
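
As a sketch of the relative case, the following filter uses a single pattern that starts with a wildcard rather than a slash (the node name c is taken from the earlier example):

```xml
<filter root="/tmp">
    <!-- relative pattern: every node below /tmp is visited and tested against it -->
    <include pattern=".*/c(/.*)?"/>
</filter>
```

Following the steps above, this would export /tmp/b/c and its subtree even though /tmp/b itself does not match, together with any other node named c below /tmp.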

Usage for Import/Installation

When importing (i.e. installing) content packages into a repository, the workspace filter defines which nodes are deserialized and overwritten in the repository. Nodes and properties that are covered by a filter rule but not contained in the content to be imported are removed from the repository.

The exact rules are outlined below:

Item covered by filter rule	Item contained in the Content Package	Item contained in the Repository (prior to Import/Installation)	State of Item in Repository after Import/Installation
no	yes	yes	not touched
no	no	yes	not touched
no	yes	no	nodes which are ancestors of covered paths: deserialized from content package (for backwards compatibility reasons); nodes which are not ancestors of covered paths: not touched. Do not rely on this behaviour: all items in the content package should always be covered by some filter rule to make the outcome explicit.
no	no	no	not existing (not touched)
yes	yes	yes	overwritten
yes	no	yes	removed
yes	yes	no	deserialized from content package
yes	no	no	not existing

Uncovered ancestor nodes

All uncovered ancestor nodes are either

created with the node type and properties given in the package (in case the node type is given with a .content.xml at the right location and the node does not yet exist in the repo),
created with the ancestor node type's default child type (since version 3.4.4, JCRVLT-417), or with node type nt:folder if no default child type is set or the version is older than 3.4.4 (in case the node type is not given with a .content.xml at the right location and the node does not yet exist in the repo), or
not touched at all (in case they already exist in the repo, no matter which node type is given with a .content.xml at the corresponding location)

Example

Content Package Filter

<filter root="/tmp">
    <include pattern="/tmp/a(/.*)?"/>
    <include pattern="/tmp/b(/.*)?"/>
    <exclude pattern="/tmp/b/property1" matchProperties="true"/>
    <include pattern="/tmp/c(/.*)?"/>
</filter>

Content Package Serialized Content

+ /jcr_root/
  + tmp/
    + a/
      - property1="new"
    + b/
      - property1="new"
      - property2="new"

Repository State Before Installation/Import

+ /tmp/
  + b/
    - property1="old"
    - property2="old"
  + c/
    - property1="old"

Repository State After Installation/Import

+ /tmp/
  + a/
    - property1="new"
  + b/
    - property1="old"
    - property2="new"

The result follows from the rules above: /tmp/a is covered by the filter and contained in the package, so it is created. /tmp/b/property1 is excluded by the property filter and therefore keeps its old value, while /tmp/b/property2 is overwritten. /tmp/c is covered by the filter but not contained in the package, so it is removed.