Apache Jackrabbit : Goals and non-goals for Jackrabbit 3

Best effort: everything might be corrupt at any time:

Pass the TCK, although the TCK might be adapted for invalid or edge cases.
Node type consistency on save and set type (including mixin). Inconsistencies due to write skew or degradation effects are acceptable, though.
Scalability:
- Read throughput: no degradation from current Jackrabbit 2; repeated reads are not slow; take advantage of locality for random reads. TODO: Needs further clarification
- High write throughput across cluster nodes.
- Big lists of direct child nodes (10M)
- Concurrent writes within a single cluster node. TODO: Needs further clarification: concurrency itself might not be the goal but the means to reach high single-user throughput
- Big transactions (> 100k nodes at 1kB each)
- Start up time < 1s
- Number of nodes in repository: 100M
- Number of nodes in shared cloud: 10T
- 1G binaries with 2MB per binary => 2PB Repository size
Simple/Fast queries (i.e. through specialized indexes) (3ms)
Partitioning of observation. TODO: Needs further clarification
- Handling of recursive deletes: large number of NODE_REMOVED events vs. delete event for specific properties in subtree.
Number of users: 200M / 20M per group
Full versioning model
Flexible durability (depending on durability guarantees of back end)
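
The size figures in the scalability targets above can be cross-checked with a little arithmetic. The following is only a back-of-the-envelope sketch: it assumes decimal (SI) units, and the helper name `total_bytes` is made up for illustration and is not part of any Jackrabbit API.

```python
# Back-of-the-envelope check of the scalability targets listed above.
# Assumption: decimal (SI) units, i.e. 1 kB = 10**3 bytes, 1 MB = 10**6 bytes.
KB, MB, PB = 10**3, 10**6, 10**15


def total_bytes(count: int, bytes_each: int) -> int:
    """Hypothetical helper: total payload size for `count` items."""
    return count * bytes_each


# "1G binaries with 2MB per binary"
binary_store = total_bytes(10**9, 2 * MB)

# "Big transactions (> 100k nodes at 1kB each)": lower bound for one transaction
big_transaction = total_bytes(100_000, 1 * KB)

print(binary_store // PB, "PB of binaries")
print(big_transaction // MB, "MB per big transaction (lower bound)")
```

Under these assumptions the binary store comes out at the 2 PB stated in the list, and a "big" transaction carries at least 100 MB of node data, which gives a rough idea of what a single save has to handle.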

Everything is content: search index, configuration, workspaces
- At what level (i.e. JCR, SPI, Microkernel, persistence store)?
Microkernel portable to C:
- Or maybe better "language agnostic API"
Flexible persistence layer (RDBMS, Cassandra, ...)
Small and embeddable
- How small?
- Embeddable into what?
Characteristics of clustering (partitioning, replication, merging, consistency)
Tunable consistency (e.g. when clustered)