Design principle
Best effort: everything might be corrupt at any time:
- node types
- child node existence
- clients must not make any consistency assumptions
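Under this principle, code that reads the tree has to tolerate a missing child or a malformed node type instead of failing. The following is a minimal sketch, assuming a hypothetical in-memory tree of plain dicts. The `children`/`type` keys, the helper names and the `nt:unstructured` fallback are illustrative choices, not an Oak or Jackrabbit API.

```python
from typing import Any, Optional

# Hypothetical representation: a node is a dict with an optional "type" string
# and an optional "children" dict mapping child names to child nodes.
DEFAULT_TYPE = "nt:unstructured"  # assumed fallback, chosen for illustration


def get_child(tree: Any, path: str) -> Optional[dict]:
    """Resolve a slash-separated path, returning None (instead of raising)
    when a node on the way is missing or is not a well-formed node."""
    node = tree
    for name in filter(None, path.split("/")):
        children = node.get("children") if isinstance(node, dict) else None
        if not isinstance(children, dict) or name not in children:
            return None
        node = children[name]
    return node if isinstance(node, dict) else None


def node_type(node: Optional[dict]) -> str:
    """Return the node's type, falling back to DEFAULT_TYPE if the node
    is missing or its type entry is absent or malformed."""
    value = node.get("type") if node is not None else None
    return value if isinstance(value, str) else DEFAULT_TYPE


tree = {"children": {"a": {"type": "nt:folder", "children": {"b": {"type": 42}}}}}
print(node_type(get_child(tree, "/a")))    # nt:folder
print(node_type(get_child(tree, "/a/b")))  # nt:unstructured (corrupt type)
print(get_child(tree, "/a/missing"))       # None
```

Every lookup degrades to a default or `None`, so the caller decides how to handle the inconsistency rather than relying on the tree being valid.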
Goals
- Pass the TCK. The TCK might, however, be adapted for invalid or edge cases.
- Node type consistency on save and on set type (including mixins). Inconsistencies occurring due to write skew or degradation effects are acceptable, though.
- Scalability:
  - Read throughput: no degradation compared to the current Jackrabbit 2; repeated reads must not be slow; take advantage of locality for random reads. TODO: Needs further clarification
  - High write throughput across cluster nodes.
  - Big lists of direct child nodes (10M)
  - Concurrent writes within a single cluster node. TODO: Needs further clarification: concurrency itself might not be the goal but rather the means to reach high single-user throughput
  - Big transactions (> 100k nodes at 1kB each, i.e. > 100MB per transaction)
  - Start-up time < 1s
  - Number of nodes in repository: 100M
  - Number of nodes in shared cloud: 10T
  - 1G binaries at 2MB per binary => 2PB repository size
  - Simple/fast queries (i.e. through specialized indexes) (3ms)
- Partitioning of observation. TODO: Needs further clarification
- Handling of recursive deletes: a large number of NODE_REMOVED events vs. a delete event for specific properties in the subtree.
- Number of users: 200M / 20M per group
- Full versioning model
- Flexible durability (depending on the durability guarantees of the back end)
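For the recursive-delete item, one possible way to avoid flooding listeners with one NODE_REMOVED event per node is to report only the topmost removed nodes of each deleted subtree. This is a sketch under that assumption, not the chosen design; `collapse_removals` is a hypothetical helper, and it assumes absolute paths such as `/a/b` (excluding the root `/`).

```python
from typing import Iterable, List


def collapse_removals(paths: Iterable[str]) -> List[str]:
    """Reduce a set of removed node paths to the topmost removed ones.

    A path is dropped if any of its proper ancestors was also removed,
    so a recursive delete of a large subtree yields a single entry.
    """
    removed = set(paths)
    roots = []
    for path in sorted(removed):
        parts = path.strip("/").split("/")
        # Proper ancestors, built segment by segment: "/a", "/a/b", ...
        ancestors = ("/" + "/".join(parts[:i]) for i in range(1, len(parts)))
        if not any(a in removed for a in ancestors):
            roots.append(path)
    return roots


events = ["/a", "/a/b", "/a/b/c", "/a-x", "/a-x/y", "/b/c"]
print(collapse_removals(events))  # ['/a', '/a-x', '/b/c']
```

The ancestor check works per path segment rather than by string prefix, so a sibling such as `/a-x` is not mistaken for a descendant of `/a`.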
Non-goals
- Node type consistency when node type definition changes
- Consistency guarantees
- Scalability:
  - Big property lists
- Same-name siblings
- Namespace remapping
- Query index completeness
- Fast move
- JCR lock support (best effort only)
Maybe
- Scalability:
  - Large number of values for multi-valued properties
- Shareable nodes
- Fast delete
TBD
- Everything is content: search index, configuration, workspaces
  - At what level (e.g. JCR, SPI, Microkernel, persistence store)?
- Microkernel portable to C:
  - Or maybe better: a "language agnostic API"
- Flexible persistence layer (RDBMS, Cassandra, ...)
- Small and embeddable
  - How small?
  - Embeddable into what?
- Characteristics of clustering (partitioning, replication, merging, consistency)
- Tunable consistency (e.g. when clustered)
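One widely used way to make consistency tunable in a replicated store (Cassandra's per-request consistency levels such as ONE, QUORUM and ALL are an example) is to choose read and write quorum sizes per operation. With n replicas, under the usual strict-quorum assumptions, a read of r replicas overlaps every write acknowledged by w replicas whenever r + w > n. The sketch below only checks that overlap condition; `quorums_overlap` is a hypothetical helper, and nothing here is a decision of this document.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum of size r intersects every
    write quorum of size w among n replicas, i.e. r + w > n."""
    if n < 1 or not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("require 1 <= r <= n and 1 <= w <= n")
    return r + w > n


# Three replicas: quorum reads/writes overlap, single-replica ones do not.
for r, w in [(2, 2), (1, 3), (1, 1)]:
    print(r, w, quorums_overlap(3, r, w))
```

Lowering r or w trades the overlap guarantee for latency and availability, which is the knob a "tunable consistency" setting would expose.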