Apache Jackrabbit : Oakathon November 2017

Where and When

  • November 13th - 17th 2017
  • Location: Adobe Basel; guests please register at the reception on the 2nd floor. The Oakathon will take place on the 5th floor in meeting room 'Rhein'.

Attendees

Who | When
Marcel Reutegger | 13. - 16.
Matt Ryan | 13. - 16.
Angela Schreiber | 13. - 16.
Michael Dürig | 13. - 16.
Valentin Olteanu | 13. - 16.
Andrei Dulceanu | 14. - 16.
Francesco Mari | 13. - 16.
Thomas Müller | 13. - 16.
Daniel Hasler | 13. - 16.

Topics/Discussions/Goals

Title: Handling requests for blobs that aren't immediately available
Summary: Some cloud storage options, like AWS Glacier, offer the capability to store infrequently used data at very low cost. The tradeoff is that requests for data may take a long time - even hours or longer - before the data is accessible. The purpose of the discussion is to determine how to handle both the initial request and subsequent requests, and how to help clients interact with such requests effectively. Having someone familiar with Sling involved in the discussion would be useful, as Sling's request model may also be affected.
Effort: 2h
Participants: everyone interested
Proposed by: Matt Ryan

Title: Providing JCR node information at blob creation time
Summary: The JCR specification defines that values are created via 'ValueFactory.createBinary(stream)' (section 10.4.3.2). This is carried forward into the DataStore interface, whose addRecord() method takes only an input stream and has no JCR node information related to the stream. With the introduction of the CompositeDataStore into Oak, there are many interesting use cases that could be supported if certain JCR node information (e.g. the JCR path) were known when a binary is being created or accessed. This discussion is about ways to support these types of use cases.
Effort: 2h (discussion only), or 1-2d (if we do a prototype)
Participants: everyone interested
Proposed by: Matt Ryan

Title: Threat Model
Summary: Create a threat model for Oak.
Effort: 2 x 2h
Participants: everyone interested; 1 expert per topic needed on demand
Proposed by: angela

Title: m12n
Summary: Continue/complete the modularization effort.
Effort: as long as it takes
Participants: everyone interested
Proposed by: angela

Title: TarMK tooling
Summary: Review and advance the tooling API and initial implementations from the August Oakathon.
Effort: 2-5d
Participants: everyone interested
Proposed by: Michael

Title: Versioning and adoption
Summary: Probably related to m12n: discuss how we think about Oak adoption when new features are implemented but upstream modules cannot take unstable versions.
Effort: 2h
Participants: everyone interested
Proposed by: Alex

Title: TarMK roadmap
Summary: Sketch out a roadmap for the TarMK for the upcoming months.
Effort: 1d
Participants: Andrei, Francesco, Valentin, Michael and everyone interested
Proposed by: Michael

Title: TarMK pain points
Summary: Based on current feedback, write down the list of (major) issues encountered by users when operating a TarMK deployment. Identify the main focus areas and prioritize them to help define the roadmap.
Effort: 2-3h
Participants: Andrei, Francesco, Valentin, Michael and everyone interested
Proposed by: Valentin

Title: TarMK on HDFS
Summary: Could it be possible to store segments on HDFS instead of a local disk? A quick analysis suggests this could be easier than intuitively perceived. HDFS scales exceptionally well for parallel reads and writes of blocks.
Effort: 2-3d
Participants: Tomek, Francesco, Michael, Andrei and everyone interested
Proposed by: PhilippSuter

Title: No page caching for TarMK
Summary: Page caching produces notorious side effects, especially when storing very large repositories. Could it be possible to use JVM-managed memory to achieve similar cache hit ratios?
Effort: 2-3d
Participants: Andrei, Valentin, Francesco, Michael and everyone interested
Proposed by: PhilippSuter

Title: In or out?
Summary: Go through open issues and decide what goes into 1.8 and what needs to be deferred. Committers familiar with a module should do a first pass before the Oakathon and use the time with the team to discuss issues that are controversial, on a tight schedule, or require attention for some other reason.
Effort: 2-4h
Participants: everyone
Proposed by: Marcel

Title: Wrap up CompositeDataStore
Summary: I think CompositeDataStore is almost across the finish line; let's tie a bow on it.
Effort: 2-4h
Participants: MattR + 1-2 committers familiar with the data store
Proposed by: Matt Ryan

Title: Benchmarking
Summary: Writing benchmarks and interpreting results is challenging. I would like to present and discuss problems, solutions, and possible improvements. See also: Statistically rigorous Java performance evaluation; Virtual machine warmup blows hot and cold; Everything You Know About Latency Is Wrong.
Effort: 1h
Participants: everyone interested
Proposed by: Thomas

Title: Serverless Computing, first for Index+Search
Summary: Serverless computing is gaining traction (see also: Serverless computing: economic and architectural impact). We should discuss in which modules it can be used; one example is Search and Queries.
Effort: 1h
Participants: everyone interested
Proposed by: Thomas

Agenda Proposal

Time | General | DataStore | TarMK | Misc

Mon

9:00-12:30 | 9:00 Setup; 9:30 5 min overview per topic | 10:30 Providing JCR node information at blob creation time | 10:30 TarMK pain points |
13:30-17:00 | | | 14:00 TarMK pain points | 13:30 Modularization

Tue

9:00-12:30 | 9:00 Benchmarking (1h) | 10:00 Handling requests for blobs that aren't immediately available | TarMK tooling |
13:30-17:00 | | Wrap up CompositeDataStore | | 15:00 Threat modeling (2h)

Wed

9:00-12:30 | 9:00 Versioning and adoption (1h) | | 10:00 TarMK future | 10:00 Threat modeling (2h)
13:30-17:00 | | | 13:30 TarMK future; 16:00 Wrap up: TarMK future |
18:00-... | Dinner. Separate invite pending | | |

Thu

9:00-12:30 | 9:00 In or out? (2h) | | 09:00 TarMK on HDFS (Tomek); 10:00 No page caching for TarMK |
13:30-17:00 | 13:30 Serverless Computing, first for Indexing+Search (1h); 14:30 Oak Future - Where is Oak headed? (1.5h) | | |

Fri

9:00-12:30 | | | |
13:30-17:00 | | | |

Prep Work

Notes from the Oakathon

Providing JCR Node information to DataStore

There are two main cases to consider: Creating a new blob and accessing an existing blob.

  • Blob creation is done in the DataStore interface via addRecord(InputStream). Options discussed (in order of preference) were:
    • Creating a new InputStream implementation that includes node information in the input stream. When the DataStore reads the stream, it would check whether it is of the new implementation type; if so, it would extract the node information and then pass the remaining data along (see the sketch after this list).
    • Extend the Jackrabbit DataStore API to also support addRecord() with additional node information. This would not replace the addRecord(InputStream) method, but would be additive (and, we admit, not strictly compliant with the JCR spec).
    • Use an existing deprecated method that might suit this purpose. We dislike this, obviously, because we would be knowingly using a deprecated method.
  • Accessing an existing blob is usually but not always via a DataIdentifier.
    • We considered encoding additional information into the DataIdentifier, but we are leaning away from that for a few reasons:
      • Encoding node information into the identifier means that blob ids would have to change if any of the node information were ever to change, like adding a property or moving the blob to a different path. It also presents complications for supporting binary deduplication (the same blob may be stored at two different paths).
      • Encoding a data store identifier into the blob id has similar issues if the blob were to be moved from one data store to another.
      • Taking this step also creates a data migration issue for existing users.
    • Instead we discussed that the CompositeDataStore can assume the responsibility for mapping DataIdentifiers to delegate data stores. This was basically considered a requirement for CompositeDataStore anyway (via Bloom filters). The CompositeDataStore would need to load existing identifiers at startup time to do this. We might be able to get the DataIdentifiers via the blob tracker.
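
As a rough illustration of the first (preferred) option above, a minimal sketch of a node-info-carrying stream might look like the following. The class name and the single jcrPath field are hypothetical; nothing like this exists in the Jackrabbit DataStore API today.

    import java.io.FilterInputStream;
    import java.io.InputStream;

    // Hypothetical wrapper (not part of the current Jackrabbit/Oak API) that carries
    // JCR node information alongside the binary data. A DataStore implementation could
    // check for this type in addRecord(InputStream), extract the node information, and
    // then store the remaining stream exactly as it does today.
    public class NodeInfoInputStream extends FilterInputStream {

        private final String jcrPath; // e.g. "/content/project/asset/jcr:content"

        public NodeInfoInputStream(InputStream delegate, String jcrPath) {
            super(delegate);
            this.jcrPath = jcrPath;
        }

        public String getJcrPath() {
            return jcrPath;
        }
    }

A data store could then check 'stream instanceof NodeInfoInputStream' inside 'addRecord(InputStream)' and fall back to today's behavior for plain streams, which keeps the existing DataStore contract intact for callers that do not supply node information.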

During the meeting we also brought up a number of issues that need to be verified with CompositeDataStore, related to this topic:

  • We need to check initialization of the CompositeDataStore in the system and make sure that caching gets initialized correctly.
  • We need to check DSGC in the production system/test system use case. Since the test system accesses the production data store read-only, does it also participate in the mark phase? If binaries are marked for delete in production, do they end up getting deleted from the production data store?
  • In the production system/test system use case, can the test system reuse or share the index segments from the production system, or is it required to rebuild the indexes for its own use? This may take so long that it limits the usefulness of the test system, so this needs to be understood. How would this work if the production instance is doing active deletion of Lucene indexes?
    • Can we clone an instance and also clone the index segments if active deletion is being used? Since the clone only happens from the node store's head state, would the clone care about other information not at the head state?
    • Is it okay to have separate index segments between both (and rebuild them), or copy them and update them for the local system, or would it be better to try to share the index segments?

Finally, we discussed what may be the first use case of this capability in Oak. Initially Matt proposed that allowing the CompositeDataStore to select a delegate based on path information could be the first use case (a sketch of this idea follows). Another suggestion (Amit? Vikas?) was that a smaller use case might exist entirely within Oak: use a CompositeDataStore to store only index segments in one delegate and everything else in the other. In that case everything would happen inside Oak and the user would not be aware that a CompositeDataStore was being used.
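
A minimal sketch of what path-based delegate selection could look like, assuming the node information is delivered via the hypothetical NodeInfoInputStream above; the selector class and its longest-prefix rule are illustrative only, not how the actual CompositeDataStore is implemented.

    import java.io.InputStream;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Illustrative-only sketch: pick a delegate data store by the longest matching
    // JCR path prefix, falling back to a default delegate when no node information
    // is available.
    public class PathBasedDelegateSelector<D> {

        private final Map<String, D> delegatesByPathPrefix = new LinkedHashMap<>();
        private final D defaultDelegate;

        public PathBasedDelegateSelector(D defaultDelegate) {
            this.defaultDelegate = defaultDelegate;
        }

        public void addDelegate(String pathPrefix, D delegate) {
            delegatesByPathPrefix.put(pathPrefix, delegate);
        }

        public D selectDelegate(InputStream stream) {
            if (!(stream instanceof NodeInfoInputStream)) {
                return defaultDelegate; // no node information: use the default delegate
            }
            String path = ((NodeInfoInputStream) stream).getJcrPath();
            D best = defaultDelegate;
            int bestLength = -1;
            for (Map.Entry<String, D> entry : delegatesByPathPrefix.entrySet()) {
                String prefix = entry.getKey();
                if (path.startsWith(prefix) && prefix.length() > bestLength) {
                    best = entry.getValue();
                    bestLength = prefix.length();
                }
            }
            return best;
        }
    }

The index-segments-in-one-delegate idea mentioned above could be expressed with the same mechanism, e.g. by mapping the path prefix of the index data to a dedicated delegate.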

Handling requests for blobs that aren't immediately available

The prime example for this scenario is using AWS Glacier as an Oak data store option. Glacier as a data store doesn't make a lot of sense by itself, but used in the context of the CompositeDataStore with some support for tiering or prioritization across the data stores, with Glacier as the lowest-priority tier, it might.

The biggest challenge with using Glacier is that unarchiving blobs from Glacier is time-consuming. The standard expectation is 4+ hours; expedited retrieval is possible, but even then it is on the order of minutes. This clearly means we would need some mechanism for conveying "I can get the requested blob; I don't have it now, but I will have it in the future." Once a blob is unarchived it must be retrieved within 24 hours or it will return to the archived state.
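
To make the "not now, but later" idea concrete, one possible shape for such a contract is sketched below. None of these types exist in Oak; they only illustrate that a cold-storage-aware data store needs a cheap status check plus an asynchronous retrieval path instead of blocking a request for hours.

    import java.io.InputStream;
    import java.util.concurrent.CompletableFuture;

    // Illustrative-only sketch of a "blob not immediately available" contract.
    public interface DeferredBlobAccess {

        enum Availability { AVAILABLE, RESTORE_IN_PROGRESS, ARCHIVED }

        // Cheap status check, e.g. backed by a Glacier restore-status lookup.
        Availability getAvailability(String blobId);

        // Returns the stream right away for warm blobs; otherwise triggers a restore
        // and completes the future once the blob can actually be read (potentially
        // hours later for Glacier).
        CompletableFuture<InputStream> retrieve(String blobId);
    }

How such a status would be surfaced to clients (and to Sling's request model) is exactly the open question of this topic.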

A new component (referred to as a "curator" in this discussion) was suggested. The role of the curator would be to retrieve unarchived objects from Glacier to a higher-level storage tier for future access. As proposed, it would also have the role of applying policy to move infrequently accessed objects to lower tiers, eventually down to Glacier. Because the curator moves objects, it knows where they are and when they have moved, so Oak always knows their whereabouts. If objects are moved outside of Oak, it becomes difficult for Oak to keep track of blobs and their locations, which could result in the composite data store requesting a blob from a delegate where it no longer exists.

The following items were discussed:

  • Should Glacier restore be an administrative task instead of something that occurs as the result of a standard user request? Since Glacier restores are expensive, there is a risk that spurious user requests for blobs could result in unnecessary restores. This would mean that the responsibility for unarchiving resides outside of Oak, either at the application level or with users doing it themselves via the AWS console.
  • In the context of tiered storage in a composite data store, Glacier storage is probably not useful unless the S3DataStore is also being used, so we can probably assume the S3DataStore is present. In that case we can make assumptions about unarchiving; for example, an AWS Lambda function could be used to move data from one store to another.
  • How does garbage collection work across multiple stores? (Open issue)
  • Curator component probably belongs within the context of oak-blob-composite, not as a separate bundle.
  • S3 IA (Infrequent Access) has had some very rudimentary testing with Oak and should be much simpler to use. Would this be sufficient to meet the user desire for lower-cost storage and thus minimize the need to take on the additional complexity of using Glacier?
  • How do we get the curator to avoid archiving things that should not be archived? There may be some blobs that we never want to archive. Some suggested criteria (a policy sketch follows this list):
    • Size (there are some concerns with this: while it is easy to determine whether a blob exceeds a minimum size requirement, it is much harder to come up with a meaningful threshold. For example, blob thumbnails should probably never be archived for user-experience reasons, but what threshold would cover every conceivable thumbnail without also preventing the archiving of things that we do want to archive?)
    • Last accessed time
    • Some other items, like index segments, should never be archived
    • Could we only archive certain parts of the tree?
  • The blob store deduplicates blobs, which means multiple nodes may refer to the same blob. The curator would therefore need to be aware of all nodes referring to a blob and only archive it if all of them agree it should be archived. One idea was to use a pattern similar to garbage collection: during a mark phase, a blob can be marked as "don't move" if any node votes that it should not be moved to a lower-priority tier.
  • Once something has been moved, instead of removing the reference at the higher tier, the record could be replaced with a placeholder indicating where it was moved to.
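
A minimal sketch of the kind of policy the curator could apply before demoting a blob, assuming hypothetical inputs for size, last access time, and a "don't move" vote collected during a mark phase; the concrete rules were discussed but not decided.

    import java.time.Duration;
    import java.time.Instant;

    // Illustrative-only sketch of an archive policy for the proposed curator component.
    public class SimpleArchivePolicy {

        private final long minSizeBytes;    // never archive small blobs, e.g. thumbnails
        private final Duration minIdleTime; // only archive blobs idle for at least this long

        public SimpleArchivePolicy(long minSizeBytes, Duration minIdleTime) {
            this.minSizeBytes = minSizeBytes;
            this.minIdleTime = minIdleTime;
        }

        public boolean mayArchive(long sizeBytes, Instant lastAccessed, boolean markedDontMove) {
            if (markedDontMove) {
                return false; // some referencing node voted "don't move" during the mark phase
            }
            if (sizeBytes < minSizeBytes) {
                return false; // too small to be worth a cold-storage round trip
            }
            return Duration.between(lastAccessed, Instant.now()).compareTo(minIdleTime) >= 0;
        }
    }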

Toward the end of the discussion, it was brought up that perhaps we could get by without a Glacier data store delegate for the composite data store and still allow users to store things in Glacier. This would require customization outside of Oak, which could be either an application or simple administrative tasks. In such a case, "archiving" a blob would mean that it does not get deleted from a higher-tier blob store, but rather gets replaced with some sort of marker that the user understands as "archived"; the custom code would copy the blob to the archive. Unarchiving would require external effort to reverse the process. In other words, the "application layer" (which could also just be admin scripts) is responsible for moving blobs from one tier to the other and for keeping track of the state of blobs. The difficulty lies in determining which blob should be moved, knowing whether it can be moved, dealing with multiple references, and so on, but it can certainly be done outside of Oak.

Action plan: MattR to explore the need further and determine if using cold storage options is a real requirement and if doing it in Oak is really needed.