
A snapshot is a complete representation of a model at a specific moment in its commit history. A branch is a mutable named reference to a specific commit. A ‘lock’ is an immutable, user-defined named reference to a specific commit. Branches and locks both act as references to commits, so collectively they are called ‘refs’.

MMS5 uses refs to track which commits should have snapshots. Branches are mutable refs that are automatically moved to point at the latest commit whenever an update is made. Locks are immutable refs that can only be created and deleted. Users can create namespaced locks in order to ‘lock’ access to a particular commit’s data (via its corresponding snapshot), and later delete those locks to ‘release’ access.
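
The difference between the two kinds of ref can be sketched with two small data classes. The `Branch` and `Lock` classes below are hypothetical illustrations, not MMS5's actual types: a branch's target can be reassigned, while a lock is frozen so its target can never change after creation.

```python
from dataclasses import dataclass

# Hypothetical types for illustration only; not MMS5's real data model.

@dataclass
class Branch:
    """Mutable ref: its target commit moves forward on every successful update."""
    name: str
    commit: str

@dataclass(frozen=True)
class Lock:
    """Immutable ref: it can only be created or deleted, never retargeted."""
    name: str
    commit: str

master = Branch("master", "a1b2c")
master.commit = "e4a1c"               # allowed: branches are mutable

lock = Lock("app-a:e4a1c", "e4a1c")
try:
    lock.commit = "f00d1"             # rejected: frozen dataclasses refuse assignment
except AttributeError as err:         # FrozenInstanceError is a subclass of AttributeError
    print("cannot retarget a lock:", type(err).__name__)
```

Retargeting a lock in this model means deleting it and creating a new one, which matches the create-and-delete-only behavior described above.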

Motivation

MMS5 is designed to be a multi-tenant graph database for model management with version-control history. To give clients random access to a model’s commit history and to run arbitrary read-only queries against it optimally, a system would have to build a complete snapshot for every state in that model’s commit history. However, large projects with frequent commits make this resource-intensive: storing a complete copy of a project’s model for each commit can quickly degrade query performance or even exhaust the storage capacity of the underlying triplestore.

Instead, MMS5 automatically builds and deletes snapshots according to a project’s set of refs. In this context, ‘refs’ include branches such as master and develop, so the latest commit on every branch always has a corresponding snapshot. Beyond that, the user decides which other commits should have snapshots: users indicate which commits they want to be able to query by creating locks.
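
The ref-driven policy amounts to a set difference between the commits that refs point at and the commits that currently have snapshots. The `reconcile_snapshots` function below is a hypothetical sketch under that assumption; it treats branches and locks alike and ignores the expiration grace period that MMS5 gives superseded snapshots.

```python
from typing import Dict, Set, Tuple

def reconcile_snapshots(refs: Dict[str, str], existing: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: return (to_build, to_drop).

    refs maps each ref name (branch or lock) to the commit id it points at;
    existing is the set of commit ids that currently have snapshots.
    """
    required = set(refs.values())     # every ref pins exactly one commit
    to_build = required - existing    # referenced but not yet materialized
    to_drop = existing - required     # materialized but no longer referenced
    return to_build, to_drop

refs = {"master": "c3", "develop": "c5", "app-a:c1": "c1"}
to_build, to_drop = reconcile_snapshots(refs, existing={"c1", "c2", "c3"})
print(sorted(to_build), sorted(to_drop))  # ['c5'] ['c2']
```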

Branches

A branch is simply a type of ref that MMS5 automatically moves to point at the latest commit upon each successful update. As long as a branch exists, it will always have a snapshot materialized in the triplestore, ready to be queried and updated.
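
Advancing a branch can be sketched as recording the new commit on top of the branch's current head and only then moving the branch. The `commit_update` function and the plain dictionaries it uses are hypothetical stand-ins for MMS5's commit storage; the point is only the ordering, where the branch moves after the update succeeds.

```python
from typing import Dict

def commit_update(branches: Dict[str, str], parents: Dict[str, str],
                  branch: str, new_commit: str) -> None:
    """Hypothetical sketch: record new_commit on top of the branch head, then advance the branch."""
    head = branches[branch]          # raises KeyError if the branch does not exist
    parents[new_commit] = head       # the new commit's parent is the old head
    branches[branch] = new_commit    # the branch now points at the latest commit

branches = {"master": "c1"}
parents: Dict[str, str] = {}
commit_update(branches, parents, "master", "c2")
commit_update(branches, parents, "master", "c3")
print(branches["master"])  # c3
```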

Namespaced locks

The lock-for-snapshot paradigm means that applications can create namespaced locks to ‘lock’ access to a particular commit without interfering with other applications. The pseudo-example below demonstrates what this concept looks like in practice:

Application A creates the lock "app-a:e4a1c" on commit #e4a1c
MMS5 builds a snapshot for commit #e4a1c
Application B creates the lock "app-b:e4a1c" on commit #e4a1c
Application A queries the virtual endpoint for "app-a:e4a1c"
Application A deletes the lock "app-a:e4a1c" to release commit #e4a1c
Application B queries the virtual endpoint for "app-b:e4a1c"
Application B deletes the lock "app-b:e4a1c" to release commit #e4a1c
MMS5 drops the snapshot for commit #e4a1c
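
The sequence above behaves like reference counting: a commit's snapshot is built when its first lock appears and dropped when its last lock is deleted. The `LockRegistry` class below is a hypothetical sketch of that bookkeeping. It considers locks only (a branch pointing at the same commit would also keep the snapshot alive), and the `app-a:`/`app-b:` prefixes are simply part of each lock's name.

```python
from collections import defaultdict
from typing import Dict, List, Set

class LockRegistry:
    """Hypothetical sketch: a snapshot lives while at least one lock points at its commit."""

    def __init__(self) -> None:
        self.locks: Dict[str, str] = {}                        # lock name -> commit id
        self.holders: Dict[str, Set[str]] = defaultdict(set)   # commit id -> lock names
        self.snapshots: Set[str] = set()                       # commits with a snapshot
        self.log: List[str] = []                               # build/drop events, in order

    def create_lock(self, name: str, commit: str) -> None:
        if name in self.locks:
            raise ValueError(f"lock {name!r} already exists")  # locks cannot be retargeted
        self.locks[name] = commit
        self.holders[commit].add(name)
        if commit not in self.snapshots:                       # first holder builds the snapshot
            self.snapshots.add(commit)
            self.log.append(f"build {commit}")

    def delete_lock(self, name: str) -> None:
        commit = self.locks.pop(name)
        self.holders[commit].discard(name)
        if not self.holders[commit]:                           # last holder drops the snapshot
            del self.holders[commit]
            self.snapshots.discard(commit)
            self.log.append(f"drop {commit}")

reg = LockRegistry()
reg.create_lock("app-a:e4a1c", "e4a1c")   # builds the snapshot
reg.create_lock("app-b:e4a1c", "e4a1c")   # reuses the existing snapshot
reg.delete_lock("app-a:e4a1c")            # app-b still holds it, so it stays
reg.delete_lock("app-b:e4a1c")            # last holder released, so it is dropped
print(reg.log)  # ['build e4a1c', 'drop e4a1c']
```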

Staging graphs

Before the server can submit a SPARQL Update to the triplestore in order to create a new commit, it must ensure that the parent commit is materialized as a snapshot in a named graph. This snapshot is needed since the SPARQL Update mutates the contents of the dataset it is applied against. Once the update has been applied, the resulting dataset will represent a snapshot of the new commit.

Since this process necessarily destroys the parent commit’s original snapshot, a conflicting update from another client against the same parent would suddenly require that snapshot to be rebuilt. MMS5 mitigates these concurrency issues by maintaining an identical copy of each snapshot’s model graph in a “staging graph” that is dedicated to performing updates for new commits. This technique provides a buffer around a rapidly changing branch, allowing concurrent reads against the parent commit to succeed while also reducing latency for concurrent, conflicting updates.
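
The staging technique can be sketched with in-memory sets of triples standing in for named graphs. In the hypothetical `apply_update` function below, the update mutates the parent's staging copy, which then becomes the child commit's model graph, while the parent's model graph stays intact for concurrent reads. A second, conflicting update against the same parent only needs a fresh copy of the parent's model graph rather than a rebuild from history. Re-copying lazily on demand is an assumption of this sketch, and the `update` callback merely stands in for a SPARQL Update.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]

@dataclass
class Snapshot:
    model: Graph               # served to read-only queries
    staging: Optional[Graph]   # copy reserved for the next update; None once consumed

def apply_update(snapshots: Dict[str, Snapshot], parent: str, child: str,
                 update: Callable[[Graph], None]) -> None:
    """Hypothetical sketch: commit `child` on top of `parent` via the parent's staging graph."""
    snap = snapshots[parent]                 # the parent must already be materialized
    if snap.staging is None:                 # consumed by an earlier, conflicting update:
        snap.staging = set(snap.model)       # copy the model graph instead of rebuilding history
    staging = snap.staging
    snap.staging = None                      # this copy now belongs to the child commit
    update(staging)                          # mutate in place (stands in for a SPARQL Update)
    snapshots[child] = Snapshot(model=staging, staging=set(staging))

base = {("ex:a", "ex:p", "ex:b")}
snapshots = {"c1": Snapshot(model=set(base), staging=set(base))}
apply_update(snapshots, "c1", "c2", lambda g: g.add(("ex:a", "ex:p", "ex:c")))
apply_update(snapshots, "c1", "c3", lambda g: g.discard(("ex:a", "ex:p", "ex:b")))  # conflicting update
print(len(snapshots["c1"].model), len(snapshots["c2"].model), len(snapshots["c3"].model))  # 1 2 0
```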

Once a staging graph has been transformed into a new commit’s snapshot model graph, the parent commit’s snapshot is given a finite amount of time until it is marked as expired and can be gracefully dropped from the database. This amount of time represents the buffer given to concurrent reads/updates. Beyond this span of time, any reads or updates that depend on the parent commit must create a lock in order to rebuild the snapshot or they will fail.
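
The expiration rule can be sketched as a timestamp per superseded commit plus a fixed grace period. The `ExpiryTracker` class and the 30-second grace period in the usage lines are hypothetical; times are passed in explicitly so the behavior is easy to follow. A commit covered by a lock is never reported as expired.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class ExpiryTracker:
    """Hypothetical sketch: superseded snapshots survive for `grace` seconds unless locked."""
    grace: float
    superseded_at: Dict[str, float] = field(default_factory=dict)  # commit -> time it was superseded

    def supersede(self, commit: str, now: float) -> None:
        # Record only the first time a commit stops being the latest on its branch.
        self.superseded_at.setdefault(commit, now)

    def expired(self, now: float, locked: Set[str]) -> Set[str]:
        # Expired = past the grace period and not protected by any lock.
        return {c for c, t in self.superseded_at.items()
                if c not in locked and now - t >= self.grace}

tracker = ExpiryTracker(grace=30.0)
tracker.supersede("c1", now=100.0)
tracker.supersede("c2", now=110.0)
print(tracker.expired(now=120.0, locked=set()))   # set(): both still within the buffer
print(tracker.expired(now=135.0, locked={"c2"}))  # {'c1'}: c2 is protected by a lock
```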
