Commits, Refs and Snapshots

A snapshot is a complete representation of a model at a specific moment in its commit history. A branch is a mutable named reference to a specific commit. A ‘lock’ is an immutable, user-defined named reference to a specific commit. Branches and locks both act as references to commits, so collectively they are called ‘refs’.

MMS5 uses refs to track which commits should have snapshots. Branches are mutable refs that automatically get adjusted to point at the latest commit when an update is made. Locks are immutable refs that can only be created and destroyed. Users can create namespaced locks in order to ‘lock’ access to a particular commit’s data (via its corresponding snapshot), and then subsequently delete locks to ‘release’ access.
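The difference between the two kinds of refs can be sketched in a few lines of Python. This is an illustrative model only, not MMS5's implementation: the class names Branch and Lock and the function snapshot_targets are hypothetical, and commit IDs are treated as plain strings.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """Mutable ref: its target moves when a new commit lands on the branch."""
    name: str
    commit: str


@dataclass(frozen=True)
class Lock:
    """Immutable ref: it can only be created or deleted, never retargeted."""
    name: str
    commit: str


def snapshot_targets(refs):
    """Return the set of commits that should have a snapshot."""
    return {ref.commit for ref in refs}


master = Branch("master", "73ab9")
lock = Lock("app1:73ab9", "73ab9")

master.commit = "9822e"  # a successful update advances the branch

try:
    lock.commit = "9822e"  # frozen dataclass: assignment raises an error
    lock_moved = True
except AttributeError:
    lock_moved = False
```

After the update, snapshot_targets([master, lock]) contains both 9822e (the new branch head) and 73ab9 (still held by the lock).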

Motivation

MMS5 is designed to be a multi-tenant graph database for model management with version control history. To provide clients with random access to a model’s commit history and optimally perform arbitrary read-only queries against it, a system would have to build a complete snapshot for each state in that model’s commit history. However, large projects with frequent commits pose a resource challenge. Storing a complete copy of a project’s model for each commit can quickly impact query performance or even exhaust the storage capacity of the underlying triplestore.

Instead, MMS5 automatically builds and deletes snapshots according to a project’s set of refs. Because refs include branches (master, develop, and so on), the latest commit on any branch always has a corresponding snapshot. The decision about which other commits should have snapshots is left to users, who indicate the commits they want to be able to query by creating locks.

Example

Let’s take a look at how these version control objects relate to one another by using an example project history. Notice that each commit points only to its parent, with the root commit pointing to the reserved constant rdf:nil. Refs (which include locks and branches) connect snapshots to commits. A ref’s presence indicates that a snapshot should exist for the given commit; consequently, no unlinked snapshots should exist.

In the example below, Lock [app1:8fdd1] immutably points to Commit #8fdd1 and materializes a snapshot of the model at that state. In this case, the lock is held by some application ‘app1’. As long as this lock exists, so shall the snapshot. Once the lock is deleted, MMS5 is free to evict the snapshot from the database.

Commit #9822e and Commit #2c01c share the same parent, Commit #73ab9. This divergence in the chain manifests different states of the model, captured by the Branch [master] and Branch [develop] refs (each of which materializes a snapshot). Notice that Branch [master] and Lock [app2:9822e] point to the same commit and share the same snapshot. From the user’s perspective, the allocation of snapshots is opaque, but it is shown here to demonstrate that MMS5 conserves resources when possible.

Example version control state with commits, refs, and snapshots
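The example described above can be approximated in Python to show how refs that share a commit also share a snapshot. This is a sketch under stated assumptions: it uses only the refs and parent links named in the text, the target of develop (#2c01c) is inferred rather than stated outright, and the function name snapshots_by_commit is hypothetical.

```python
from collections import defaultdict

# Parent links stated in the text (any other links in the figure are omitted).
parents = {"9822e": "73ab9", "2c01c": "73ab9"}

# Refs named in the text; develop -> 2c01c is inferred, not stated outright.
refs = {
    "master": "9822e",
    "develop": "2c01c",
    "app1:8fdd1": "8fdd1",
    "app2:9822e": "9822e",
}


def snapshots_by_commit(refs):
    """Group ref names by target commit: one snapshot per referenced commit."""
    grouped = defaultdict(list)
    for name, commit in refs.items():
        grouped[commit].append(name)
    return dict(grouped)


snapshots = snapshots_by_commit(refs)
for commit, names in sorted(snapshots.items()):
    print(f"snapshot for #{commit}: {', '.join(sorted(names))}")
```

Four refs yield three snapshots here, because master and app2:9822e resolve to the same commit. With only these refs, the shared parent #73ab9 has no snapshot because nothing points at it.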

Branches

A branch is simply a ref that gets automatically adjusted to point at the latest commit upon successful updates. As long as a branch exists, it will always have a snapshot materialized in the database ready to be queried and updated.

Namespaced locks

The lock-for-snapshot paradigm means that applications can create namespaced locks in order to ‘lock’ access to a particular commit, without interfering with other applications. The pseudoexample below demonstrates what this concept looks like in practice with two separate applications acquiring locks on the same commit:

Application A creates the lock "app-a:e4a1c" on commit #e4a1c
MMS5 builds a snapshot for commit #e4a1c
Application B creates the lock "app-b:e4a1c" on commit #e4a1c
Application A queries the virtual endpoint for "app-a:e4a1c"
Application A deletes the lock "app-a:e4a1c" to release commit #e4a1c
Application B queries the virtual endpoint for "app-b:e4a1c"
Application B deletes the lock "app-b:e4a1c" to release commit #e4a1c
MMS5 drops the snapshot for commit #e4a1c
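The lock lifecycle in this sequence can be simulated with a small toy model in which a snapshot exists for as long as at least one lock points at its commit. The class SnapshotManager and its methods are hypothetical and do not correspond to the MMS5 API; branches, which would also keep a snapshot alive, are left out for brevity.

```python
class SnapshotManager:
    """Toy model: a snapshot exists while at least one lock targets its commit."""

    def __init__(self):
        self.locks = {}         # lock name -> commit id
        self.snapshots = set()  # commits that currently have a snapshot
        self.log = []           # record of build/drop events

    def create_lock(self, name, commit):
        if name in self.locks:
            raise ValueError(f"lock {name!r} already exists")
        self.locks[name] = commit
        if commit not in self.snapshots:
            self.snapshots.add(commit)
            self.log.append(f"build {commit}")

    def delete_lock(self, name):
        commit = self.locks.pop(name)
        # Drop the snapshot only when no other lock still targets the commit.
        if commit not in self.locks.values():
            self.snapshots.discard(commit)
            self.log.append(f"drop {commit}")

    def query(self, name):
        """Resolve a lock to its commit; raises KeyError once the lock is gone."""
        return self.locks[name]


mms = SnapshotManager()
mms.create_lock("app-a:e4a1c", "e4a1c")  # snapshot is built
mms.create_lock("app-b:e4a1c", "e4a1c")  # snapshot is reused
mms.query("app-a:e4a1c")
mms.delete_lock("app-a:e4a1c")           # app-b still holds the commit
still_available = "e4a1c" in mms.snapshots
mms.query("app-b:e4a1c")
mms.delete_lock("app-b:e4a1c")           # last lock released: snapshot dropped
```

Because each application uses its own namespace prefix, neither can accidentally release the other's lock, and the snapshot is built and dropped exactly once.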

Staging graphs

Before the server can submit a SPARQL Update to the quadstore in order to create a new commit, it must ensure that the parent commit is materialized as a snapshot in a named graph. This snapshot is needed since the SPARQL Update mutates the contents of the dataset it is applied against. Once the update has been applied, the resulting dataset will represent a snapshot of the new commit.

Since this process necessarily destroys the original snapshot of the parent commit, conflicting updates from other clients would cause a sudden need to rebuild the destroyed snapshot. Instead, MMS5 mitigates these potential concurrency issues by maintaining an identical copy of each snapshot’s model graph in a “staging graph” that is dedicated to performing updates for new commits. This technique provides some buffer around a rapidly changing branch, allowing concurrent reads to succeed against the parent commit while also reducing latency for concurrent, conflicting updates.
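The staging-graph idea can be illustrated with plain Python sets standing in for named graphs. This is a simplified sketch, not how MMS5 or the underlying store is implemented: the triples, the ex: prefix, and the helper apply_update (a stand-in for a SPARQL Update) are all made up for illustration.

```python
def apply_update(triples, delete, insert):
    """Stand-in for a SPARQL Update: remove and add triples in place."""
    triples.difference_update(delete)
    triples.update(insert)


# Snapshot of the parent commit: a model graph plus an identical staging copy.
parent_model = {("ex:a", "ex:name", "A"), ("ex:b", "ex:name", "B")}
staging = set(parent_model)

# The update mutates only the staging graph, so the parent model stays readable.
apply_update(
    staging,
    delete={("ex:b", "ex:name", "B")},
    insert={("ex:b", "ex:name", "B2")},
)

# The mutated staging graph becomes the new commit's model graph, and a fresh
# staging copy is made for the next update.
child_model = staging
child_staging = set(child_model)
```

After the update, parent_model still contains the original triple for ex:b, while child_model contains the replacement. Concurrent readers of the parent commit are therefore unaffected by the new commit.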

Once a staging graph has been transformed into a new commit’s snapshot model graph, the parent commit’s snapshot is given a finite amount of time until it is marked as expired and can be gracefully dropped from the database. This amount of time represents the buffer given to concurrent reads/updates. Beyond this span of time, any reads or updates that depend on the parent commit must first create a lock, which rebuilds the snapshot; otherwise they will fail.
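The grace period can be sketched as a simple time-to-live check. Everything here is an assumption made for illustration: the class ExpiringSnapshots, its method names, and the 30-second value are hypothetical, and time is passed in explicitly rather than read from a clock so the behaviour is easy to follow.

```python
class ExpiringSnapshots:
    """Toy model of the grace period given to a superseded parent snapshot."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.expires_at = {}  # commit id -> expiry time of its snapshot
        self.locked = set()   # commits kept alive by a lock

    def supersede(self, commit, now):
        """Start the grace period once a newer commit replaces this one."""
        self.expires_at[commit] = now + self.ttl

    def lock(self, commit):
        """Creating a lock keeps (or rebuilds) the snapshot for the commit."""
        self.locked.add(commit)

    def is_available(self, commit, now):
        if commit in self.locked:
            return True
        deadline = self.expires_at.get(commit)
        return deadline is not None and now < deadline


snaps = ExpiringSnapshots(ttl_seconds=30)  # 30 s is an arbitrary example value
snaps.supersede("73ab9", now=0)
within = snaps.is_available("73ab9", now=10)   # inside the grace period
after = snaps.is_available("73ab9", now=31)    # grace period has elapsed
snaps.lock("73ab9")
relocked = snaps.is_available("73ab9", now=31)  # a lock makes it queryable again
```

In this model a read at time 10 succeeds, a read at time 31 fails, and the same read succeeds again once a lock has been created on the commit.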