Layer 1 Update Procedure

Prerequisite readings:

When attempting to apply a client’s update to a branch, the server must gracefully handle the case where the update condition (i.e., the graph pattern(s) in the user’s WHERE block) is not satisfiable on the latest snapshot. This can happen because (a) another client updated the model concurrently, (b) the local client was using stale knowledge, or (c) the client submitted an unsatisfiable update condition. No matter the cause, the server should attempt to apply the client’s update onto a compatible version of the model that satisfies the update condition, if one exists. If such a version exists and the update succeeds against it, the server should deem the update as causing a merge conflict and create a new branch to be diffed against. If no such version exists, the server can respond with a 412 Precondition Failed HTTP status.

To find a version of the model that satisfies the update condition, the server must have snapshots of the candidate versions to query against. Iterating through the entire commit ancestry of the current branch and building a snapshot for each version would be very costly in terms of time and resources. A more efficient approach is to narrow the number of versions to check and to evaluate the update condition only on existing snapshots. For these reasons, SPARQL updates submitted to Layer 1 virtual endpoints are required to specify the commit that the update is intended for by providing the request header X-MMS-Context-Commit-Id: {COMMIT_ID}. The server then treats this as the earliest allowable commit from which to branch in case the update condition is not satisfiable on newer commits.

Merge Conflict Example

As an example, consider the initial state of the model having the commit hash e4a1c:

# snapshot mms-object:Snapshot.e4a1c
graph mms-graph:Model.Example.master {
  :Alice a :Person .

  :Bob a :Person ;
    :dislikes :Alice .
}

Client X submits the update:

POST /projects/example/ref/master
Content-Type: application/sparql-update
X-MMS-Context-Commit-Id: Commit.e4a1c

delete data {
  :Bob :dislikes :Alice .
}

The update is applied and creates a new state of the model having commit hash 17ccd:

# snapshot mms-object:Snapshot.17ccd
graph mms-graph:Model.Example.master {
  :Alice a :Person .

  :Bob a :Person .
}

Then, Client Y submits an update targeting the same branch but the previous commit e4a1c:

POST /projects/example/ref/master
Content-Type: application/sparql-update
X-MMS-Context-Commit-Id: Commit.e4a1c

delete {
  :Alice foaf:knows :Bob .
}
where {
  :Bob :dislikes :Alice .
}

Notice that the update condition :Bob :dislikes :Alice is not satisfiable on the latest version 17ccd. The server therefore walks back up the commit ancestry, retesting the update condition each time it reaches a commit with an existing snapshot, and stops once it reaches the terminal commit e4a1c. Since the condition is satisfied by e4a1c, the new commit must reference it as the parent commit, making the commit tree look like this:

            [master]
          ╭--17ccd
(x)--e4a1c
          ╰--8f155

Here, 8f155 is the new commit created by Client Y's update. Since this new commit does not append to the HEAD of master, this unwanted divergence in the commit chain is classified as a merge conflict between 8f155 and 17ccd.
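The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Commit record and the classify_update function are hypothetical names, not part of MMS5, and the commit tree is the one from the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record: an id and the id of its parent."""
    commit_id: str
    parent: Optional[str]


def classify_update(branch_head: str, new_commit: Commit) -> Optional[str]:
    """Return the id of the conflicting commit, or None if there is no conflict.

    A new commit that appends to the current HEAD of the branch is a normal
    update. Any other parent means the commit chain has diverged.
    """
    if new_commit.parent == branch_head:
        return None
    return branch_head


# Commit tree from the example: both new commits have e4a1c as their parent.
client_x = Commit(commit_id="17ccd", parent="e4a1c")
client_y = Commit(commit_id="8f155", parent="e4a1c")

# master pointed at e4a1c when Client X's update was applied.
conflict_x = classify_update("e4a1c", client_x)
# master points at 17ccd by the time Client Y's update is applied.
conflict_y = classify_update("17ccd", client_y)
```

With these inputs, Client X's commit extends master normally, while Client Y's commit is reported as conflicting with 17ccd.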

The server responds to Client Y’s update with the following headers:

X-MMS-Context-Project-Id: Project.Example
X-MMS-Context-Ref-Id: 
X-MMS-Context-Commit-Id: Commit.8f155
X-MMS-Conflict-Commit-Id: Commit.17ccd
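
A client might inspect these response headers along the following lines. This is a hedged sketch, not a documented client API: the function name conflict_commit is hypothetical, and it assumes that the X-MMS-Conflict-Commit-Id header is simply absent when the update did not cause a conflict, which the text above does not state.

```python
from typing import Mapping, Optional


def conflict_commit(headers: Mapping[str, str]) -> Optional[str]:
    """Return the conflicting commit id from the response headers, if any.

    Assumption: the server omits X-MMS-Conflict-Commit-Id when the update
    was appended to the HEAD of the branch without a conflict.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-mms-conflict-commit-id")


# Headers taken from the example response above.
response_headers = {
    "X-MMS-Context-Project-Id": "Project.Example",
    "X-MMS-Conflict-Commit-Id": "Commit.17ccd",
}

conflicting = conflict_commit(response_headers)
no_conflict = conflict_commit({"X-MMS-Context-Project-Id": "Project.Example"})
```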

Update algorithm

1.0. Input values

Let the following identifiers represent constant values given by the update request:

PROJECT_ID and REF_ID - taken from path parameters /projects/{PROJECT_ID}/refs/{REF_ID}
UPDATE_CONDITION - the WHERE block of the SPARQL update body
TERMINAL_COMMIT_ID - taken from the requisite X-MMS-Context-Commit-Id header

1.1. Commit selection

The first set of I/O operations that take place on the server find a version of the model that satisfies the UPDATE_CONDITION. If there is no UPDATE_CONDITION in the update request, the algorithm selects the latest commit associated with the given REF_ID (by targeting it in the upcoming metadata update operation) and proceeds to 1.2.

Unfortunately, the SPARQL 1.1 Protocol does not standardize any type of response content for a SPARQL Update request. Some triplestores provide detailed information about how many triples were affected by an update, while others simply indicate whether the request succeeded or failed as an atomic unit by using HTTP status codes. For this reason, the server must first test the UPDATE_CONDITION using an ASK query before issuing an update operation. The ASK query returns a boolean to indicate whether the given WHERE block matches the specified dataset(s).

Fortunately, the server can issue multiple ASK queries in parallel, allowing the triplestore to load-balance the queries against separate graphs and reducing the total time spent waiting to determine which commit the update will be applied to. If zero graphs match the update condition, the request is deemed unsatisfiable and an appropriate HTTP error code is returned to the client. If multiple graphs match the update condition, the one corresponding to the most recent commit is selected. The result is that merge conflicts are detected automatically, by using the update condition to assess whether an update’s dependencies are satisfied on any given state of a model.
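
The commit selection step can be sketched as follows. This is a minimal illustration, not the MMS5 implementation: build_ask_query, select_commit, and the run_ask callback are hypothetical names, the graph IRIs are placeholders, and scoping the condition with a GRAPH clause is an assumption about how the server might rewrite the WHERE block. Prefix declarations are omitted for brevity, and a stub stands in for the triplestore.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence, Tuple


def build_ask_query(graph_iri: str, update_condition: str) -> str:
    """Wrap the WHERE block's graph pattern in an ASK query scoped to one graph."""
    return f"ASK {{ GRAPH <{graph_iri}> {{ {update_condition} }} }}"


def select_commit(
    candidates: Sequence[Tuple[str, str]],  # (commit id, snapshot graph IRI), newest first
    update_condition: str,
    run_ask: Callable[[str], bool],         # sends one ASK query, returns its boolean result
) -> Optional[str]:
    """Return the most recent commit whose snapshot satisfies the condition."""
    queries = [build_ask_query(graph, update_condition) for _, graph in candidates]
    # Issue all ASK queries concurrently; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        results = list(pool.map(run_ask, queries))
    for (commit_id, _), satisfied in zip(candidates, results):
        if satisfied:
            return commit_id
    return None  # unsatisfiable: the caller would answer with an HTTP error


# Stub triplestore holding the two snapshots from the merge conflict example.
SNAPSHOTS = {
    "urn:example:Snapshot.17ccd": {":Alice a :Person", ":Bob a :Person"},
    "urn:example:Snapshot.e4a1c": {
        ":Alice a :Person", ":Bob a :Person", ":Bob :dislikes :Alice",
    },
}
CONDITION = ":Bob :dislikes :Alice ."


def fake_ask(query: str) -> bool:
    """Stand-in for a real ASK request; only understands CONDITION."""
    graph = next(g for g in SNAPSHOTS if f"<{g}>" in query)
    return CONDITION.rstrip(" .") in SNAPSHOTS[graph]


selected = select_commit(
    [("17ccd", "urn:example:Snapshot.17ccd"), ("e4a1c", "urn:example:Snapshot.e4a1c")],
    CONDITION,
    fake_ask,
)
```

Ordering the candidates newest first means the first satisfied result is also the most recent one, so no separate sorting step is needed after the queries return.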

1.2. Effective staging graph

First, see https://openmbee.atlassian.net/wiki/spaces/OPENMBEE/pages/613613569/Tags+and+Snapshots#Staging-graphs for an explanation of staging graphs.

An ‘effective’ staging graph simply refers to a graph that is used to apply an update, whether or not it was previously dedicated to being a staging graph.

Normally, if a staging graph is available for the parent commit, then this graph is selected to be the effective staging graph for the update and the algorithm continues to 1.3.

However, it is possible that a staging graph for the latest commit is not yet available. This can happen when multiple updates are made to the same branch concurrently and the triplestore has not yet had enough time to COPY the snapshot graph to a dedicated staging graph. In this scenario, MMS5 uses the snapshot graph as an effective staging graph. This technique ultimately defers the COPY operation to a later time once any concurrent writes have settled. See https://openmbee.atlassian.net/wiki/spaces/OPENMBEE/pages/616497153/Layer+1+Update+Procedure#Effective-staging-graph-example below for an example of using an ephemeral snapshot as an effective staging graph.
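
The selection rule in this section amounts to a simple fallback, sketched below. The CommitGraphs record, its field names, and the graph names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommitGraphs:
    """Hypothetical record of the graphs that materialize one commit."""
    commit_id: str
    snapshot: str                  # name of the snapshot graph
    staging: Optional[str] = None  # name of the staging graph, if one exists yet


def effective_staging_graph(graphs: CommitGraphs) -> str:
    """Prefer the dedicated staging graph; otherwise fall back to the snapshot."""
    if graphs.staging is not None:
        return graphs.staging
    # No staging graph yet (the COPY has not happened): use the snapshot itself.
    return graphs.snapshot


settled = CommitGraphs("e4a1c", snapshot="Snapshot.e4a1c", staging="Staging.e4a1c")
brand_new = CommitGraphs("17ccd", snapshot="Snapshot.17ccd")

graph_for_settled = effective_staging_graph(settled)
graph_for_brand_new = effective_staging_graph(brand_new)
```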

1.3. SPARQL update execution

Having selected an effective staging graph to apply the model update to, MMS5 then executes a SPARQL update against the triplestore that performs several operations as a single atomic unit. They are summarized as:

acquire an exclusive write lock to the effective staging graph
detach the effective staging graph from its current owner
create a new ephemeral commit object in the project’s metadata graph
apply the model update to the effective staging graph
attach the effective staging graph to the new ephemeral commit
release the write lock
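
The order of the steps above can be illustrated with an in-memory simulation. This is a sketch under loud assumptions: in MMS5 the steps are carried out by a single SPARQL update inside the triplestore, whereas here a Python lock and plain dictionaries stand in for it. The Store class and apply_update function are hypothetical, and unlike a real atomic update this sketch does not roll back if the model update fails partway through.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class Store:
    """Hypothetical stand-in for the triplestore's model and metadata graphs."""
    graphs: Dict[str, Set[str]] = field(default_factory=dict)      # graph name -> triples
    owner: Dict[str, Optional[str]] = field(default_factory=dict)  # graph name -> commit id
    parent: Dict[str, Optional[str]] = field(default_factory=dict) # commit id -> parent id
    lock: threading.Lock = field(default_factory=threading.Lock)


def apply_update(
    store: Store,
    graph: str,
    new_commit: str,
    update: Callable[[Set[str]], None],
) -> None:
    with store.lock:                              # acquire the exclusive write lock
        parent_commit = store.owner[graph]
        store.owner[graph] = None                 # detach the graph from its current owner
        store.parent[new_commit] = parent_commit  # create the new ephemeral commit
        update(store.graphs[graph])               # apply the model update
        store.owner[graph] = new_commit           # attach the graph to the new commit
    # leaving the with-block releases the write lock


store = Store(
    graphs={"staging": {":Alice a :Person", ":Bob a :Person", ":Bob :dislikes :Alice"}},
    owner={"staging": "e4a1c"},
)


def delete_dislikes(triples: Set[str]) -> None:
    """Client X's update from the merge conflict example."""
    triples.discard(":Bob :dislikes :Alice")


apply_update(store, "staging", "17ccd", delete_dislikes)
```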

1.4. Commit stabilization

Following a series of one or more successive writes to a branch, the server must stabilize commit data by creating new staging graphs and dropping old snapshots that are no longer needed. Delaying this action by some predetermined amount of time will improve the performance of any queries or updates that target or depend on the parent commit.

Effective staging graph example

In the following diagram, e4a1c represents the latest commit at state #01 of the model. Notice how a snapshot graph and a staging graph materialize this same commit:

#01:

 *snapshot -╮
            ├- *staging
            ╎
    (x)---e4a1c

Upon an update, a new commit 17ccd is created by using the staging graph to evolve the state of the model to #02:

#02:

 *snapshot -╮
            ╎       ╭*ephemeral-snapshot-1
            ╎       ╎
    (x)---e4a1c---17ccd

At this point, 17ccd does not yet have its own staging graph. Since it is brand new, its snapshot graph is marked as “ephemeral”, which means that (a) it does not yet have a staging graph and (b) its (super*)parent still has a snapshot being used for interim reads and updates. Before a staging graph for this new commit is built, MMS5 finishes processing any remaining concurrent updates. In this example, one such update creates a new commit 8f155. The server uses the aforementioned ephemeral snapshot graph as the effective staging graph to apply this update. The new state of the model at #03 looks like:

#03:

 *snapshot -╮
            ╎               ╭*ephemeral-snapshot-2
            ╎               ╎
    (x)---e4a1c---17ccd---8f155

Once the pipeline of concurrent writes has ceased, MMS5 is able to stabilize the latest commit’s snapshot at #04:

#04:

 *snapshot -╮    *snapshot -╮
            ╎               ├- *staging
            ╎               ╎
    (x)---e4a1c---17ccd---8f155

Finally, the now-expired snapshot for the original commit is dropped and MMS5 settles on the resting state for this model at #05:

#05:

                 *snapshot -╮
                            ├- *staging
                            ╎
    (x)---e4a1c---17ccd---8f155
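
The five states above can be replayed with a small bookkeeping model. This is a sketch only: the Branch class and its method names are hypothetical, it tracks which kinds of graph materialize each commit rather than any triples, and it assumes that no tag or other ref still needs the older snapshots when they are dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Branch:
    """Hypothetical bookkeeping of which graphs materialize which commit."""
    commits: List[str]                                         # oldest to newest
    graphs: Dict[str, Set[str]] = field(default_factory=dict)  # commit id -> graph kinds

    def write(self, new_commit: str) -> None:
        head = self.commits[-1]
        kinds = self.graphs[head]
        if "staging" in kinds:
            kinds.remove("staging")  # staging graph evolves into the new commit
        else:
            del self.graphs[head]    # ephemeral snapshot reused as effective staging graph
        self.commits.append(new_commit)
        self.graphs[new_commit] = {"ephemeral-snapshot"}

    def promote_head(self) -> None:
        """Stabilize the latest commit: permanent snapshot plus a staging graph."""
        self.graphs[self.commits[-1]] = {"snapshot", "staging"}

    def drop_expired(self) -> None:
        """Drop snapshots of older commits (assumes nothing else needs them)."""
        head = self.commits[-1]
        for commit in list(self.graphs):
            if commit != head:
                del self.graphs[commit]

    def state(self) -> Dict[str, Set[str]]:
        return {commit: set(kinds) for commit, kinds in self.graphs.items()}


branch = Branch(commits=["e4a1c"], graphs={"e4a1c": {"snapshot", "staging"}})
state_01 = branch.state()
branch.write("17ccd")
state_02 = branch.state()
branch.write("8f155")
state_03 = branch.state()
branch.promote_head()
state_04 = branch.state()
branch.drop_expired()
state_05 = branch.state()
```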