Layer 1 Update Procedure

Prerequisite readings:

When applying a client’s update to a branch, the server must gracefully handle the case where the update condition (i.e., the graph pattern(s) in the user’s WHERE block) is not satisfiable on the latest snapshot. This can happen because (a) another client updated the model concurrently, (b) the local client was working from stale knowledge, or (c) the client submitted an unsatisfiable update condition. Whatever the cause, the server should attempt to apply the client’s update onto a compatible version of the model that satisfies the update condition, if one exists. If this succeeds, the server should treat the update as causing a merge conflict and create a new branch to be diffed against. If no such version exists, the server should respond with 412 Precondition Failed.

To search for a version of the model that satisfies the update condition, the server needs snapshots of those versions to query against. Iterating through the entire commit ancestry of the current branch and building a snapshot for each version would be very costly in time and resources. A better approach is to narrow the set of versions to check and to evaluate the update condition only on existing snapshots. For these reasons, SPARQL updates submitted to Layer 1 virtual endpoints must specify the commit the update is intended for by providing the request header X-MMS-Context-Commit-Id: {COMMIT_ID}. The server treats this commit as the earliest allowable commit from which to branch when the update condition is not satisfiable on newer commits.
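The narrowing step can be sketched as a walk from the branch HEAD back toward the context commit, keeping only commits that already have a snapshot. This is a minimal sketch under stated assumptions: `parent_of` and `has_snapshot` are hypothetical stand-ins for the server's metadata lookups, the walk follows a single parent per commit, and the terminal commit is always included as the last candidate (assuming the server would materialize its snapshot if one does not exist yet).

```python
from typing import Callable, Optional


def candidate_commits(
    head_id: str,
    terminal_id: str,
    parent_of: Callable[[str], Optional[str]],
    has_snapshot: Callable[[str], bool],
) -> list[str]:
    """Return the commits to test, newest first, bounded by the terminal commit.

    Hypothetical helpers (not part of any real MMS API):
      parent_of(commit_id)    -> parent commit ID, or None at the root
      has_snapshot(commit_id) -> True if a snapshot already exists
    """
    candidates: list[str] = []
    commit: Optional[str] = head_id
    while commit is not None:
        if commit == terminal_id:
            # Always test the terminal commit, snapshot or not
            # (assumption: a missing snapshot is built on demand).
            candidates.append(commit)
            return candidates
        if has_snapshot(commit):
            candidates.append(commit)
        commit = parent_of(commit)
    # Reached the root without meeting the terminal commit.
    raise ValueError(f"{terminal_id} is not an ancestor of {head_id}")
```

For a history a ← b ← c ← d with HEAD d, snapshots at d and b, and terminal commit a, the sketch returns ["d", "b", "a"]: commit c is skipped because testing it would first require building a snapshot.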

Merge Conflict Example

As an example, consider the initial state of the model having the commit hash e4a1c:

# snapshot mms-object:Snapshot.e4a1c
graph mms-graph:Model.Example.master {
  :Alice a :Person .

  :Bob a :Person ;
    :dislikes :Alice .
}

Client X submits the update:

POST /projects/example/refs/master
Content-Type: application/sparql-update
X-MMS-Context-Commit-Id: Commit.e4a1c

delete data {
  :Bob :dislikes :Alice .
}

The update is applied, creating a new state of the model with commit hash 17ccd:

# snapshot mms-object:Snapshot.17ccd
graph mms-graph:Model.Example.master {
  :Alice a :Person .

  :Bob a :Person .
}

Client Y then submits an update that targets the same branch but the earlier commit e4a1c:

POST /projects/example/refs/master
Content-Type: application/sparql-update
X-MMS-Context-Commit-Id: Commit.e4a1c

delete {
  :Alice foaf:knows :Bob .
}
where {
  :Bob :dislikes :Alice .
}

Notice that the update condition :Bob :dislikes :Alice is not satisfiable on the latest version 17ccd. The server therefore traverses up the commit ancestry until it reaches either a commit with an existing snapshot or the terminal commit e4a1c, at which point it retests the update condition. Since e4a1c satisfies the condition, the new commit must reference it as its parent, making the commit tree look like this:

            [master]
          ╭--17ccd
(x)--e4a1c
          ╰--8f155

Here, 8f155 is the new commit created by Client Y's update. Because this new commit is not appended to the HEAD of master, the resulting divergence in the commit chain is classified as a merge conflict between 8f155 and 17ccd.
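The traversal that selected e4a1c can be replayed in memory. This is an illustration only, not the server's implementation: it models each snapshot as a set of triples written as prefixed-name strings and handles only ground (variable-free) conditions like :Bob :dislikes :Alice, for which "satisfiable" reduces to a subset check. A real server would evaluate the condition with SPARQL against the triplestore; `find_parent` and `SNAPSHOTS` are hypothetical names.

```python
from typing import Optional

# A triple is a (subject, predicate, object) tuple of prefixed names.
Triple = tuple[str, str, str]

# Snapshots from the example, keyed by commit hash.
SNAPSHOTS: dict[str, set[Triple]] = {
    "e4a1c": {
        (":Alice", "a", ":Person"),
        (":Bob", "a", ":Person"),
        (":Bob", ":dislikes", ":Alice"),
    },
    "17ccd": {
        (":Alice", "a", ":Person"),
        (":Bob", "a", ":Person"),
    },
}


def find_parent(candidates: list[str], condition: set[Triple]) -> Optional[str]:
    """Return the newest candidate whose snapshot contains every condition triple.

    Only valid for ground conditions (no variables); an assumption of this sketch.
    """
    for commit_id in candidates:  # ordered newest first
        if condition <= SNAPSHOTS[commit_id]:
            return commit_id
    return None  # nothing satisfies the condition: 412 Precondition Failed


# Client Y's update condition: :Bob :dislikes :Alice
condition = {(":Bob", ":dislikes", ":Alice")}
print(find_parent(["17ccd", "e4a1c"], condition))  # e4a1c
```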

The server responds to Client Y’s update with the following headers:

X-MMS-Context-Project-Id: Project.Example
X-MMS-Context-Ref-Id: 
X-MMS-Context-Commit-Id: Commit.8f155
X-MMS-Conflict-Commit-Id: Commit.17ccd
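The conflict classification and the resulting response headers can be sketched as a small function. This is a minimal sketch, assuming the server knows the branch HEAD at the moment the update is applied; `response_headers` is a hypothetical helper, and the ref ID is passed in by the caller because this section does not specify how the newly created branch is identified.

```python
def response_headers(
    project_id: str,
    ref_id: str,
    new_commit_id: str,
    parent_commit_id: str,
    head_commit_id: str,
) -> dict[str, str]:
    """Build the X-MMS-* headers for an applied Layer 1 update (sketch).

    All arguments are full identifiers, e.g. "Commit.8f155". ref_id is
    supplied by the caller (hypothetical: branch naming is unspecified).
    """
    headers = {
        "X-MMS-Context-Project-Id": project_id,
        "X-MMS-Context-Ref-Id": ref_id,
        "X-MMS-Context-Commit-Id": new_commit_id,
    }
    # A new commit whose parent is not the current HEAD diverges from the
    # branch; report that divergence as a conflict with the current HEAD.
    if parent_commit_id != head_commit_id:
        headers["X-MMS-Conflict-Commit-Id"] = head_commit_id
    return headers
```

An update applied directly on top of HEAD produces no X-MMS-Conflict-Commit-Id header, so clients can detect a conflict simply by checking for its presence.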

Update Algorithm

1.0. Input values

Let the following identifiers represent constant values given by the update request:

PROJECT_ID and REF_ID - taken from the path parameters /projects/{PROJECT_ID}/refs/{REF_ID}
UPDATE_CONDITION - the WHERE block of the SPARQL update body
TERMINAL_COMMIT_ID - taken from the requisite X-MMS-Context-Commit-Id header
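Gathering these inputs from a request might look like the sketch below. `UpdateInputs`, `extract_where_block`, and `parse_inputs` are hypothetical names. The WHERE-block extraction is a naive brace matcher: it ignores braces inside string literals and comments and only looks at the first WHERE group, so a production server would want a real SPARQL parser. The sketch also assumes header names arrive with the exact casing shown (HTTP header names are case-insensitive, so a real framework should normalize them), and it raises ValueError when the required header is missing because this section does not name a status code for that case.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# Matches /projects/{PROJECT_ID}/refs/{REF_ID}
PATH_PATTERN = re.compile(r"^/projects/([^/]+)/refs/([^/]+)$")
WHERE_KEYWORD = re.compile(r"\bwhere\s*\{", re.IGNORECASE)


@dataclass(frozen=True)
class UpdateInputs:
    project_id: str
    ref_id: str
    update_condition: Optional[str]  # None when the update has no WHERE block
    terminal_commit_id: str


def extract_where_block(update: str) -> Optional[str]:
    """Naively return the body of the first WHERE { ... } group, or None.

    Limitation (assumption of this sketch): braces inside string literals
    or comments are not handled.
    """
    match = WHERE_KEYWORD.search(update)
    if match is None:
        return None
    start = match.end()  # index just past the opening brace
    depth = 1
    for i in range(start, len(update)):
        if update[i] == "{":
            depth += 1
        elif update[i] == "}":
            depth -= 1
            if depth == 0:
                return update[start:i].strip()
    raise ValueError("unbalanced braces in WHERE block")


def parse_inputs(path: str, headers: Mapping[str, str], body: str) -> UpdateInputs:
    match = PATH_PATTERN.match(path)
    if match is None:
        raise ValueError(f"unexpected path: {path}")
    # Assumes header names have already been normalized to this casing.
    commit_id = headers.get("X-MMS-Context-Commit-Id")
    if commit_id is None:
        raise ValueError("missing required X-MMS-Context-Commit-Id header")
    return UpdateInputs(
        project_id=match.group(1),
        ref_id=match.group(2),
        update_condition=extract_where_block(body),
        terminal_commit_id=commit_id,
    )
```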

1.1. Commit selection

The first I/O operations on the server are aimed at finding a version of the model that satisfies the UPDATE_CONDITION. If the update request has no UPDATE_CONDITION, the algorithm selects the latest commit associated with the given REF_ID (by targeting it in the upcoming metadata update operation) and proceeds to step 1.2.

Unfortunately, the SPARQL 1.1 Protocol does not standardize the content of the response to a SPARQL Update request. Some triplestores report detailed information such as how many triples an update affected, while others only indicate, via HTTP status codes, whether the request succeeded or failed as an atomic unit. Since an update whose WHERE block matches nothing still succeeds (it simply changes nothing), the server cannot rely on the update response to tell whether the condition held. It must therefore test the UPDATE_CONDITION with an ASK query before issuing the update operation. An ASK query returns a boolean indicating whether the given WHERE block matches the specified dataset(s).

Fortunately, the server can issue multiple ASK queries in parallel, letting the triplestore load-balance the queries across separate graphs and reducing the total time spent waiting to determine which commit the update will be applied to. Depending on several factors, searching multiple graphs in parallel with read-only ASK queries might even outperform the alternative of re-attempting the update against successive commits until one succeeds, though this is only speculation.
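The ASK-based selection, including the shortcut for updates without a condition, might look like the following sketch. Several names are assumptions: `ask` is a hypothetical callable that sends a query to the triplestore's SPARQL query endpoint and returns the boolean result, `graph_of` maps a commit ID to the IRI of its snapshot graph, and `prologue` holds the update's PREFIX declarations. Each candidate snapshot is made the default graph with a `FROM` clause so the WHERE block can be reused unchanged; this assumes the triplestore resolves FROM IRIs against its stored graphs, which is common but implementation-dependent. Candidates are ordered newest first (for example, the branch HEAD followed by older commits with existing snapshots, down to TERMINAL_COMMIT_ID).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def build_ask(prologue: str, condition: str, graph_iri: str) -> str:
    """Wrap the UPDATE_CONDITION in an ASK query scoped to one snapshot graph."""
    return f"{prologue}\nASK FROM <{graph_iri}> WHERE {{\n{condition}\n}}"


def select_commit(
    condition: Optional[str],
    prologue: str,
    head_id: str,
    candidates: Sequence[str],       # newest first
    graph_of: Callable[[str], str],  # hypothetical: commit ID -> snapshot graph IRI
    ask: Callable[[str], bool],      # hypothetical: runs an ASK query, returns result
    max_workers: int = 4,
) -> Optional[str]:
    """Return the commit to apply the update to, or None (412 Precondition Failed)."""
    if condition is None:
        # No WHERE block: always target the latest commit on the ref.
        return head_id
    queries = [build_ask(prologue, condition, graph_of(c)) for c in candidates]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with candidates.
        results = list(pool.map(ask, queries))
    # The newest satisfying commit wins.
    for commit_id, satisfied in zip(candidates, results):
        if satisfied:
            return commit_id
    return None
```

This version waits for every query to finish for simplicity. Since only the newest satisfying commit matters, an implementation could instead return as soon as one candidate answers true and every newer candidate has answered false.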

1.2. Effective staging graph

Check back soon.