Prerequisite readings:
When attempting to apply a client’s update to some branch, the server must gracefully handle the case where the update condition (i.e., the graph pattern(s) in the user’s WHERE block) is not satisfiable on the latest snapshot. This can happen because (a) another client updated the model concurrently, (b) the local client was using stale knowledge, or (c) the client submitted an unsatisfiable update condition. No matter the cause, the server should attempt to apply the client’s update onto a version of the model that satisfies the update condition. If successful, the server should treat the update as causing a merge conflict and create a new branch to be diffed against. If unsuccessful, the server can respond with the 412 Precondition Failed HTTP status.
To search for a version of the model that satisfies the update condition, the server needs snapshots of those versions to query against. Iterating through the entire commit ancestry of the current branch and building a snapshot for each version would be very costly in time and resources. A better approach is to narrow the number of versions to check and to evaluate the update condition only on existing snapshots. For these reasons, SPARQL updates submitted to Layer 1 virtual endpoints are required to specify the commit that the update is intended for by providing the request header X-MMS-Context-Commit-Id: {COMMIT_ID}. The server then treats this as the earliest allowable commit from which to branch in case the update condition is not satisfiable on newer commits.
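As a rough illustration of the header requirement above, a client might assemble such a request as follows. This is only a sketch: the helper name `build_update_request` and the host `http://localhost:8080` are hypothetical, and the request is built but never sent.

```python
import urllib.request


def build_update_request(endpoint_url, commit_id, sparql_update):
    """Build (but do not send) a SPARQL update request pinned to a commit.

    `endpoint_url` is the full URL of the branch's update endpoint;
    `commit_id` is the commit the client believes it is updating.
    """
    return urllib.request.Request(
        endpoint_url,
        data=sparql_update.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/sparql-update",
            # Required header: earliest commit the server may branch from.
            "X-MMS-Context-Commit-Id": commit_id,
        },
    )


# Hypothetical host; the path mirrors the example requests in this document.
req = build_update_request(
    "http://localhost:8080/projects/example/ref/master",
    "Commit.e4a1c",
    "delete data { :Bob :dislikes :Alice . }",
)
```

Passing the commit explicitly, rather than defaulting it, keeps the client honest about which version of the model its update was written against.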
Merge Conflict Example
As an example, consider the initial state of the model having the commit hash e4a1c:
# snapshot mms-object:Snapshot.e4a1c
graph mms-graph:Model.Example.master {
  :Alice a :Person .
  :Bob a :Person ;
    :dislikes :Alice .
}
Client X submits the update:
POST /projects/example/ref/master
Content-Type: application/sparql-update
X-MMS-Context-Commit-Id: Commit.e4a1c

delete data { :Bob :dislikes :Alice . }
The update is applied and creates a new state of the model with commit hash 17ccd:
# snapshot mms-object:Snapshot.17ccd
graph mms-graph:Model.Example.master {
  :Alice a :Person .
  :Bob a :Person .
}
Then, Client Y submits an update targeting the same branch but the previous commit e4a1c:
POST /projects/example/ref/master
Content-Type: application/sparql-update
X-MMS-Context-Commit-Id: Commit.e4a1c

delete { :Alice foaf:knows :Bob . }
where { :Bob :dislikes :Alice . }
Notice that the update condition :Bob :dislikes :Alice is not satisfiable on the latest version 17ccd. The server then traverses up the commit ancestry until it reaches an existing snapshot or the terminal commit e4a1c, at which point it retests the update condition. Since the condition is satisfied by e4a1c, the new commit must reference it as its parent commit, making the commit tree look like this:
             [master]
          ╭--17ccd
(x)--e4a1c
          ╰--8f155
Here, 8f155 is the new commit created by Client Y's update. Since this new commit does not append to the HEAD of master, the unwanted divergence in the commit chain is classified as a merge conflict between 8f155 and 17ccd.
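The traversal in this example can be sketched as a small in-memory simulation. The sketch makes strong simplifying assumptions: the update condition is modelled as a set of ground triples that is satisfied when it is a subset of a snapshot, whereas a real server would evaluate a graph pattern against the triplestore. The function name `select_parent_commit` and the dictionaries are hypothetical.

```python
def select_parent_commit(head, terminal, parents, snapshots, condition):
    """Walk from `head` toward `terminal` and return the first commit whose
    existing snapshot satisfies `condition`, or None if none does.

    parents:   commit id -> parent commit id (or None at the root)
    snapshots: commit id -> set of triples; commits without a snapshot
               are skipped rather than rebuilt
    condition: set of ground triples (a simplification of a WHERE block)
    """
    commit = head
    while commit is not None:
        snapshot = snapshots.get(commit)
        if snapshot is not None and condition <= snapshot:
            return commit
        if commit == terminal:
            return None  # never look further back than the terminal commit
        commit = parents.get(commit)
    return None


person_triples = {(":Alice", "a", ":Person"), (":Bob", "a", ":Person")}
parents = {"17ccd": "e4a1c", "e4a1c": None}
snapshots = {
    "e4a1c": person_triples | {(":Bob", ":dislikes", ":Alice")},
    "17ccd": set(person_triples),
}
condition = {(":Bob", ":dislikes", ":Alice")}

parent = select_parent_commit("17ccd", "e4a1c", parents, snapshots, condition)
print(parent)  # e4a1c
is_conflict = parent is not None and parent != "17ccd"
```

A result of `None` corresponds to the 412 Precondition Failed case, while a result other than the branch HEAD corresponds to the merge conflict described above.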
The server responds to Client Y’s update with the following headers:
X-MMS-Context-Project-Id: Project.Example
X-MMS-Context-Ref-Id:
X-MMS-Context-Commit-Id: Commit.8f155
X-MMS-Conflict-Commit-Id: Commit.17ccd
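On the client side, these headers could be interpreted along the following lines. This is a sketch under one explicit assumption that the text above does not confirm: that the X-MMS-Conflict-Commit-Id header is present only when the update caused a merge conflict. The function name `interpret_update_response` is hypothetical.

```python
def interpret_update_response(headers):
    """Summarize the MMS context headers of an update response.

    ASSUMPTION: a conflict is signalled solely by the presence of the
    X-MMS-Conflict-Commit-Id header.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "commit_id": lowered.get("x-mms-context-commit-id"),
        "conflict_commit_id": lowered.get("x-mms-conflict-commit-id"),
        "is_conflict": "x-mms-conflict-commit-id" in lowered,
    }


summary = interpret_update_response({
    "X-MMS-Context-Project-Id": "Project.Example",
    "X-MMS-Context-Commit-Id": "Commit.8f155",
    "X-MMS-Conflict-Commit-Id": "Commit.17ccd",
})
```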
Update algorithm
1.0. Input values
Let the following identifiers represent constant values given by the update request:
PROJECT_ID and REF_ID - taken from path parameters /projects/{PROJECT_ID}/refs/{REF_ID}
UPDATE_CONDITION - the WHERE block of the SPARQL update body
TERMINAL_COMMIT_ID - taken from the requisite X-MMS-Context-Commit-Id header
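The input values listed above could be gathered roughly as follows. This is a minimal sketch, not a description of the actual server: the names `UpdateInputs`, `extract_where_block`, and `parse_update_request` are hypothetical, and the WHERE block is located by naive keyword and brace matching instead of a real SPARQL parser, so braces inside string literals or IRIs would confuse it.

```python
import re
from typing import NamedTuple, Optional

PATH_PATTERN = re.compile(r"^/projects/(?P<project>[^/]+)/refs/(?P<ref>[^/]+)$")


class UpdateInputs(NamedTuple):
    project_id: str
    ref_id: str
    update_condition: Optional[str]  # None when the update has no WHERE block
    terminal_commit_id: str


def extract_where_block(update: str) -> Optional[str]:
    """Return the braced WHERE block of an update, or None if absent (naive)."""
    match = re.search(r"\bwhere\s*\{", update, flags=re.IGNORECASE)
    if match is None:
        return None
    start = match.end() - 1  # index of the opening brace
    depth = 0
    for i in range(start, len(update)):
        if update[i] == "{":
            depth += 1
        elif update[i] == "}":
            depth -= 1
            if depth == 0:
                return update[start:i + 1]
    raise ValueError("unbalanced braces in WHERE block")


def parse_update_request(path, headers, body) -> UpdateInputs:
    """Collect PROJECT_ID, REF_ID, UPDATE_CONDITION and TERMINAL_COMMIT_ID."""
    m = PATH_PATTERN.match(path)
    if m is None:
        raise ValueError(f"unexpected path: {path}")
    lowered = {name.lower(): value for name, value in headers.items()}
    commit_id = lowered.get("x-mms-context-commit-id")
    if commit_id is None:
        raise ValueError("missing required X-MMS-Context-Commit-Id header")
    return UpdateInputs(m["project"], m["ref"], extract_where_block(body), commit_id)


inputs = parse_update_request(
    "/projects/example/refs/master",
    {"X-MMS-Context-Commit-Id": "Commit.e4a1c"},
    "delete { :Alice foaf:knows :Bob . } where { :Bob :dislikes :Alice . }",
)
```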
1.1. Commit selection
The first set of IO operations on the server finds a version of the model that satisfies the UPDATE_CONDITION. If there is no UPDATE_CONDITION in the update request, the algorithm selects the latest commit associated with the given REF_ID (by targeting it in the upcoming metadata update operation) and proceeds to 1.2.
Unfortunately, the SPARQL 1.1 Protocol does not standardize any response content for a SPARQL Update request. Some triplestores provide detailed information about how many triples were affected by an update, while others simply indicate whether the request succeeded or failed as an atomic unit by using HTTP status codes. For these reasons, the server must first test the UPDATE_CONDITION using an ASK query before issuing an update operation. The ASK query returns a boolean indicating whether the given WHERE block matches the specified dataset(s). Fortunately, the server can issue multiple ASK queries in parallel, letting the triplestore load-balance the queries against separate graphs and reducing the total time spent waiting for the results that determine which commit the update will be applied to. Depending on several factors, using the read-only ASK operation to search multiple graphs in parallel for those that satisfy the UPDATE_CONDITION might even outperform the aforementioned update-until-it-works technique, though this is only speculation.
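The parallel ASK approach described above might be sketched as follows. Everything here is illustrative: `build_ask_query`, `find_satisfying_commit`, and the `urn:example:` graph IRIs are hypothetical, prefix declarations are omitted from the generated queries, and `run_ask` stands in for whatever call actually sends an ASK query to the triplestore and returns its boolean result.

```python
from concurrent.futures import ThreadPoolExecutor


def build_ask_query(snapshot_graph_iri, where_block):
    """Wrap a braced WHERE block in an ASK query scoped to one snapshot graph."""
    return f"ASK {{ GRAPH <{snapshot_graph_iri}> {where_block} }}"


def find_satisfying_commit(candidates, where_block, run_ask, max_workers=4):
    """Issue one ASK per candidate snapshot in parallel.

    candidates: list of (commit_id, snapshot_graph_iri), newest first
    run_ask:    callable taking a query string and returning a bool
    Returns the newest commit id whose snapshot satisfies the condition,
    or None if no candidate does.
    """
    queries = [build_ask_query(iri, where_block) for _, iri in candidates]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input queries.
        results = list(pool.map(run_ask, queries))
    for (commit_id, _), satisfied in zip(candidates, results):
        if satisfied:
            return commit_id
    return None


# Stand-in for a triplestore: only the e4a1c snapshot matches the condition.
def fake_run_ask(query):
    return "snapshot:e4a1c" in query


candidates = [
    ("17ccd", "urn:example:snapshot:17ccd"),
    ("e4a1c", "urn:example:snapshot:e4a1c"),
]
selected = find_satisfying_commit(candidates, "{ :Bob :dislikes :Alice . }", fake_run_ask)
```

Waiting for all results before choosing keeps the selection deterministic (the newest satisfying commit wins); a variant that returns as soon as any query succeeds would trade that determinism for lower latency.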