Model Mutation Safety
Layer 1 of Flexo MMS handles the versioning of ‘models’ on an abstracted layer above the quadstore by emulating the Git version control paradigm. Under the hood, Layer 1 must orchestrate the atomic insertion and deletion of triples across various named graphs in a single quadstore.
The SPARQL 1.1 Update specification states that a request containing multiple update operations should be treated atomically, so that the operations succeed or fail as a unit (see Transaction Isolation Levels in Neptune as an example). This means that a single HTTP POST request to the quadstore can effect multiple sequential changes as a single transaction. However, this feature alone does not solve all potential concurrency issues in the context of version control at the ‘model’ level, where multiple clients must communicate with a single gateway sitting behind a RESTful web service interface.
These issues arise when requests to modify the same version of a model are made concurrently. The trivial technique for handling such circumstances would be to create a separate branch in the commit history for each request, essentially declaring a merge conflict any time more than one request to modify the same resource is processed. The problem with this approach is the ever-growing number of branches and the frequent need for human intervention to resolve the resulting merge conflicts. Simply put, the branch-on-concurrent-write technique does not scale.
Optimistic Concurrency Control
A straightforward solution for handling competing transactions is to let the client attach preconditions to its SPARQL Updates using HTTP conditional requests. Essentially, an application can require that no other modifications have been made to some resource in order for its update to take effect. Flexo MMS facilitates conditional requests using ETags that it generates based on the modifications made to database objects and model commits.
For example, say an application creates a new branch develop:
PUT /orgs/openmbee/repos/demo/branches/develop
<> mms:ref <./main> .
Flexo MMS will respond with something like:
HTTP/* 200
ETag: d9a333f7-18da-4009-8e96-7170417ece14
prefix mms: <https://mms.openmbee.org/rdf/ontology/>
...
Now the application can perform the first commit to this new branch while guaranteeing that no other client or application has mutated the branch since it was created. It does so by sending the returned ETag in an If-Match header to form a conditional request:
POST /orgs/openmbee/repos/demo/branches/develop/update
If-Match: d9a333f7-18da-4009-8e96-7170417ece14
insert data { ... }
If some other client or application has modified the branch, Flexo MMS will abort the update and respond with an HTTP 412 Precondition Failed code.
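The compare-then-update behavior behind If-Match can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Flexo MMS implementation: the Branch class, the PreconditionFailed exception, and the use of random UUIDs as ETags are all assumptions made for the example.

```python
import threading
import uuid


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class Branch:
    """Toy in-memory branch guarded by an ETag (hypothetical, not the Flexo MMS API)."""

    def __init__(self):
        self.triples = set()
        self.etag = str(uuid.uuid4())  # assumption: any opaque, unique token works
        self._lock = threading.Lock()

    def update(self, if_match, inserts):
        """Apply inserts only if the caller's ETag still matches; return the new ETag."""
        with self._lock:  # the comparison and the write must happen as one step
            if if_match != self.etag:
                raise PreconditionFailed("branch changed since ETag " + if_match)
            self.triples |= set(inserts)
            self.etag = str(uuid.uuid4())
            return self.etag


develop = Branch()
etag_seen_by_both = develop.etag

# Client A commits first, so the branch receives a fresh ETag.
develop.update(etag_seen_by_both, {(":Alice", "foaf:knows", ":Bob")})

# Client B still holds the old ETag, so its update is rejected.
try:
    develop.update(etag_seen_by_both, {(":Alice", "foaf:knows", ":Charlie")})
    outcome = "applied"
except PreconditionFailed:
    outcome = "412 Precondition Failed"
```

The lock matters: if the ETag check and the write were separate steps, two clients holding the same ETag could both pass the check before either one wrote.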
Intent
Returning to what causes the issue in the first place, clients cannot be expected to always have the most up-to-date knowledge about the state of the model they are attempting to mutate. However, concurrent requests do not necessarily need to cause merge conflicts. Instead, what is needed is an approach that allows for more flexibility, such that the use of stale knowledge when attempting to update a model does not automatically trigger a merge conflict.
In practice, a client’s stale knowledge of the model is tolerable if the difference in model state does not affect the intent behind their mutation. Fortunately, clients are able to encode intent into their SPARQL update queries by using patterns and conditions, effectively ensuring that their changes to the model will not be affected by any potential gaps in knowledge.
In SPARQL updates, intent can be encoded as a combination of update patterns and update conditions. Both require the DELETE/INSERT ... WHERE form, since DELETE DATA and INSERT DATA accept only concrete triples and take no WHERE block. An update pattern is one where the deletion or insertion block uses variables. For example, delete { :Alice ?p ?o } where { :Alice ?p ?o }. An update condition is when some graph pattern(s) in the WHERE block of the update are used to assert the existence or absence of triples without creating bindings for the insert or delete blocks. For example, delete { :Alice foaf:knows :Bob } where { :Bob :dislikes :Alice }. The purpose of an update condition is to ensure that the state of the model meets some minimum requirements, since a mutation may depend on the existence or absence of certain statements.
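To make the distinction concrete, the following Python sketch mimics how a DELETE/INSERT ... WHERE update is evaluated over a set of triples. It is a toy model under stated assumptions, not a SPARQL engine: triples are plain tuples, variables are strings starting with "?", only basic graph patterns are supported, and the function names (match, instantiate, apply_update) are invented for illustration.

```python
def is_var(term):
    return term.startswith("?")


def match(graph, patterns, binding=None):
    """Yield each variable binding under which every pattern is in the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for pattern_term, term in zip(first, triple):
            if is_var(pattern_term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(pattern_term, term) != term:
                    break
            elif pattern_term != term:
                break
        else:
            yield from match(graph, rest, candidate)


def instantiate(template, binding):
    """Substitute bindings into a template, skipping triples left with variables."""
    triples = (tuple(binding.get(t, t) for t in triple) for triple in template)
    return {t for t in triples if not any(is_var(term) for term in t)}


def apply_update(graph, delete, insert, where):
    """Toy DELETE/INSERT ... WHERE: deletions are applied first, then insertions."""
    to_delete, to_insert = set(), set()
    for binding in match(graph, where):
        to_delete |= instantiate(delete, binding)
        to_insert |= instantiate(insert, binding)
    return (graph - to_delete) | to_insert


graph = {
    (":Alice", "foaf:knows", ":Bob"),
    (":Alice", "foaf:name", '"Alice"'),
    (":Bob", ":dislikes", ":Alice"),
}

# Update pattern: variables in the delete block remove whatever currently matches.
about_alice = [(":Alice", "?p", "?o")]
after_pattern = apply_update(graph, delete=about_alice, insert=[], where=about_alice)

# Update condition: the WHERE block binds nothing, it only gates the deletion.
after_condition = apply_update(
    graph,
    delete=[(":Alice", "foaf:knows", ":Bob")],
    insert=[],
    where=[(":Bob", ":dislikes", ":Alice")],
)
```

In this sketch, an unmet condition produces no solutions, so the update leaves the graph unchanged. How a server reports that outcome to the client is a separate design decision.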
The following example demonstrates how a client can encode intent using SPARQL update.
Example of a Client Encoding Intent
Take the following set of triples to represent the initial state #01 of our model:
In short, it says that Alice, Bob and Charlie are persons. Additionally, Alice knows Bob.
Let’s assume that two clients, Client X and Client Y, are attempting to mutate the model at state #01. At the time each client submits its update, it believes it is mutating the model at state #01.
Client X wants to mutate this dataset by removing all persons whom Alice knows. Given the knowledge of the model at state #01, Client X submits the following instructions:
However, while Client X was preparing to submit this update, another client, Client Y, successfully issued the following update:
making the new state of the model #02:
Since Client X did not have the most up-to-date knowledge of the model, their intent to remove all persons whom Alice knows was lost in their update. In this hypothetical circumstance, the outcome of their update would have looked as follows:
Notice that Bob was removed as intended. However, Charlie is still present, meaning that Client X’s intent to remove all persons whom Alice knows was lost because Client Y’s update created a gap in Client X’s knowledge of the model state.
Alternatively, Client X can ensure that their intent is honored by encoding it into their update with an update pattern, as follows:
This update now instructs removal of triples about persons whom Alice knows, making the outcome of their update look as intended:
It is also important to note that Client X could have added an update condition which would lead to very different consequences. That is, if Client X wanted to encode “remove all persons whom Alice knows, as long as those persons are exclusively limited to: Bob”, then they could have issued this update:
In which case, their update would fail against the state of the model at #02, since it includes a statement about Alice knowing someone other than Bob. In this circumstance, the server would then be able to recognize a conflict and handle the update accordingly.
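The whole scenario can be replayed with plain Python sets to compare the three outcomes side by side. This is only a sketch: the concrete triples, the :Person class name, and the three update functions are assumptions reconstructed from the prose description, and raising an exception on an unmet condition is one possible way for a server to surface the conflict.

```python
KNOWS, TYPE, PERSON = "foaf:knows", "rdf:type", ":Person"

# State #01 as described: three persons, and Alice knows Bob.
state_01 = frozenset({
    (":Alice", TYPE, PERSON),
    (":Bob", TYPE, PERSON),
    (":Charlie", TYPE, PERSON),
    (":Alice", KNOWS, ":Bob"),
})

# Client Y's update lands first, producing state #02 (assumed content).
state_02 = state_01 | {(":Alice", KNOWS, ":Charlie")}


def known_by_alice(state):
    return {o for s, p, o in state if s == ":Alice" and p == KNOWS}


def literal_update(state):
    """Client X's first attempt: fixed triples derived from state #01."""
    return state - {(":Alice", KNOWS, ":Bob"), (":Bob", TYPE, PERSON)}


def pattern_update(state):
    """Update pattern: remove whoever Alice knows in the state actually seen."""
    people = known_by_alice(state)
    doomed = {(":Alice", KNOWS, p) for p in people} | {(p, TYPE, PERSON) for p in people}
    return state - doomed


def conditional_update(state):
    """Update condition: proceed only if Alice knows exactly Bob."""
    if known_by_alice(state) != {":Bob"}:
        raise ValueError("conflict: Alice knows someone other than Bob")
    return literal_update(state)


stale_result = literal_update(state_02)    # Charlie survives, intent is lost
intended_result = pattern_update(state_02)  # both Bob and Charlie are removed
try:
    conditional_update(state_02)
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

The pattern-based update adapts to whatever state it finds, while the conditional update refuses to run against a state it did not anticipate. Which behavior is appropriate depends on what the client actually intends.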