Introduction

Since DRAGOS deals with large and complex data structures in a possibly distributed context, data consistency is a primary concern. This document describes the relevant aspects of data consistency and how they are designed and implemented in DRAGOS.

Stale References

After deleting a graph entity or undeclaring a graph entity class or an Attribute, an application may still hold one or more references to it through the corresponding Java object(s). This is especially true for distributed applications.

Obviously, it does not make sense to perform "writing" operations on a deleted entity, e.g. creating nodes inside a graph that no longer exists. That is why every such method begins by verifying that the object it is called on still exists, and throws a runtime exception if it does not. This is necessary to avoid cluttering the database with inaccessible and inconsistent data, and it helps detect usage errors in the application.
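The check described above can be sketched roughly as follows. This is a minimal illustration, not DRAGOS code: SketchStore, SketchGraph and WriteCheckDemo are hypothetical names, and IllegalStateException merely stands in for whatever runtime exception an implementation actually throws.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the underlying database; not part of the DRAGOS API.
class SketchStore {
    private final Set<String> liveGraphIds = new HashSet<>();
    private final Set<String> nodeKeys = new HashSet<>();

    void addGraph(String id) { liveGraphIds.add(id); }
    void deleteGraph(String id) { liveGraphIds.remove(id); }
    boolean graphExists(String id) { return liveGraphIds.contains(id); }
    void storeNode(String graphId, String nodeId) { nodeKeys.add(graphId + "/" + nodeId); }
    int nodeCount() { return nodeKeys.size(); }
}

// Hypothetical graph handle that may outlive the graph it refers to.
class SketchGraph {
    private final SketchStore store;
    private final String id;

    SketchGraph(SketchStore store, String id) {
        this.store = store;
        this.id = id;
    }

    // "Writing" operation: verify existence first, only then modify the store.
    void createNode(String nodeId) {
        if (!store.graphExists(id)) {
            // Assumption: any unchecked exception would do here.
            throw new IllegalStateException("Graph " + id + " no longer exists");
        }
        store.storeNode(id, nodeId);
    }
}

class WriteCheckDemo {
    public static void main(String[] args) {
        SketchStore store = new SketchStore();
        store.addGraph("g1");
        SketchGraph graph = new SketchGraph(store, "g1");
        graph.createNode("n1");

        store.deleteGraph("g1"); // 'graph' is now a stale reference
        try {
            graph.createNode("n2");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("Nodes stored: " + store.nodeCount());
    }
}
```

Because the check runs before the store is touched, the failed call leaves no orphaned node behind.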

A disadvantage of these checks is that they add overhead to every method call. A correct application does not need these checks, and even in the distributed case the deletion of objects can be easily detected and reacted to through the event system. This is why we made the checks optional for "reading" operations (those that do not modify the graph or schema data).

You can toggle these checks through the forceExistenceCheckOnRead method of GraphPool. If they are deactivated, the result of a method call on a stale reference is undefined: the implementation may throw an exception, return cached data, or even return random garbage. An implementation may also choose to perform existence checks even when it is not forced to do so. If, however, you enforce these checks by setting the flag to true, all implementations must perform existence checks on every method and throw an exception if the object no longer exists. This provides deterministic, fail-fast behavior and should be the mode of operation during development and testing of the application, and even during regular use if performance is of lesser concern.
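To make the difference between the two modes concrete, here is a small sketch. Only the method name forceExistenceCheckOnRead is taken from the text; its boolean parameter, the classes SketchGraphPool and SketchNode, and the use of IllegalStateException are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical pool; the real GraphPool interface may look different.
class SketchGraphPool {
    private final Set<String> liveIds = new HashSet<>();
    private boolean checkOnRead = false;

    // Assumed signature: a single boolean flag.
    void forceExistenceCheckOnRead(boolean force) { checkOnRead = force; }
    boolean isExistenceCheckForced() { return checkOnRead; }

    SketchNode createNode(String id, String label) {
        liveIds.add(id);
        return new SketchNode(this, id, label);
    }

    void deleteNode(String id) { liveIds.remove(id); }
    boolean exists(String id) { return liveIds.contains(id); }
}

// Hypothetical node handle that caches its label locally.
class SketchNode {
    private final SketchGraphPool pool;
    private final String id;
    private final String cachedLabel;

    SketchNode(SketchGraphPool pool, String id, String label) {
        this.pool = pool;
        this.id = id;
        this.cachedLabel = label;
    }

    // "Reading" operation: the existence check runs only when it is forced.
    String getLabel() {
        if (pool.isExistenceCheckForced() && !pool.exists(id)) {
            throw new IllegalStateException("Node " + id + " no longer exists");
        }
        // Without the check, this particular sketch happens to return cached data.
        return cachedLabel;
    }
}

class ReadCheckDemo {
    public static void main(String[] args) {
        SketchGraphPool pool = new SketchGraphPool();
        SketchNode node = pool.createNode("n1", "start");
        pool.deleteNode("n1"); // 'node' is now stale

        System.out.println("Unchecked read: " + node.getLabel());

        pool.forceExistenceCheckOnRead(true);
        try {
            node.getLabel();
        } catch (IllegalStateException e) {
            System.out.println("Checked read failed: " + e.getMessage());
        }
    }
}
```

Returning cached data is just one of the behaviors permitted when the flag is off; an application should not rely on it.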

Some methods are guaranteed to work even on stale references: getDataSourceURL(), getInternalIdentifier() and the methods used to access wrapped graph entities / graph entity classes / attributes. These methods are also excluded from the forced existence checks. One reason for this is deletion events: the source information of such an event would be useless if you could not identify the source or compare it to other objects. Several other basic methods, namely toString(), hashCode() and equals(Object), are also expected to work even after deletion. If this poses a problem for your implementation, consider using the proxies from the i3.dragos.gm.core.proxies package.
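One way to satisfy these guarantees is to base object identity purely on immutable data that is captured when the handle is created. The sketch below assumes that getDataSourceURL() and getInternalIdentifier() return strings, which the text does not specify; SketchHandle is a hypothetical class and not one of the DRAGOS proxies.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical handle whose identity never touches the database.
final class SketchHandle {
    private final String dataSourceUrl; // assumed to be a String
    private final String internalId;    // assumed to be a String

    SketchHandle(String dataSourceUrl, String internalId) {
        this.dataSourceUrl = Objects.requireNonNull(dataSourceUrl);
        this.internalId = Objects.requireNonNull(internalId);
    }

    String getDataSourceURL() { return dataSourceUrl; }
    String getInternalIdentifier() { return internalId; }

    // Identity depends only on the two immutable fields, so it survives deletion.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SketchHandle)) return false;
        SketchHandle that = (SketchHandle) other;
        return dataSourceUrl.equals(that.dataSourceUrl) && internalId.equals(that.internalId);
    }

    @Override
    public int hashCode() { return Objects.hash(dataSourceUrl, internalId); }

    @Override
    public String toString() { return dataSourceUrl + "#" + internalId; }
}

class StaleIdentityDemo {
    public static void main(String[] args) {
        // The application tracks an entity it is interested in.
        Set<SketchHandle> watched = new HashSet<>();
        watched.add(new SketchHandle("db://example", "42"));

        // Later, a deletion event delivers a handle to the (now deleted) entity.
        SketchHandle eventSource = new SketchHandle("db://example", "42");
        System.out.println("Deleted entity was watched: " + watched.remove(eventSource));
    }
}
```

With this design a deletion event handler can still look up the event source in its own collections, even though the entity itself is gone.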

Graph and Schema Data Inconsistencies

In DRAGOS, we distinguish between two types of inconsistencies: those that may never occur, and those that may occur temporarily during the course of a transaction.

Some inconsistencies have to be prevented to allow useful operation on the data at all. They are prevented by design, either through checks that throw an EntityInUseException if the object in question is still referenced somewhere, or through cascading deletion of dependent objects. See Schema.undeclareGraphEntityClass(GraphEntityClass) for an example of both techniques. If a graph model implementation or the application detects a violation of these invariants (e.g. through assertions), it should treat the violation as an internal error, usually by raising an exception, because the code is probably buggy and the database needs to be repaired manually.
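The two prevention techniques can be illustrated with a toy schema. Everything in this sketch is hypothetical: the classes SketchSchema and SketchEntityInUseException are invented for illustration, and the choice of which dependency is rejected (an edge class targeting the node class) and which is cascaded (the attributes declared on the class) is an assumption, not a description of what Schema.undeclareGraphEntityClass actually does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical counterpart of EntityInUseException; the real class hierarchy is not known here.
class SketchEntityInUseException extends RuntimeException {
    SketchEntityInUseException(String message) { super(message); }
}

class SketchSchema {
    private final Set<String> nodeClasses = new HashSet<>();
    private final Map<String, String> edgeClassTargets = new HashMap<>(); // edge class -> target node class
    private final Map<String, List<String>> attributes = new HashMap<>(); // node class -> attribute names

    void declareNodeClass(String name) {
        nodeClasses.add(name);
        attributes.put(name, new ArrayList<>());
    }

    void declareAttribute(String nodeClass, String attribute) {
        if (!nodeClasses.contains(nodeClass)) {
            throw new IllegalArgumentException("Unknown node class " + nodeClass);
        }
        attributes.get(nodeClass).add(attribute);
    }

    void declareEdgeClass(String name, String targetNodeClass) { edgeClassTargets.put(name, targetNodeClass); }
    void undeclareEdgeClass(String name) { edgeClassTargets.remove(name); }

    void undeclareNodeClass(String name) {
        // Technique 1: refuse the operation while the class is still referenced.
        if (edgeClassTargets.containsValue(name)) {
            throw new SketchEntityInUseException("Node class " + name + " is still used as an edge target");
        }
        // Technique 2: cascade the deletion to dependent objects.
        attributes.remove(name);
        nodeClasses.remove(name);
    }

    boolean isDeclared(String nodeClass) { return nodeClasses.contains(nodeClass); }
    boolean hasAttributeTable(String nodeClass) { return attributes.containsKey(nodeClass); }
}

class SchemaInvariantDemo {
    public static void main(String[] args) {
        SketchSchema schema = new SketchSchema();
        schema.declareNodeClass("Person");
        schema.declareAttribute("Person", "name");
        schema.declareEdgeClass("knows", "Person");

        try {
            schema.undeclareNodeClass("Person");
        } catch (SketchEntityInUseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        schema.undeclareEdgeClass("knows");
        schema.undeclareNodeClass("Person"); // succeeds and removes the attributes too
        System.out.println("Person declared: " + schema.isDeclared("Person"));
    }
}
```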

However, it makes sense to allow some violations of the schema's restrictions. An obvious example is minimum cardinalities, since you cannot create multiple entities at once. A core graph model implementation must be prepared for these cases and return correct data even in the presence of such inconsistencies. The expected behavior in these cases is described in the core GM API doc and below. An application should also be aware of these inconsistencies, especially during event handling with EventCouplingMode.IMMEDIATE. They must be corrected before the transaction can be committed; this is enforced through checks that are performed automatically upon commit. If a commit is attempted while one or more inconsistencies still exist, the commit call will fail with an exception.
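The following sketch illustrates the idea of tolerating a cardinality violation during a transaction and rejecting it only at commit time. SketchTransaction is a hypothetical class, the rule "every node needs at least one outgoing edge" is an invented example of a minimum cardinality, and IllegalStateException stands in for the unspecified commit exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transaction that tracks the number of outgoing edges per node.
class SketchTransaction {
    private final Map<String, Integer> outgoingEdgeCounts = new LinkedHashMap<>();
    private boolean committed = false;

    // Creating a node is always allowed, even though it violates the rule for now.
    void createNode(String id) { outgoingEdgeCounts.put(id, 0); }

    void createEdge(String sourceId) {
        if (!outgoingEdgeCounts.containsKey(sourceId)) {
            throw new IllegalArgumentException("Unknown node " + sourceId);
        }
        outgoingEdgeCounts.merge(sourceId, 1, Integer::sum);
    }

    // Collects all nodes that currently violate the assumed minimum cardinality of 1.
    List<String> findViolations() {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : outgoingEdgeCounts.entrySet()) {
            if (entry.getValue() < 1) {
                violations.add(entry.getKey());
            }
        }
        return violations;
    }

    // The check runs only here, not during the individual operations.
    void commit() {
        List<String> violations = findViolations();
        if (!violations.isEmpty()) {
            throw new IllegalStateException("Cannot commit, minimum cardinality violated for: " + violations);
        }
        committed = true;
    }

    boolean isCommitted() { return committed; }
}

class CommitCheckDemo {
    public static void main(String[] args) {
        SketchTransaction tx = new SketchTransaction();
        tx.createNode("n1"); // temporarily inconsistent: no outgoing edge yet

        try {
            tx.commit();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        tx.createEdge("n1"); // repair the inconsistency
        tx.commit();
        System.out.println("Committed: " + tx.isCommitted());
    }
}
```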

The following table gives a mostly complete overview of the possible inconsistencies and how they are classified. Keep in mind that "references" here means cases such as a (deleted) node class still being declared as the target type of an edge class, not the stale references discussed above.

Problem                                                         Temporarily allowed     Never allowed
                                                                during a transaction
--------------------------------------------------------------------------------------------------
Dangling edges, relation ends                                   X (1)
Violation of edge or relation end cardinalities                 X (2)
Edges or relation ends pointing to deleted entities                                     X
Entities without associated graph entity class                                          X
Instances of undeclared attributes                                                      X
References to undeclared graph entity classes inside schema                             X
(including dangling edge classes, relation end classes)
Inheritance cycles in the schema                                                        X

Details

  1. Dangling references are represented by a value of null for the source or target. In most cases, no special treatment is required. A few methods, however, traverse edges / relation ends, and those will throw a DanglingReferenceException if a dangling reference is encountered. You can easily identify these methods by looking up the uses of this exception in the API doc.
  2. Cardinalities are only checked upon commit, since all core GM operations are defined without regard to cardinalities. The only thing you have to keep in mind is that collections returned by various methods might not have the size you usually expect (e.g. they may be empty or contain more than one element), so be careful when iterating or using CollectionHelper.
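Both points can be handled defensively in application code, as the following sketch suggests. All names are hypothetical: SketchEdge and SketchDanglingReferenceException only imitate the behavior described above, and onlyElement is an invented helper, not a method of CollectionHelper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

// Hypothetical counterpart of DanglingReferenceException.
class SketchDanglingReferenceException extends RuntimeException {
    SketchDanglingReferenceException(String message) { super(message); }
}

// Hypothetical edge; a null source or target marks a dangling reference.
class SketchEdge {
    private final String source;
    private final String target;

    SketchEdge(String source, String target) {
        this.source = source;
        this.target = target;
    }

    String getSource() { return source; }
    String getTarget() { return target; }
}

class ConsistencySketch {
    // A traversing method: it cannot produce a result for a dangling edge, so it fails.
    static List<String> targetsOf(Collection<SketchEdge> edges) {
        List<String> targets = new ArrayList<>();
        for (SketchEdge edge : edges) {
            if (edge.getTarget() == null) {
                throw new SketchDanglingReferenceException("Edge from " + edge.getSource() + " has no target");
            }
            targets.add(edge.getTarget());
        }
        return targets;
    }

    // Returns the element only if the collection holds exactly one; empty otherwise.
    static <T> Optional<T> onlyElement(Collection<T> items) {
        return items.size() == 1 ? Optional.ofNullable(items.iterator().next()) : Optional.empty();
    }

    public static void main(String[] args) {
        List<SketchEdge> edges = Arrays.asList(new SketchEdge("a", "b"), new SketchEdge("a", null));
        try {
            targetsOf(edges);
        } catch (SketchDanglingReferenceException e) {
            System.out.println("Traversal failed: " + e.getMessage());
        }

        // During a transaction, a "single-valued" end may hold zero or several elements.
        System.out.println(onlyElement(Arrays.asList("x", "y")).isPresent());
    }
}
```

Using an Optional here avoids assuming that a collection with an expected size of one actually has that size before the commit checks have run.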