Guide to Transactions

Introduction

Transactions encapsulate a number of related actions on a database. They serve two main purposes:

If a problem occurs during one of the later actions, you can easily undo all the previous actions by performing a so-called rollback.
While you are performing a data modification consisting of multiple steps, other users of the database cannot see the changes you have already made. This prevents reading of inconsistent transitional states. When you have completed the modification, you can commit the transaction, thus making the new data visible to everybody in a single atomic operation.

Different PEGS implementations may offer varying support for the transaction features described above. Please refer to the documentation of your implementation for more information.

Realization

There are two layers of transactions in DRAGOS: user transactions and database transactions. Only user transactions are directly accessed by the user or application; each of them maps to one or more database transactions. This layered architecture allows the addition of sophisticated features like distributed transactions (spanning more than one graph pool, possibly running on different machines on a network) in the future.

To provide a central point for configuring and accessing transaction managers, a single TransactionManagerFactory runs as a service in the DRAGOS kernel. You can register any number of TransactionManager implementations with this factory. Reasons for doing so might be support for optional features like nested transactions, extensions like distributed transactions across several DRAGOS kernels, or support for special features of certain data sources, like a 2-phase commit protocol.

DRAGOS comes with a TransactionManager implementation that supports both nested transactions and distributed transactions inside a single DRAGOS kernel, and should be sufficient for most situations.

To create a TransactionManager instance, you simply specify which data sources it should manage, and which one of the registered implementations to use (or just go with the default). Each data source can only be managed by one transaction manager at a time.

Do not let the name factory confuse you: Once created for any given DataSourceURL, the same transaction manager will be returned for every further call with the same argument. This caching not only increases performance, but also makes the TransactionListener mechanism much more useful and frees the application from keeping a reference to the current transaction manager (because it can be retrieved again at any time).
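The caching behaviour can be pictured with a minimal sketch. All names in it (SketchTransactionManager, SketchFactory, plain strings standing in for DataSourceURL) are hypothetical and are not the DRAGOS API; the sketch also assumes that asking for a group of data sources that conflicts with an existing manager is rejected.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a transaction manager; not the DRAGOS interface.
final class SketchTransactionManager {
    final List<String> dataSources;

    SketchTransactionManager(List<String> dataSources) {
        this.dataSources = dataSources;
    }
}

// Hypothetical factory that keeps exactly one manager per data source.
final class SketchFactory {
    private final Map<String, SketchTransactionManager> byDataSource = new HashMap<>();

    // Returns the manager already responsible for the given data sources,
    // or creates and remembers a new one if none of them is managed yet.
    synchronized SketchTransactionManager create(String... urls) {
        if (urls.length == 0) {
            throw new IllegalArgumentException("at least one data source is required");
        }
        SketchTransactionManager first = byDataSource.get(urls[0]);
        if (first != null) {
            // Cached case: every requested data source must belong to that same manager.
            for (String url : urls) {
                if (byDataSource.get(url) != first) {
                    throw new IllegalStateException(url + " is managed by a different manager");
                }
            }
            return first;
        }
        // New case: none of the requested data sources may be managed already.
        for (String url : urls) {
            if (byDataSource.containsKey(url)) {
                throw new IllegalStateException(url + " is already managed");
            }
        }
        SketchTransactionManager created = new SketchTransactionManager(List.of(urls));
        for (String url : urls) {
            byDataSource.put(url, created);
        }
        return created;
    }
}
```

Because the lookup is keyed by data source, an application does not need to hold on to the manager it created; asking again with any of the managed data sources yields the same object.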

Transactions in practice

Using the TransactionManager - An example

TransactionManager is the central interface in the transaction architecture. Every instance is associated with at least one graph pool:

Suppose we have three graph pools: GP_1, GP_2 and GP_3. We want distributed transactions for GP_1 and GP_2, but not for GP_3, which belongs to another project running on the same DRAGOS kernel. Then we would have two instances of transaction managers: TM_A which handles GP_1 and GP_2 and TM_B which is associated with GP_3:

TM_A   [GP_1, GP_2]
TM_B   [GP_3]

If you call TM_A to start a user transaction, it will start two database transactions, one each in GP_1 and GP_2. If you made the same call on TM_B, it would start only one database transaction, in GP_3.
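The mapping from one user transaction to several database transactions can be sketched as follows. PoolTransaction and FanOutUserTransaction are hypothetical names invented for this illustration, and the sketch only records what was done to each pool instead of talking to a real graph pool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database transaction inside one graph pool.
final class PoolTransaction {
    final String pool;
    String lastAction = "begin";

    PoolTransaction(String pool) {
        this.pool = pool;
    }
}

// Hypothetical user transaction that starts one database transaction per pool.
final class FanOutUserTransaction {
    final List<PoolTransaction> parts = new ArrayList<>();

    FanOutUserTransaction(String... pools) {
        for (String pool : pools) {
            parts.add(new PoolTransaction(pool));
        }
    }

    // Commits every underlying database transaction.
    void commit() {
        for (PoolTransaction part : parts) {
            part.lastAction = "commit";
        }
    }

    // Rolls back every underlying database transaction.
    void rollback() {
        for (PoolTransaction part : parts) {
            part.lastAction = "rollback";
        }
    }
}
```

In this picture, a user transaction started through TM_A would be created with "GP_1" and "GP_2" and hold two parts, while one started through TM_B would hold a single part for "GP_3".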

The TransactionManagerFactory API contains a method to retrieve the transaction manager instance responsible for a certain graph pool. So if you wanted to perform an operation on GP_2 in our example, you would call TransactionManagerFactory.getInstance().create(DataSourceURL) with the DataSourceURL of GP_2 as the parameter, and it would return TM_A. Then you can use TM_A to start a user transaction and perform the desired operations.

Everything happens inside a transaction

There is a simple rule in DRAGOS: Everything happens inside a transaction. This makes event handling and ensuring schema consistency much easier. Operations that may seem atomic might consist of any number of single steps depending on the PEGS implementation. Wrapping everything in a transaction ensures that you do not end up with a corrupted database if one of these steps fails. Even read-only operations may actually write to the database, e.g. if the value of a dynamic attribute is recalculated and cached.

Guide for Users

Please refer to the example above as well as the API documentation for information on how to use transactions in your application.

Automated transaction handling

DRAGOS aims to be user-friendly, so if you do not want to deal with transactions, there is no need to - just activate dragos-ext-autocommit! This extension, implemented using the wrapper mechanism (see "Guide to Wrappers" for more information), intercepts every method call, checks whether a transaction is currently active, and starts a new one if necessary. Before returning from the method call, the transaction is committed - but only if it was started automatically by this same method. What does this mean in practice? You can still start and commit transactions manually when you want to, without dragos-ext-autocommit getting in your way. But if you call any method without starting a transaction first, it will transparently encapsulate that call in a transaction, thus satisfying the requirement that everything happens inside the scope of a transaction.
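The idea behind such an interceptor can be illustrated with a standard Java dynamic proxy. This is only a sketch of the technique, not the dragos-ext-autocommit code: TxTracker, GraphOps and AutoCommitSketch are hypothetical names, and rolling back when the wrapped call fails is an assumption that the text above does not state.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal transaction tracker that logs what happens to it.
final class TxTracker {
    boolean active = false;
    final List<String> log = new ArrayList<>();

    void begin() { active = true; log.add("begin"); }
    void commit() { active = false; log.add("commit"); }
    void rollback() { active = false; log.add("rollback"); }
}

// Hypothetical graph API, used only for this example.
interface GraphOps {
    void createNode(String name);
}

final class AutoCommitSketch {
    // Wraps target so that every call through the returned object runs in a transaction.
    static <T> T wrap(T target, Class<T> api, TxTracker tx) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Only take responsibility for a transaction this call starts itself.
            boolean startedHere = !tx.active;
            if (startedHere) {
                tx.begin();
            }
            try {
                Object result = method.invoke(target, args);
                if (startedHere) {
                    tx.commit();
                }
                return result;
            } catch (InvocationTargetException e) {
                // Assumption: a failed call undoes the transaction it started.
                if (startedHere) {
                    tx.rollback();
                }
                throw e.getCause();
            }
        };
        Object proxy = Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
        return api.cast(proxy);
    }
}
```

A call made through the wrapped object while no transaction is active is bracketed by begin and commit; a call made between a manual begin() and commit() is passed through untouched.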

Nested transactions and their events

This section deals with some of the finer points of transaction state changes and the events generated by those.

Generally, the order is pretty simple:

the BEFORE_XY event is fired
the actual action is performed
the state is changed to XY'ED
the AFTER_XY event is fired

However, two of the possible operations on transactions have a cascading effect, which complicates matters slightly:

commit() on a top-level transaction changes the state of all descendants from PRE_COMMIT to COMMITED
rollback() on a transaction anywhere in the hierarchy changes the state of all descendants to ROLLED_BACK

These operations are atomic and affect several transactions at once, which we obviously cannot (and do not even want to) replicate in the generated events. We want the events to be fired in a sensible order, which in this case means the reverse order of creation, so that the events for any nested transactions are fired before their parent's event.

But since we also want to display the updated status to the outside world as soon as possible, and especially when the event generation described above begins, we first update the status field of all affected transactions before firing the first event. Thus, even during processing of the first AFTER_COMMIT or AFTER_ROLLBACK event (the one generated by the "youngest" transaction), getState() on its ancestors and other transactions in the hierarchy will return the correct value.
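The two-step procedure for a cascading rollback (update all states first, then fire the events youngest first) can be sketched like this. NestedTx and its methods are hypothetical, the "ACTIVE" start state is made up for the example, and only the AFTER_ROLLBACK notification is modelled. The sketch assumes that a nested transaction and its descendants are finished before the next sibling is created, so a parent-first walk of the tree yields creation order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical nested transaction node; not the DRAGOS Transaction interface.
final class NestedTx {
    final String name;
    String state = "ACTIVE";
    private final List<NestedTx> children = new ArrayList<>();

    NestedTx(String name) {
        this.name = name;
    }

    // Creates a nested transaction below this one.
    NestedTx beginNested(String childName) {
        NestedTx child = new NestedTx(childName);
        children.add(child);
        return child;
    }

    // Adds this transaction and all descendants to out, in order of creation.
    private void collect(List<NestedTx> out) {
        out.add(this);
        for (NestedTx child : children) {
            child.collect(out);
        }
    }

    // Rolls back this transaction and all descendants, then notifies the listener.
    void rollback(Consumer<NestedTx> afterRollback) {
        List<NestedTx> affected = new ArrayList<>();
        collect(affected);
        // Step 1: update every state before the first event is fired.
        for (NestedTx tx : affected) {
            tx.state = "ROLLED_BACK";
        }
        // Step 2: fire the events in reverse order of creation.
        Collections.reverse(affected);
        for (NestedTx tx : affected) {
            afterRollback.accept(tx);
        }
    }
}
```

A listener that inspects the state of a parent while handling the event of the youngest transaction already sees ROLLED_BACK, which is the property described above.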

Guide for PEGS Implementers

The details of implementing transaction support vary widely with the database used (if any). Usually, most of the transaction code will be handled in the implementation of
i3.dragos.core.services.datasources.DataSource and
i3.dragos.core.services.datasources.DataSourceTransaction.

If you are using a database with a JDBC back-end, consider using dragos-db-jdbc which takes care of the data source implementation, so you only have to deal with the actual graph model implementation. As a side effect you also get support for nested transactions for free if the underlying database supports savepoints.

Transactions vs. DataSourceTransactions

The user should be able to operate on graph data in a natural and intuitive way. This means that we did not want to include a Transaction parameter in each method call; instead, we associated exactly one Transaction with each thread.

However, we also wanted to have as much flexibility as possible in the implementation of various parts of the system, especially allowing for distribution. This means we do not have the same 1:1 mapping for threads and DataSourceTransactions. Instead, it is up to the implementation to ask the Transaction for the associated DataSourceTransaction, and execute its commands in that context accordingly.
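A minimal sketch of this arrangement, assuming a ThreadLocal is used for the thread binding (the guide does not say how the association is implemented). SketchTransaction and SketchDataSourceTx are hypothetical stand-ins for Transaction and DataSourceTransaction, and data sources are identified by plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DataSourceTransaction.
final class SketchDataSourceTx {
    final String dataSource;

    SketchDataSourceTx(String dataSource) {
        this.dataSource = dataSource;
    }
}

// Hypothetical stand-in for Transaction: one per thread, many data source parts.
final class SketchTransaction {
    private static final ThreadLocal<SketchTransaction> CURRENT = new ThreadLocal<>();
    private final Map<String, SketchDataSourceTx> parts = new HashMap<>();

    // Starts a transaction covering the given data sources and binds it to this thread.
    static SketchTransaction begin(String... dataSources) {
        SketchTransaction tx = new SketchTransaction();
        for (String dataSource : dataSources) {
            tx.parts.put(dataSource, new SketchDataSourceTx(dataSource));
        }
        CURRENT.set(tx);
        return tx;
    }

    // Returns the transaction bound to the calling thread, or null if there is none.
    static SketchTransaction current() {
        return CURRENT.get();
    }

    // Called by an implementation to find the context in which to execute its commands.
    SketchDataSourceTx forDataSource(String dataSource) {
        SketchDataSourceTx part = parts.get(dataSource);
        if (part == null) {
            throw new IllegalArgumentException(dataSource + " is not part of this transaction");
        }
        return part;
    }

    // Removes the binding between this thread and its transaction.
    void end() {
        CURRENT.remove();
    }
}
```

No method of the graph API needs a transaction parameter in this scheme: an implementation calls current() and then forDataSource(...) for whichever back-end it is about to touch.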

This affects not only the graph model implementation, but also the data source implementation. You have to be prepared to handle multiple parallel transactions in a single thread. The details are back-end specific; for a JDBC-compliant database, you would create a separate Connection for each top-level DataSourceTransaction, share that connection among all nested transactions, and close it as soon as the top-level transaction is committed or rolled back.
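For the JDBC case, the connection handling might look roughly like the sketch below. JdbcTxSketch is a hypothetical class and not the dragos-db-jdbc implementation; it uses only standard java.sql calls. Treating the commit of a nested transaction as a no-op on the database is an assumption of this sketch, chosen so that nested work can still be rolled back to its savepoint afterwards.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Savepoint;

// Hypothetical data source transaction backed by a JDBC connection.
final class JdbcTxSketch {
    private final Connection connection; // shared by the whole transaction tree
    private final Savepoint savepoint;   // null for the top-level transaction

    private JdbcTxSketch(Connection connection, Savepoint savepoint) {
        this.connection = connection;
        this.savepoint = savepoint;
    }

    // Top level: takes over a fresh connection and switches off auto-commit.
    static JdbcTxSketch beginTopLevel(Connection fresh) throws SQLException {
        fresh.setAutoCommit(false);
        return new JdbcTxSketch(fresh, null);
    }

    // Nested: reuses the shared connection and marks a savepoint.
    JdbcTxSketch beginNested() throws SQLException {
        return new JdbcTxSketch(connection, connection.setSavepoint());
    }

    void commit() throws SQLException {
        if (savepoint == null) {
            // Top level: make everything durable, then give up the connection.
            try {
                connection.commit();
            } finally {
                connection.close();
            }
        }
        // Nested (assumption): nothing to do; the savepoint is kept so that
        // a later rollback of this nested transaction remains possible.
    }

    void rollback() throws SQLException {
        if (savepoint == null) {
            // Top level: undo everything, then give up the connection.
            try {
                connection.rollback();
            } finally {
                connection.close();
            }
        } else {
            // Nested: undo only the work done since the savepoint.
            connection.rollback(savepoint);
        }
    }
}
```

The connection passed to beginTopLevel would come from your own driver or pool setup, which is outside the scope of this sketch.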

Nested transactions

A few words about nested transactions: During design and specification of the DRAGOS transaction services, we examined a number of popular RDBMS and ODBMS. It turned out that they either provided no support for nested transactions at all, or simulated them using so-called checkpoints or savepoints in a single (top-level) transaction. This allows the rollback of nested transactions even if they are already in PRECOMMIT state, a useful feature which we thus decided to make a requirement of the DRAGOS transaction specification. If you have to implement a data source for a DBMS that does not support this, you can either try to simulate that behaviour, or decide not to support nested transactions at all. In the latter case, the limitation should be documented and specified in the DataSourceMetaData, and any attempt to create a nested transaction at runtime must result in a TransactionException.

Enforcing the transaction contract

What you have to keep in mind is that everything happens inside a transaction, so you should enforce this in your implementation, throwing a DragosException if no transaction is active when a method is called.
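One way to enforce the rule is a small guard at the top of every method, reads included. The sketch below is hypothetical throughout: GuardedStore is an invented class, and NoTransactionException merely stands in for DragosException, whose constructors are not shown in this guide.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for DragosException; the real class may look different.
final class NoTransactionException extends RuntimeException {
    NoTransactionException(String message) {
        super(message);
    }
}

// Hypothetical graph storage that refuses to work outside a transaction.
final class GuardedStore {
    private boolean transactionActive = false;
    private final List<String> nodes = new ArrayList<>();

    void begin() {
        transactionActive = true;
    }

    void commit() {
        transactionActive = false;
    }

    // Throws if the caller did not start a transaction first.
    private void requireTransaction(String operation) {
        if (!transactionActive) {
            throw new NoTransactionException(operation + " called outside a transaction");
        }
    }

    void addNode(String name) {
        requireTransaction("addNode");
        nodes.add(name);
    }

    // Read-only operations are guarded as well.
    int nodeCount() {
        requireTransaction("nodeCount");
        return nodes.size();
    }
}
```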
