How do distributed transactions work?
Databases are standard transactional resources, and transactions usually extend to a small number of such databases. In such cases, a distributed transaction may be viewed as a database transaction that must be synchronized among multiple participating databases distributed across different physical locations.

The isolation property presents a unique obstacle for multi-database transactions.

Consider a two-phase commit (2PC) between a Coordinator, a CustomerMicroservice, and an OrderMicroservice. In the prepare phase, the Coordinator asks the CustomerMicroservice to prepare the customer's funds for the order; the CustomerMicroservice locks the affected objects and replies that it is prepared. The same thing happens while creating the order in the OrderMicroservice. Once the Coordinator has confirmed that all microservices are ready to apply their changes, it asks them to do so by requesting a commit for the transaction.

At that point, all locked objects are released. If any single microservice fails to prepare, the Coordinator aborts the transaction and begins the rollback process.

Consider a 2PC rollback in the customer order example: the CustomerMicroservice fails to prepare for some reason, while the OrderMicroservice replies that it is prepared to create the order. The Coordinator requests an abort on the OrderMicroservice, which then rolls back any changes it made and unlocks the database objects.

First, the prepare and commit phases guarantee that the transaction is atomic: it ends either with all microservices committing successfully or with none of them changing anything. Second, 2PC provides read-write isolation, meaning that changes to a field are not visible until the Coordinator commits them. While 2PC solves the consistency problem, it is not really recommended for many microservice-based systems, because it is synchronous and blocking.
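To make the two phases concrete, here is a minimal sketch of a 2PC coordinator, assuming a hypothetical Participant interface for the microservices; a production coordinator would also persist a transaction log so it can recover after a crash.

```java
import java.util.List;

// Hypothetical contract each microservice must implement to take part in 2PC.
interface Participant {
    boolean prepare(String txId); // lock resources and promise to commit
    void commit(String txId);     // apply the prepared changes
    void abort(String txId);      // roll back and release locks
}

class TwoPhaseCommitCoordinator {
    /** Runs one distributed transaction; returns true if it committed. */
    boolean execute(String txId, List<Participant> participants) {
        // Phase 1 (prepare): every participant must promise to commit.
        for (Participant p : participants) {
            if (!p.prepare(txId)) {
                // One failure aborts the whole transaction.
                participants.forEach(q -> q.abort(txId));
                return false;
            }
        }
        // Phase 2 (commit): all promised, so ask everyone to apply changes.
        participants.forEach(p -> p.commit(txId));
        return true;
    }
}
```

Note that everything a participant prepares stays locked between the two phases, which is exactly the blocking behavior discussed next.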

The protocol must lock any object that will be changed until the transaction completes. In the example above, if a customer places an order, the "fund" field is locked for that customer.

This prevents the customer from placing new orders. That makes sense, because if a "prepared" object changed after claiming to be prepared, the commit phase could fail. But it is also costly. In a database system, transactions tend to be fast, normally completing within 50 ms. Microservices, however, incur long delays with RPC calls, especially when integrating with external services such as a payment service.

The lock could become a system performance bottleneck. It is also possible for two transactions to deadlock, with each transaction requesting a lock on a resource the other holds. The Saga pattern is another widely used pattern for distributed transactions. Unlike 2PC, which is synchronous, the Saga pattern is asynchronous and reactive: the distributed transaction is fulfilled by asynchronous local transactions on all related microservices, which communicate with each other through an event bus.
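Here is a minimal sketch of that idea. The EventBus interface and the event names are assumptions made for illustration: each handler runs only a local transaction and then publishes the event that triggers the next step, or a compensating step on failure.

```java
// Hypothetical event bus used for illustration.
interface EventBus {
    void publish(String topic, String payload);
    void subscribe(String topic, java.util.function.Consumer<String> handler);
}

class OrderService {
    private final EventBus bus;

    OrderService(EventBus bus) {
        this.bus = bus;
        // React to the outcome of the customer-side local transaction.
        bus.subscribe("funds-reserved", this::confirmOrder);
        bus.subscribe("funds-rejected", this::cancelOrder); // compensation trigger
    }

    void placeOrder(String orderId) {
        // Local transaction only: create the order in a PENDING state
        // (database write omitted), then hand off via the event bus.
        bus.publish("order-created", orderId);
    }

    private void confirmOrder(String orderId) { /* local tx: mark order CONFIRMED */ }
    private void cancelOrder(String orderId)  { /* compensating local tx: cancel order */ }
}
```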

Oracle's distributed transaction architecture is a classic illustration of these coordination roles. Consider a transaction issued at a database node named sales that also references the database servers warehouse and finance. In this example, sales is both a database server and a client, because the application also modifies data in the sales database.

A node that must reference data on other nodes to complete its part in the distributed transaction is called a local coordinator.

In this example, sales is a local coordinator because it coordinates the nodes it directly references: warehouse and finance. A local coordinator is responsible for coordinating the transaction among the nodes with which it communicates directly. The node where the distributed transaction originates is called the global coordinator; the database application issuing the distributed transaction is directly connected to the node acting as the global coordinator. The node sales also happens to be the global coordinator here, because it coordinates all the nodes involved in the transaction.

For example, the transaction issued at the node sales references information from the database servers warehouse and finance. The global coordinator becomes the parent, or root, of the session tree and directs the distributed transaction, instructing the other nodes when to prepare, commit, or roll back. The job of the commit point site is to initiate the commit or rollback operation as instructed by the global coordinator.

The system administrator designates the commit point site in the session tree by assigning every node a commit point strength. The node selected as the commit point site should be the node that stores the most critical data.

Consider an example of a distributed system with sales serving as the commit point site. The commit point site is distinct from all other nodes involved in a distributed transaction: most importantly, a distributed transaction is considered committed once all non-commit-point sites are prepared and the transaction has actually been committed at the commit point site.

The online redo log at the commit point site is updated as soon as the distributed transaction is committed at this node. Because the commit point log contains a record of the commit, the transaction is considered committed even though some participating nodes may still be only in the prepared state and the transaction not yet actually committed at these nodes.

Conversely, a distributed transaction is considered not committed if the commit has not been logged at the commit point site.

Every database server must be assigned a commit point strength. If a database server is referenced in a distributed transaction, the value of its commit point strength determines which role it plays in the two-phase commit. Specifically, the commit point strength determines whether a given node is the commit point site in the distributed transaction and thus commits before all of the other nodes.

This section explains how Oracle determines the commit point site. The commit point site, which is determined at the beginning of the prepare phase, is selected only from the nodes participating in the transaction. In a sample session tree, each node's commit point strength decides which node is chosen as the commit point site.
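As a minimal sketch of the selection rule, assuming the coordinator simply picks the participating node with the highest retained strength (the session-tree details of the real mechanism are ignored here):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class CommitPointSiteSelector {
    // Strengths as retained by the coordinator when the initial
    // connections were made (node name -> commit point strength).
    static String select(List<String> participatingNodes,
                         Map<String, Integer> commitPointStrength) {
        // Choose the participating node with the highest strength.
        return participatingNodes.stream()
                .max(Comparator.comparingInt(
                        (String n) -> commitPointStrength.getOrDefault(n, 0)))
                .orElseThrow(() ->
                        new IllegalArgumentException("no participating nodes"));
    }
}
```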

Note that the commit point site and the global coordinator can be different nodes of the session tree. The commit point strength of each node is communicated to the coordinators when the initial connections are made.

The coordinators retain the commit point strengths of each node they are in direct communication with so that commit point sites can be efficiently selected during two-phase commits.

Therefore, it is not necessary for the commit point strength to be exchanged between a coordinator and a node each time a commit occurs.

Unlike a transaction on a local database, a distributed transaction involves altering data on multiple databases. Consequently, distributed transaction processing is more complicated, because Oracle must coordinate the committing or rolling back of the changes in a transaction as a self-contained unit. In other words, the entire transaction commits, or the entire transaction rolls back.

Oracle ensures the integrity of data in a distributed transaction using the two-phase commit mechanism. In the prepare phase, the initiating node in the transaction asks the other participating nodes to promise to commit or roll back the transaction. During the commit phase, the initiating node asks all participating nodes to commit the transaction. If this outcome is not possible, then all nodes are asked to roll back.

All participating nodes in a distributed transaction should perform the same action: they should either all commit or all roll back the transaction. Oracle automatically controls and monitors the commit or rollback of a distributed transaction and maintains the integrity of the global database (the collection of databases participating in the transaction) using the two-phase commit mechanism.

This mechanism is completely transparent, requiring no programming on the part of the user or application developer. The commit mechanism has two distinct phases, which Oracle performs automatically whenever a user commits a distributed transaction. In the prepare phase, the initiating node, called the global coordinator, asks the participating nodes other than the commit point site to promise to commit or roll back the transaction, even if there is a failure.
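The following sketch, using a hypothetical Node interface, shows how this variant differs from plain 2PC: the commit point site never prepares, and its commit is the moment the distributed transaction counts as committed.

```java
import java.util.List;

// Hypothetical contract for a database node in the session tree.
interface Node {
    boolean prepare(String txId);
    void commit(String txId);
    void rollback(String txId);
}

class GlobalCoordinator {
    boolean commitDistributed(String txId, List<Node> nodes, Node commitPointSite) {
        for (Node n : nodes) {
            if (n == commitPointSite) continue;   // the commit point site never prepares
            if (!n.prepare(txId)) {
                nodes.forEach(m -> m.rollback(txId));
                return false;
            }
        }
        // The transaction is considered committed globally as soon as the
        // commit is logged at the commit point site, even while the other
        // nodes are still only in the prepared state.
        commitPointSite.commit(txId);
        nodes.stream()
             .filter(n -> n != commitPointSite)
             .forEach(n -> n.commit(txId));       // prepared nodes commit afterwards
        return true;
    }
}
```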

In situations like these, when scalability is not a concern, we might consider distributed transactions an option. The technical requirements for two-phase commit are a distributed transaction manager, such as Narayana, and a reliable storage layer for the transaction logs. If you are lucky enough to have the right data sources but run in a dynamic environment, such as Kubernetes, you also need an operator-like mechanism to ensure there is only a single instance of the distributed transaction manager.

The transaction manager must be highly available and must always have access to the transaction log. For implementation, you could explore a Snowdrop Recovery Controller that uses the Kubernetes StatefulSet pattern for singleton purposes and persistent volumes to store transaction logs. What all of these technologies have in common is that they implement the XA specification and have a central transaction coordinator.

In our example, shown in Figure 4, Service A is using distributed transactions to commit all changes to its database and a message to a queue without leaving any chance for duplicates or lost messages. Similarly, Service B can use distributed transactions to consume the messages and commit to Database B in a single transaction without any duplicates. Or, Service B can choose not to use distributed transactions, but use local transactions and implement the idempotent consumer pattern.
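As a sketch of what Service A's write might look like with a JTA transaction manager such as Narayana: the JNDI names, table, and queue below are assumptions, and both the DataSource and the JMS ConnectionFactory must be XA-capable resources enlisted with the transaction manager.

```java
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Queue;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class ServiceA {
    public void createOrder(String orderId) throws Exception {
        InitialContext ctx = new InitialContext();
        // Standard JTA lookup; the resource names below are assumed.
        UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource db = (DataSource) ctx.lookup("java:/XADatabaseA");
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("java:/XAConnectionFactory");
        Queue queue = (Queue) ctx.lookup("java:/queues/orders");

        utx.begin();
        try (Connection con = db.getConnection();
             JMSContext jms = cf.createContext()) {
            // Write to Database A...
            try (PreparedStatement ps =
                     con.prepareStatement("INSERT INTO orders(id) VALUES (?)")) {
                ps.setString(1, orderId);
                ps.executeUpdate();
            }
            // ...and publish the message inside the same XA transaction.
            jms.createProducer().send(queue, orderId);
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
        utx.commit(); // two-phase commit across the database and the broker
    }
}
```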

For the record, a more appropriate example for this section would be using WS-AtomicTransaction to coordinate the writes to Database A and Database B in a single transaction and avoid eventual consistency altogether. But that approach is even less common these days than what I've described.

The two-phase commit protocol offers guarantees similar to local transactions in the modular monolith approach, but with a few exceptions. Because two or more separate data sources are involved in an atomic update, they may fail in different ways and block the transaction.

But thanks to its central coordinator, it is still easy to discover the state of the distributed system compared to the other approaches I will discuss. With a modular monolith, we use local transactions and we always know the state of the system. With distributed transactions based on the two-phase commit protocol, we also guarantee a consistent state. The only exception would be an unrecoverable failure involving the transaction coordinator.

But what if we wanted to ease the consistency requirements while still knowing the state of the overall distributed system and coordinating from a single place? In this case, we might consider an orchestration approach, where one of the services acts as the coordinator and orchestrator of the overall distributed state change.

The orchestrator service has the responsibility to call other services until they reach the desired state, or to take corrective actions if they fail. The orchestrator uses its local database to keep track of state changes, and it is responsible for recovering from any failures related to state changes. The many homegrown systems implementing the Saga pattern also fall into this category. In our example diagram, shown in Figure 5, Service A acts as the stateful orchestrator, responsible for calling Service B and recovering from failures through a compensating operation if needed.

The crucial characteristic of this approach is that Service A and Service B have local transaction boundaries, but Service A has the knowledge and the responsibility to orchestrate the overall interaction flow. That is why its transaction boundary touches Service B's endpoints. In terms of implementation, we could set this up with synchronous interactions, as shown in the diagram, or by using a message queue between the services (in which case you could use a two-phase commit there, too).

Orchestration is an eventually consistent approach that may involve retries and rollbacks to get the distributed system into a consistent state. While it avoids the need for distributed transactions, orchestration requires the participating services to offer idempotent operations in case the coordinator has to retry an operation.

Participating services also must offer recovery endpoints in case the coordinator decides to roll back and fix the global state. The big advantage of this approach is the ability to drive heterogeneous services that might not support distributed transactions into a consistent state by using only local transactions.
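A sketch of such an orchestrator, with a hypothetical client for Service B; the key assumptions are that reserveStock is idempotent (safe to retry) and that releaseStock is the compensating recovery endpoint.

```java
// Hypothetical client for Service B; both operations are assumed idempotent.
interface ServiceBClient {
    void reserveStock(String orderId);  // forward operation
    void releaseStock(String orderId);  // compensating (recovery) endpoint
}

class OrchestratorServiceA {
    private final ServiceBClient serviceB;

    OrchestratorServiceA(ServiceBClient serviceB) { this.serviceB = serviceB; }

    void placeOrder(String orderId) {
        saveState(orderId, "ORDER_CREATED"); // local transaction in A's database
        try {
            retry(3, () -> serviceB.reserveStock(orderId)); // idempotent, safe to retry
            saveState(orderId, "COMPLETED");
        } catch (RuntimeException e) {
            serviceB.releaseStock(orderId);  // corrective action to fix global state
            saveState(orderId, "COMPENSATED");
        }
    }

    private void retry(int attempts, Runnable op) {
        for (int i = 1; ; i++) {
            try { op.run(); return; }
            catch (RuntimeException e) { if (i == attempts) throw e; }
        }
    }

    private void saveState(String orderId, String state) {
        // Local transaction: persist orchestration progress so that a
        // restarted orchestrator can resume or compensate after a crash.
    }
}
```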

The coordinator and the participating services need only local transactions, and it is always possible to discover the state of the system by asking the coordinator, even if it is in a partially consistent state. Doing that is not possible with the other approaches I will describe. As you've seen in the discussion so far, a single business operation can result in multiple calls among services, and it can take an indeterminate amount of time before a business transaction is processed end-to-end.

To manage this, the orchestration pattern uses a centralized controller service that tells the participants what to do. An alternative to orchestration is choreography, which is a style of service coordination where participants exchange events without a centralized point of control. With this pattern, each service performs a local transaction and publishes events that trigger local transactions in other services. Each component of the system participates in decision-making about a business transaction's workflow, instead of relying on a central point of control.

Historically, the most common implementation for the choreography approach was using an asynchronous messaging layer for the service interactions. Figure 6 illustrates the basic architecture of the choreography pattern. For message-based choreography to work, we need each participating service to execute a local transaction and trigger the next service by publishing a command or event to a messaging infrastructure.

Similarly, other participating services have to consume a message and perform a local transaction. That in itself is a dual-write problem within a higher-level dual-write problem.
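A sketch of one choreography participant makes the seam visible; the helper methods are placeholders, and the comments mark the two independent writes.

```java
class ShippingService {
    void onOrderConfirmed(String orderId) {
        // Write #1: local transaction in this service's own database.
        updateLocalDatabase(orderId);

        // Write #2: publish the next event to the messaging infrastructure.
        // If the process crashes between the two writes, the local change
        // is committed but the event is lost: the dual-write problem.
        publishEvent("shipment-requested", orderId);
    }

    private void updateLocalDatabase(String orderId) { /* local transaction */ }
    private void publishEvent(String topic, String payload) { /* broker send */ }
}
```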

When we develop a messaging layer with a dual write to implement the choreography approach, we could design it as a two-phase commit that spans a local database and a message broker. I covered that approach earlier.

Alternatively, we might use a publish-then-local-commit or a local-commit-then-publish pattern: either publish the message first and then commit the local transaction, or commit the local transaction first and then publish the message. Each ordering leaves a small failure window, either a published event for a change that was never committed, or a committed change whose event was never published. The various ways of implementing a choreography architecture constrain every service to write only to a single data source with a local transaction, and nowhere else. For example, suppose Service A writes changes only to its own database, and Service B periodically polls Service A and detects new changes.

When it reads a change, Service B updates its own database with the change and, in the same operation, the index or timestamp up to which it has picked up changes. The critical part here is that both services write only to their own database and commit with a local transaction.
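A sketch of Service B's polling loop under these assumptions: Service A exposes its changes ordered by a monotonically increasing sequence number (the hypothetical changesSince feed below), and Service B stores the last processed sequence in its own database, in the same local transaction as the applied change.

```java
import java.util.List;

class ChangePoller {
    // Hypothetical feed of Service A's changes, ordered by sequence number.
    interface ChangeFeed {
        List<Change> changesSince(long sequence);
    }

    record Change(long sequence, String payload) {}

    private final ChangeFeed serviceA;

    ChangePoller(ChangeFeed serviceA) { this.serviceA = serviceA; }

    void pollOnce() {
        long last = loadLastProcessedSequence();     // from Service B's own database
        for (Change c : serviceA.changesSince(last)) {
            // One local transaction in Service B's database: apply the change
            // AND advance the stored sequence atomically, so a crash never
            // loses or re-applies a change.
            applyChangeAndAdvanceSequence(c);
        }
    }

    private long loadLastProcessedSequence() { return 0L; /* read from DB */ }
    private void applyChangeAndAdvanceSequence(Change c) { /* single local tx */ }
}
```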

This approach, illustrated in Figure 7, can be described as service choreography, or we could describe it using the good old data pipeline terminology. The possible implementation options are more interesting. The simplest, and most coupled, option would be for Service B to read directly from Service A's tables. The industry tries to avoid that level of coupling with shared tables, however, and for a good reason: any change in Service A's implementation and data model could break Service B. We can make a few gradual improvements to this scenario, for example by using the Outbox pattern and giving Service A a table that acts as a public interface, as sketched below.
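A sketch of the Outbox pattern on Service A's side, with assumed table and column names: the business row and the outbox row are written in one local transaction, so the event record becomes visible if and only if the business change commits.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;

class OrderWriter {
    private final DataSource db;

    OrderWriter(DataSource db) { this.db = db; }

    void createOrder(String orderId) throws Exception {
        try (Connection con = db.getConnection()) {
            con.setAutoCommit(false); // one local transaction for both writes
            try (PreparedStatement orders =
                     con.prepareStatement("INSERT INTO orders(id) VALUES (?)");
                 PreparedStatement outbox =
                     con.prepareStatement(
                         "INSERT INTO outbox(event_type, payload) VALUES (?, ?)")) {
                orders.setString(1, orderId);
                orders.executeUpdate();
                // The outbox row is the public interface other services read.
                outbox.setString(1, "order-created");
                outbox.setString(2, orderId);
                outbox.executeUpdate();
                con.commit(); // both rows become visible atomically
            } catch (Exception e) {
                con.rollback();
                throw e;
            }
        }
    }
}
```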


