Two Phase Commit (2-PC) is a consistency protocol that is widely used in many classical distributed systems as well as database management. The main objective of this protocol is to have a distributed agreement among all participants on saving data updates under failures. In many distributed applications, we want all servers and their replicas to be consistent ,i.e., reflect the same data changes at all times. Similarly, in databases where a database might be partitioned into multiple machines for performance gains, a transaction might touch more than one partition. However, all partitions must be consistent. So, how do we guarantee consistency? Here’s where 2-PC comes in.
As the name suggests, 2-PC consists of two distinct phases that allow the system to rollback an update under failures. Let’s dive right into the protocol.
In the protocol, there is a special node called the Coordinator server which coordinates all operations among other nodes. The coordinator server can be selected through any Leader Election algorithm. First, the coordinator server sends a “prepare” message that asks all servers to prepare for an update. When servers receive this update, they save the update to the disk, but they do not commit the update, meaning that this is a temporary update that can be rolled back. Servers reply with a “Yes” or “No”- if a server sees the update and has saved it in the disk, it replies with a “Yes”. If, for example, the disk has been corrupted or there are other memory issues the server may reply back with a “No”. In the first phase of 2-PC, the update is temporary at each server.
In the second phase, if any of the servers reply with “No”, or if there is a timeout before all the replies are received, then the coordinator server issues an “Abort” message. The message stops the update and, once each server receives this message, they delete the update from the disk. If all of the servers reply back with “Yes” within the timeout value, the coordinator server issues a “Commit” message. Upon receiving this message, each server commits the update by pulling the update from the disk and transferring it into the server’s data store.
Once the update is finalized and usable, each server sends out an “OK” message to the coordinator server. As soon as the coordinator receives all the “OK” messages, it will determines if the update has been committed.
Lets now analyze how 2-PC deals with failures:
Through the failure analysis, we can observe that 2-PC is a consistency protocol with a stringent correctness condition, meaning that if one server commits the update, no other server rejects it. If one server rejects it, no other server commits it. Therefore it is considered all-or-nothing.
Free Resources