What is a two phase commit (2-PC)?

Two Phase Commit (2-PC) is a consistency protocol that is widely used in classical distributed systems and in database management systems. Its main objective is to have all participants agree on whether to save a data update, even when failures occur. In many distributed applications, we want all servers and their replicas to be consistent, i.e., to reflect the same data changes at all times. Similarly, a database may be partitioned across multiple machines for performance gains, and a single transaction might touch more than one partition; all of those partitions must remain consistent. So, how do we guarantee consistency? Here’s where 2-PC comes in.

First Phase

In the protocol, there is a special node called the coordinator server, which coordinates all operations among the other nodes. The coordinator server can be selected through any Leader Election algorithm. First, the coordinator server sends a “Prepare” message that asks all servers to prepare for an update. When a server receives this message, it saves the update to the disk but does not commit it, meaning that the update is temporary and can still be rolled back. Each server then replies with a “Yes” or a “No”: if the server has seen the update and saved it to the disk, it replies with a “Yes”. If, for example, the disk has been corrupted or there are other memory issues, the server may reply with a “No”. Throughout the first phase of 2-PC, the update remains temporary at each server.
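The first phase can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Participant` class, its `healthy` flag (standing in for "the disk is usable"), and the `collect_votes` helper are hypothetical names, and in-memory dictionaries stand in for the disk and the data store.

```python
class Participant:
    """Hypothetical server that stages an update before voting."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # assumption: models a usable disk
        self.staged = {}        # tentative updates (stand-in for the disk)
        self.store = {}         # committed data

    def prepare(self, txn_id, update):
        """Handle a "Prepare" message: stage the update, then vote."""
        if not self.healthy:
            return "No"         # could not save the update
        self.staged[txn_id] = dict(update)  # saved, but not committed
        return "Yes"


def collect_votes(participants, txn_id, update):
    """Coordinator side of phase one: ask every server to prepare."""
    return {p.name: p.prepare(txn_id, update) for p in participants}
```

Note that a “Yes” vote leaves `store` untouched; the update only sits in `staged` until the second phase decides its fate.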

Second Phase

In the second phase, if any of the servers replies with “No”, or if there is a timeout before all the replies are received, then the coordinator server issues an “Abort” message. The message stops the update and, once a server receives it, that server deletes the update from the disk. If all of the servers reply with “Yes” within the timeout value, the coordinator server issues a “Commit” message. Upon receiving this message, each server commits the update by pulling it from the disk and transferring it into the server’s data store.

Once the update is finalized and usable, each server sends an “OK” message to the coordinator server. As soon as the coordinator has received all the “OK” messages, it knows that the update has been committed on every server.
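The decision rule of the second phase can be sketched as follows. This is a minimal illustration, not a full implementation: the `decide` function and the `Server` class are hypothetical, a missing entry in `votes` is assumed to model a reply that did not arrive before the timeout, and dictionaries again stand in for the disk and the data store.

```python
def decide(votes, expected):
    """Return "Commit" only if every expected server voted "Yes".

    votes maps server name -> "Yes"/"No"; a server missing from
    votes is treated as a timeout (an assumption of this sketch).
    """
    for name in expected:
        if votes.get(name) != "Yes":
            return "Abort"
    return "Commit"


class Server:
    """Hypothetical server holding a staged (tentative) update."""

    def __init__(self):
        self.staged = {}  # tentative updates (stand-in for the disk)
        self.store = {}   # committed data

    def finish(self, txn_id, decision):
        """Apply the coordinator's decision and acknowledge it."""
        update = self.staged.pop(txn_id, {})  # remove the tentative copy
        if decision == "Commit":
            self.store.update(update)         # make the update usable
        return "OK"
```

For example, `decide({"a": "Yes"}, ["a", "b"])` yields `"Abort"` because server `b` never answered, which mirrors the timeout case described above.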

Failures in 2-PC

Let’s now analyze how 2-PC deals with failures:

Servers only finalize the update after the “Commit” message, which implies that all servers are ready and have the update on the disk.
If a server voted “No”, it can abort right away and delete the update from the disk as it knows that the coordinator server would never send a “Commit” message after receiving a “No”.
Each server saves the tentative update to the disk before replying “Yes” or “No” in the first phase, so even if a server fails, the update is retrievable after crash recovery.
The coordinator server logs every decision, including each sent and received message. If a coordinator server fails, then it can resume after recovery using these logs.
If the coordinator server cannot recover, a new coordinator will be selected through a Leader Election algorithm. In this case, the new coordinator has two ways to proceed:
- it can poll other servers to get their responses again
- it can start the data operation anew
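How a recovering coordinator might use its log can be sketched as below. The log format (a list of `(txn_id, event)` tuples in write order) and the `next_action` helper are assumptions made for illustration; a real system would read these records from durable storage.

```python
def next_action(log, txn_id):
    """Pick the recovery step for txn_id from the coordinator's log.

    log is a list of (txn_id, event) tuples in the order they were
    written (a hypothetical format for this sketch).
    """
    decision = None
    for entry_txn, event in log:
        if entry_txn == txn_id and event in ("Commit", "Abort"):
            decision = event  # a decision was already logged
    if decision is not None:
        return "resend " + decision
    # No decision on record: ask the servers for their responses again.
    return "poll servers"
```

If a “Commit” or “Abort” was logged before the crash, the coordinator simply resends it; otherwise it falls back to polling the servers, which is the first of the two options listed above.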
