Machine failures are widespread. Since we can't prevent them, we need a way to recover from the damage they cause. Imagine a machine failing during program execution. Upon reboot, the machine must know what it had already done, what it was doing, and what was still left to do.
For example, suppose four clients send requests to perform different operations on a server. The server queues all the requests and executes them sequentially. While executing the request sent by Client C, the server crashes. After rebooting, the server asks all the clients to resend their requests. The server receives the requests from all the clients, but not in the same order. The server redoes the operations, and the data is no longer consistent.
We need to ensure data integrity and durability in scenarios like the one above. For this purpose, we log each operation that modifies data onto the disk in append-only mode. Such a log file is called the write-ahead log (WAL). It enables recovery and preserves the atomicity of transactions in case of failure. This log file has other aliases, such as transaction log and commit log.
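The core idea can be sketched in a few lines of Python. The class name, record format, and file path below are illustrative, not a real library's API; the essential point is that the record is flushed to disk before the in-memory state is mutated:

```python
import json
import os

class WriteAheadLog:
    """Minimal append-only write-ahead log (illustrative sketch)."""

    def __init__(self, path):
        self.path = path

    def append(self, operation):
        # Serialize the operation and append it as one line.
        with open(self.path, "a") as f:
            f.write(json.dumps(operation) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to stable storage

    def read_all(self):
        # Replay helper: return every logged operation in order.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f]

# Log the operation first, then apply it to the in-memory state.
wal = WriteAheadLog("ops.wal")
store = {}
op = {"type": "set", "key": "x", "value": 1}
wal.append(op)                  # durable record written first
store[op["key"]] = op["value"]  # then the actual mutation
```

If the process crashes between the two final lines, the mutation is lost from memory but the log still records it, so recovery can redo it.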
All the operations that modify the data are first logged onto the disk before the operation is performed. Each log record should contain enough information to let the system either UNDO or REDO the operation:

REDO: reapply a logged operation whose effect never reached the data on disk.
UNDO: roll back an operation belonging to a transaction that never committed.

Because of the WAL, the system only performs the missing operations, thereby reducing the number of disk operations and increasing performance. During recovery, only the instructions recorded after the last commit statement are re-executed.
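The recovery behavior described above can be sketched as a redo-only replay: records up to each commit marker are reapplied, while the uncommitted tail of the log is discarded, which amounts to an implicit undo. The record format with a `{"type": "commit"}` marker is an assumption for illustration:

```python
def recover(log_records):
    """Rebuild state by redoing only committed operations."""
    state = {}
    pending = []  # operations seen since the last commit marker
    for record in log_records:
        if record["type"] == "commit":
            for op in pending:          # redo everything up to this commit
                state[op["key"]] = op["value"]
            pending = []
        else:
            pending.append(record)
    # The uncommitted tail in `pending` is discarded: implicit undo.
    return state

log = [
    {"type": "set", "key": "a", "value": 1},
    {"type": "commit"},
    {"type": "set", "key": "a", "value": 2},  # crashed before commit
]
print(recover(log))  # {'a': 1} — the uncommitted write is rolled back
```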
Imagine writing a to-do list every day and appending each day's list to the same notebook. When we need to check a particular day's list after a month, finding the right page won't be easy. Keeping a single ever-growing log file causes the same problem. We can handle such issues by using the segmented-log and low-water-mark techniques.
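A rough sketch of those two techniques together, with illustrative names and sizes: the log is split into fixed-size segments, and a low-water mark identifies the oldest segment still needed for recovery, so everything before it can be deleted:

```python
class SegmentedLog:
    """Toy segmented log: entries are grouped into fixed-size segments
    so that segments below the low-water mark can be discarded."""

    def __init__(self, segment_size=2):
        self.segment_size = segment_size
        self.segments = [[]]  # list of segments, each a list of entries

    def append(self, entry):
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])  # roll over to a new segment
        self.segments[-1].append(entry)

    def truncate(self, low_water_mark):
        # Drop whole segments whose entries are already checkpointed.
        self.segments = self.segments[low_water_mark:]

log = SegmentedLog(segment_size=2)
for i in range(5):
    log.append(f"op-{i}")
# Segments: [['op-0', 'op-1'], ['op-2', 'op-3'], ['op-4']]
log.truncate(1)  # segment 0 is fully checkpointed; discard it
```

In a real system the low-water mark would be derived from a checkpoint or snapshot, not passed in by hand.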
Besides the size of the log file, there can be other issues, such as a client getting disconnected and retrying the same operation after reconnecting. So while logging operations in the log file, it is crucial to ensure that duplicate operations are not applied twice.
For example, let's say a client requested the server to append some data to a file. The server logs that operation and then performs it. However, before the server can send the acknowledgment, the client gets disconnected. From the client's point of view, the operation was never performed. Upon reconnecting, the client sends the append request again. If we don't detect duplicate operations, the server can end up with inconsistent data. We can achieve this with a data structure such as a HashMap: each request carries a unique identifier that the server stores as a key, so a retried request is recognized and not applied a second time.
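A minimal sketch of that deduplication idea, assuming each client tags its request with a unique ID (the field names and ack format are illustrative):

```python
class DedupServer:
    """Server that applies each uniquely identified request at most once."""

    def __init__(self):
        self.applied = {}  # request_id -> cached result (the HashMap above)
        self.data = []

    def handle(self, request_id, payload):
        if request_id in self.applied:
            # Duplicate retry: return the cached result, do not re-apply.
            return self.applied[request_id]
        self.data.append(payload)  # perform the append operation once
        result = f"ack:{request_id}"
        self.applied[request_id] = result
        return result

server = DedupServer()
server.handle("req-1", "hello")
server.handle("req-1", "hello")  # client retried after losing the ack
print(server.data)  # ['hello'] — applied only once
```

Returning the cached result on a retry also lets the client finally receive the acknowledgment it missed.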
Let’s discuss some examples where WAL is used: