Database Durability and the WAL

How do databases guarantee your data is stored?

In other words, what makes a database “durable”?

Durability

Durability is the “D” in ACID, an acronym often thrown around to describe the basic guarantees that most developers and companies want to have when using a database in order to maintain usable, trustworthy, and reliable data.

Durability essentially means that when a database reports that a write succeeded - any kind of INSERT, UPDATE, or DELETE - that written data is guaranteed to be stored. If the database crashes the second after it reports a successful write, we should still see that new data when we reboot the server because the write was reported as successful.

We need to be confident that data is there if the database already said that it is.

So, how do databases guarantee this?

Introducing the WAL

The primary weapon of choice is the Write-Ahead Log, or WAL. Once a database has determined that a write request is valid, it first records the changes in the WAL - in other words, it writes ahead to the WAL before making the actual changes to the data in the table.

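Once the write to the WAL is safely on disk, the database can make the necessary changes to the data in the table and consider the transaction successful. Many databases report success as soon as the WAL record is durable and update the table data on disk a little later, since the log already holds everything needed to redo the change.

If the database crashes, it checks the WAL when it starts back up to determine whether it needs to fix any incorrect or stale data left in its tables by a write that was interrupted. That is what makes the database “durable”.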
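To make the log-first, apply-second ordering concrete, here is a minimal sketch of a toy key-value store with a write-ahead log. Everything in it (the `TinyStore` class, the JSON-lines log format, the `set`/`delete` record shapes) is made up for illustration; real databases use binary log formats, checkpoints, and far more careful recovery logic.

```python
import json
import os


class TinyStore:
    """Toy key-value store with a write-ahead log (illustrative only)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.table = {}   # stands in for the real table data
        self._recover()   # replay logged writes from a previous run
        self.wal = open(wal_path, "ab")

    def _recover(self):
        if not os.path.exists(self.wal_path):
            return
        good_end = 0
        with open(self.wal_path, "rb") as f:
            for line in f:
                # A record only counts if it is complete and parses cleanly.
                if not line.endswith(b"\n"):
                    break
                try:
                    record = json.loads(line)
                except ValueError:
                    break
                self._apply(record)
                good_end += len(line)
        # Drop a torn final record left behind by a crash mid-append.
        with open(self.wal_path, "r+b") as f:
            f.truncate(good_end)

    def _apply(self, record):
        if record["op"] == "set":
            self.table[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.table.pop(record["key"], None)

    def write(self, record):
        # 1. Log first, and force the log record to stable storage.
        self.wal.write(json.dumps(record).encode("utf-8") + b"\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Only then change the table and report success.
        self._apply(record)
        return True
```

Creating a new `TinyStore` on the same log path plays the role of a restart: the table is rebuilt by replaying every complete record, and a half-written record at the end of the log is discarded because its write was never reported as successful.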

So don’t worry, if your Humpty Dumpty database falls off the WAL, all the storage engine’s horses and all its men can use the WAL to put your Humpty Dumpty database back together again.

Sounds Slow to Me!

It turns out that writing a data change twice takes more time and computation than writing it once. Changes to a hard drive are especially time-intensive at scale. So, can we turn off the WAL?
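A rough way to feel this cost is to compare appending records to a file with and without forcing each one to disk, which is roughly what a durable log has to do. This is only a sketch: `timed_writes` is a hypothetical helper, and the numbers depend heavily on your disk, filesystem, and operating system.

```python
import os
import tempfile
import time


def timed_writes(path, count, durable):
    """Append `count` small records; fsync after each one if `durable`."""
    start = time.perf_counter()
    with open(path, "ab") as f:
        for i in range(count):
            f.write(b"record %d\n" % i)
            if durable:
                f.flush()
                os.fsync(f.fileno())  # wait until the OS says it is on disk
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fast = timed_writes(os.path.join(d, "buffered.log"), 200, durable=False)
        slow = timed_writes(os.path.join(d, "synced.log"), 200, durable=True)
        print(f"buffered: {fast:.4f}s  fsync per record: {slow:.4f}s")
```

On most machines the run that syncs every record is noticeably slower, which is the price a database pays for being able to promise that a reported write is really stored.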

Many databases let you do this, at least for specific tables or writes, but you have to remember that you’re voluntarily sacrificing durability when you do so. If catastrophic failure occurs, you have no guarantee that your saved data is actually safe.

It turns out some applications choose this for specific tables that act as a sort of cache or that store non-critical logs. With the WAL disabled, writes to those tables can speed up dramatically, and losing some of that data is acceptable because it is not critical to the application’s functionality.
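As one concrete illustration of the trade-off, here is a sketch using SQLite through Python's built-in `sqlite3` module. SQLite's settings apply to a whole database rather than a single table, and its default crash-recovery mechanism is a rollback journal rather than a WAL, so treat this as an analogy for the idea rather than a recipe for your database. The `page_cache` table and its contents are made up.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(path)

# Turn off SQLite's journal and stop waiting for writes to reach the disk.
# Faster, but a crash or power loss can now lose or corrupt this data.
journal_mode = conn.execute("PRAGMA journal_mode=OFF").fetchone()[0]
conn.execute("PRAGMA synchronous=OFF")
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]

# A throwaway cache table: fine to lose, cheap to rebuild.
conn.execute("CREATE TABLE page_cache (key TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO page_cache VALUES (?, ?)", ("/home", "<html>...</html>"))
conn.commit()

rows = conn.execute("SELECT COUNT(*) FROM page_cache").fetchone()[0]
conn.close()
```

Keeping data like this in its own database file, separate from your core application data, means the relaxed settings only ever put the disposable data at risk.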

Overall, it is wise to exercise extreme caution when turning off the WAL. Doing so means accepting the possibility of data loss, and your core application data should not be subject to that possibility.

Using the WAL for replication

The WAL is also used as part of replication in distributed leader-follower databases. In this setup there is a “leader” database to which all write requests go, and every write is replicated from it to its “follower” databases, which can in turn handle the bulk of the read requests.

Those replicated writes can be communicated in one of two ways. They can be sent as specific, lower-level write details that look similar (or identical) to how they are stored in the WAL. Or the leader can forward the original SQL statement to the followers, which then re-interpret it into the write operations needed and determine for themselves what changes to make to their own WAL and to the data in their tables.
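The two styles of replication can be sketched side by side. This is a toy model, not how any particular database implements it: the `Node` class, the `SET key value` statement syntax, and the tuple-shaped change records are all invented for the example.

```python
class Node:
    """Toy database node with an in-memory log and table."""

    def __init__(self):
        self.wal = []     # low-level change records, in order
        self.table = {}

    def apply_record(self, record):
        """Log a low-level change, then apply it to the table."""
        self.wal.append(record)
        key, value = record
        self.table[key] = value

    def execute(self, statement):
        """Interpret a tiny made-up statement of the form 'SET key value'."""
        _, key, value = statement.split()
        record = (key, int(value))
        self.apply_record(record)
        return record


leader = Node()
log_follower = Node()        # receives the leader's change records
statement_follower = Node()  # receives the original statements

for statement in ["SET a 1", "SET b 2", "SET a 3"]:
    record = leader.execute(statement)
    log_follower.apply_record(record)      # ship the low-level record
    statement_follower.execute(statement)  # ship the statement, re-interpret it
```

Both followers end up with the same table as the leader. The difference is where the work happens: the first follower just applies changes the leader already worked out, while the second repeats the interpretation step itself.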

And that’s one of the simpler components of the many things databases do for us!

I hope you learned something. Take care.