Durability

Durability ensures that once a transaction is committed, it will remain committed even in the case of a system failure. Durability guarantees that the changes made by a transaction are permanent and will not be lost, however it introduces latency.

If the crash happens during the commit the database cannot guarantee durability. The system is durable only when the commit is successful( the data is fully written to disk). That is why commit speeds are critical, the faster you can commit the lower the chances of such corruption.

The process of persisting the writes that the clients make to the database to a non-volatile system storage is called write-ahead logging. Even if the database crashes, the changes made by the transaction are still available in the log files. When the database comes back online, it can replay the log files to recover the changes made by the transaction.

Durability Techniques

Write-Ahead Logging (WAL)

Write-Ahead Logging is a technique used by databases to ensure durability. In WAL, the database writes the changes to the log file before writing them to the actual data files.

When a transaction is committed, the database writes the changes to the log file and then to the data files. If the database crashes before the changes are written to the data files, it can replay the log files to recover the changes.

Writing a lot of data to disk is expensive, so databases use a technique called group commit to write multiple changes to the disk in a single operation. The database persists a compressed version of the changes to the log file, known as WAL Segments, and then writes them to the disk in a single operation.

Asynchronous Snapshot

In Asynchronous Snapshot, the database writes everything to memory and in the background takes a snapshot of the data and writes it to the disk at once. This technique is faster than Write-Ahead Logging because it writes to the disk in a single operation.

Append-Only File (AOF)

In Append-Only File, the database writes all the changes to a log file (similar to WAL). When the database starts, it reads the log file and applies the changes to the data files.

This technique is used by databases like Redis.

Database Implementations

Different databases implement durability differently: 1. MySQL: Uses InnoDB storage engine that supports Write-Ahead Logging. 2. PostgreSQL: Uses Write-Ahead Logging and MVCC (Multi-Version Concurrency Control) to ensure durability. 3. Redis: Uses Append-Only File to ensure durability. 4. MongoDB: Uses journaling to ensure durability. 5. Cassandra: Uses commit logs to ensure durability. 6. SQLite: Uses Write-Ahead Logging to ensure durability. 7. Oracle: Uses Redo Logs to ensure durability.

OS Cache

The Operating System Cache is a cache that the operating system uses to store data in memory. When the database writes data to the disk, it writes to the OS Cache first and then the OS Cache batch-writes to the disk. This is done to improve performance, as writing to the disk is slower than writing to memory.

The problem with that is that the OS will tell the database that the data has been written to the disk, even though it is still in the OS Cache. The database believes that because the OS has acknowledged the writes, the data is safe. If the system crashes before the OS Cache writes to the disk, the data will be lost. And it is assumed that the database is not durable.

Most of the applications use OS cache, which may not be fine for databases to use. Fsync OS command forces writes to always go to disk and bypass the cache. This can be expensive, but it ensures durability.

Databases that use Fsync:

PostgreSQL: Uses fsync to ensure durability.
MySQL: Uses InnoDB storage engine that supports fsync.
SQLite: Uses fsync to ensure durability.
Oracle: Uses Redo Logs to ensure durability.
Cassandra: Uses commit logs to ensure durability.
Redis: Uses Append-Only File to ensure durability.
MongoDB: Uses journaling to ensure durability.
SQL Server: Uses Write-Ahead Logging to ensure durability.
MariaDB: Uses InnoDB storage engine that supports fsync.

Redis gives you the option to choose between fsync and no fsync to trade-off between durability and performance. Fsync doesn't have to be used for every write, but it should be used for critical writes or when the system is shutting down.

Example

Consider a banking application where a user transfers money from one account to another. The transaction is successful, and the money is deducted from the sender's account and credited to the receiver's account. If the database crashes before the changes are written to the disk, the changes will be lost. This is where durability comes into play. The database ensures that the changes made by the transaction are permanent and will not be lost, even in the case of a system failure.

Summary

Durability ensures that once a transaction is committed, it will remain committed even in the case of a system failure.
Durability guarantees that the changes made by a transaction are permanent and will not be lost.
The process of persisting the writes that the clients make to the database to a non-volatile system storage is called write-ahead logging.
Different databases implement durability differently, using techniques like Write-Ahead Logging, Asynchronous Snapshot, and Append-Only File.
The Operating System Cache is a cache that the operating system uses to store data in memory.
Fsync OS command forces writes to always go to disk and bypass the cache, ensuring durability.
Databases like PostgreSQL, MySQL, SQLite, Oracle, Cassandra, Redis, MongoDB, SQL Server, and MariaDB use Fsync to ensure durability.
Redis gives you the option to choose between fsync and no fsync to trade-off between durability and performance.
Fsync doesn't have to be used for every write, but it should be used for critical writes or when the system is shutting down.