Understanding Database Transactions: Atomicity, Consistency, Isolation, and Durability
Database transactions are essential for keeping systems reliable and consistent. They follow the ACID principles (Atomicity, Consistency, Isolation, Durability), which ensure that data operations are completed accurately, even in complex environments with many concurrent processes. In this blog, we will look at each of the four ACID properties and explain how they work and why they matter.
What is a Transaction?
A transaction is a series of operations that are treated as a single unit of work. For example, a transaction can involve transferring money between accounts or updating multiple records. The key is ensuring that the entire operation either completes successfully or fails entirely, leaving the system unchanged. For instance:
- Deduct $100 from Account A.
- Add $100 to Account B.
If either step fails, the transaction rolls back, undoing all changes to maintain consistency and integrity. This reliability is achieved through the ACID properties.
Atomicity
Atomicity ensures that a transaction is all-or-nothing. If any part of a transaction fails, all changes made during the transaction are rolled back. This guarantees that incomplete or corrupt data never persists in the database.
For instance, during a fund transfer, if the system crashes after deducting money from one account but before adding it to the other, atomicity ensures the transaction is undone, preserving data consistency.
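A minimal sketch of this all-or-nothing behavior using Python's built-in sqlite3 module (the account names and the simulated crash are illustrative, not part of any real system):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on any exception
        conn.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
        raise RuntimeError("simulated crash between debit and credit")
        conn.execute("UPDATE account SET balance = balance + 100 WHERE name = 'B'")  # never reached
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 500, 'B': 200}: the partial debit was rolled back
```

Because the "crash" happened mid-transaction, the debit from Account A never persists: the database returns to its state before the transaction began.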
Consistency
Consistency ensures that a database transitions from one valid state to another. It can be categorized into:
- Consistency in Data:
- This means the data stored in the database must follow certain rules and constraints, such as referential integrity, which links related data. For instance, in a table that stores picture information, there might be a column for "likes." The total likes in this table should match the number of actual records in a related table tracking individual likes.
- If data is inconsistent—for example, if a picture record claims five likes but only two are recorded in the related table—it leads to errors or corruption.
- Similarly, if there are likes pointing to a picture that no longer exists, it creates orphaned records, which further break consistency.
- Consistency in Reads:
- This ensures that once a transaction is completed and changes are committed, those changes are immediately visible to any subsequent read operations.
- However, in distributed systems, this can be tricky because data changes might take time to sync across different parts of the system, leading to stale or outdated reads.
Eventual Consistency:
In distributed databases, systems sometimes allow temporary inconsistencies to improve speed and scalability. This is known as eventual consistency. It means that while a newly committed change may not be visible immediately, it will eventually sync across the system. For example, if a like is added in one database shard, it might take a few seconds before it shows up everywhere. This trade-off between strong (instant) consistency and performance is often necessary in large-scale systems.
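Returning to the picture/likes example above, data consistency can be enforced with foreign-key constraints so that orphaned likes are rejected at write time. A small sketch with sqlite3 (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE picture(id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE picture_like("
    "id INTEGER PRIMARY KEY, "
    "picture_id INTEGER NOT NULL REFERENCES picture(id))"
)
conn.execute("INSERT INTO picture VALUES (1)")
conn.execute("INSERT INTO picture_like VALUES (1, 1)")  # valid: picture 1 exists

orphan_rejected = False
try:
    conn.execute("INSERT INTO picture_like VALUES (2, 99)")  # no picture 99 exists
except sqlite3.IntegrityError:
    orphan_rejected = True  # the orphaned like is refused

print(orphan_rejected)  # True
```

The constraint guarantees that every like points at a real picture, so the inconsistency described above cannot be written in the first place.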
Isolation
Isolation ensures that multiple transactions running at the same time do not interfere with each other. It protects the database's integrity by making sure the operations of one transaction remain independent of another. Without proper isolation, you might encounter issues where data is inconsistent or operations are not applied correctly.
Issues Caused by Poor Isolation
- Dirty Reads:
- This happens when a transaction reads data that another transaction has changed but not yet committed. If the other transaction rolls back, the data read becomes invalid, leading to inconsistencies.
- Example: Transaction A updates a product's price but doesn’t commit. Meanwhile, Transaction B reads the updated price. If Transaction A rolls back, the price Transaction B used is incorrect.
- Non-Repeatable Reads:
- Occurs when a transaction reads the same data multiple times but gets different results because another transaction modified the data in the meantime.
- Example: Transaction A reads a user’s balance as $100. While it’s running, Transaction B changes the balance to $150. When Transaction A re-reads the balance, it sees $150 instead of $100.
- Phantom Reads:
- A transaction retrieves rows that match a condition. If another transaction inserts or deletes rows that also match the condition, the original transaction will see different results if it repeats the query.
- Example: Transaction A queries all orders placed today. Meanwhile, Transaction B adds a new order for today. When Transaction A re-runs its query, it now sees the new order.
- Lost Updates:
- Two transactions update the same data simultaneously, and one of the updates is overwritten, resulting in lost information.
- Example: Transaction A sets a product’s stock to 10. Transaction B, unaware of A’s change, sets the stock to 5. Transaction A’s update is lost.
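The lost-update anomaly is easy to reproduce even without a database. A small Python sketch of two interleaved read-modify-write sequences with no concurrency control at all:

```python
# Two interleaved "transactions" both read the value before either writes.
stock = {"widget": 10}

a_read = stock["widget"]      # Transaction A reads 10
b_read = stock["widget"]      # Transaction B also reads 10

stock["widget"] = a_read - 3  # A writes back 7
stock["widget"] = b_read - 2  # B writes back 8, silently discarding A's update

print(stock["widget"])  # 8, although applying both decrements should leave 5
```

Transaction B never saw A's write, so A's decrement vanishes. The isolation levels below exist precisely to prevent interleavings like this.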
Isolation Levels
Isolation levels define how much interaction is allowed between concurrent transactions. The trade-off lies between strict consistency (which can slow down the system) and high performance (which risks inconsistency). Here’s a deeper dive into each level:
- Read Uncommitted
This is the least restrictive isolation level. Transactions can see changes made by other transactions even before those changes are committed, which can lead to inconsistent or invalid data being read. Execution is very fast, but reads are highly unreliable.
This isolation level can result in:
- Dirty Reads: Reading data that might be rolled back later.
- Non-Repeatable Reads: Re-reading data might give different results because of concurrent changes.
- Phantom Reads: Rows added or removed during the transaction may appear or disappear.
- Lost Updates: Concurrent updates might overwrite each other.
Example: Transaction A updates a product price from $10 to $12 but hasn’t committed. Transaction B reads the price as $12, assuming it’s valid. Later, Transaction A rolls back, making Transaction B’s read invalid.
Read Uncommitted is rarely used in critical applications, but it might be acceptable for logging or non-critical reporting.
- Read Committed
This level ensures that only committed changes are visible to other transactions. A transaction cannot read changes made by others until those changes are committed. It prevents dirty reads but still allows:
- Non-Repeatable Reads: Data read once might differ if it’s read again, as another transaction may have modified it in the meantime.
- Phantom Reads: New rows might appear in subsequent reads due to concurrent inserts.
- Lost Updates: Concurrent updates might overwrite each other.
Example: Transaction A commits an update to a product price from $10 to $12. Transaction B, running concurrently, will only see the new price once Transaction A commits.
Read Committed is commonly used in systems where dirty reads must be avoided but occasional inconsistencies are acceptable, such as order systems.
- Repeatable Read
At this isolation level, transactions are guaranteed to see the same data when reading the same rows multiple times, even if other transactions modify the data. However, new rows (phantoms) may still appear in range queries. It prevents dirty reads and non-repeatable reads but still allows:
- Phantom Reads: Range queries might still include new rows added by other transactions.
- Lost Updates: Concurrent updates may still overwrite each other.
Example: Transaction A reads a row of data twice. Even if Transaction B updates the row in the meantime, Transaction A sees the original value both times. However, if Transaction B inserts a new row, Transaction A might include it in subsequent range queries.
Repeatable Read is useful for financial systems where stable reads are critical, such as when generating reports.
- Serializable
The strictest isolation level. Transactions are executed as if one at a time, serialized. No other transaction can interfere while a transaction is running, ensuring full isolation. It prevents all anomalies: dirty reads, non-repeatable reads, phantom reads, and lost updates. However, performance can degrade significantly in high-concurrency environments due to locking or queueing.
Example: Transaction A runs a complex query and locks the data it accesses. Transaction B must wait until Transaction A finishes before it can execute, ensuring no conflicts. This isolation level is essential for scenarios requiring the highest level of accuracy, such as banking or inventory management.
- Snapshot Isolation
A middle ground between performance and consistency. Each transaction works with a snapshot of the database as it was when the transaction began, ensuring consistent reads throughout the transaction. Updates are handled carefully to avoid conflicts.
It prevents dirty reads, non-repeatable reads, and often phantom reads, but conflicts might arise when multiple transactions attempt to update the same data.
Example: If Transaction A reads a snapshot of the database and Transaction B updates data, Transaction A continues to work with the snapshot version, ignoring Transaction B’s changes until it starts a new transaction.
Snapshot isolation is often used in databases like PostgreSQL for applications needing consistent reads with better performance than Serializable.
Choosing the Right Isolation Level
The choice depends on the specific application’s needs:
- Performance Priority: Use Read Uncommitted or Read Committed.
- Balanced: Use Repeatable Read or Snapshot Isolation.
- Highest Accuracy: Use Serializable.
Each level represents a trade-off between strictness and efficiency. Understanding your application’s requirements is key to choosing the optimal isolation level.
Durability
Durability ensures that once a transaction has been committed, its changes are permanently stored, even in the event of a system crash or power failure. This is achieved by writing the changes to non-volatile storage such as disk.
For example:
- In a banking system, if a transaction to transfer funds is committed, the updated balances for both accounts will remain intact even if the database server crashes immediately afterward.
Implementation:
- Many databases use a write-ahead log (WAL) or similar mechanisms to ensure durability. Changes are first recorded in the log before being applied to the database.
- In distributed systems, durability often involves replicating data across multiple nodes to prevent data loss in case of hardware failure.
By guaranteeing durability, databases provide reliability and trust that committed data will not be lost, regardless of unforeseen events.
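As a rough illustration with SQLite (which also uses a write-ahead log), a committed row survives closing and reopening the database file. A real durability test would kill the process mid-write; closing the connection merely stands in for that here:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")  # commits are recorded in the WAL first
conn.execute("CREATE TABLE transfer(id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO transfer VALUES (1, 100)")
conn.commit()                            # the commit is now on non-volatile storage
conn.close()                             # stand-in for the process going away

reopened = sqlite3.connect(path)         # "after the restart"
amount = reopened.execute("SELECT amount FROM transfer WHERE id = 1").fetchone()[0]
print(amount)  # 100: the committed row survived
```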
Practical Implementation of Isolation Levels in Databases
This section discusses how databases manage concurrency, ensuring consistent and isolated transactions using two main approaches: Pessimistic Concurrency Control and Optimistic Concurrency Control. Here’s a deeper dive into each concept:
- Pessimistic Concurrency Control
This approach relies on locking resources (e.g., rows, tables, or pages) to prevent conflicts between concurrent transactions. When a transaction accesses a resource, it locks it, ensuring no other transaction can modify it until the lock is released.
- Benefits:
- Guarantees strict isolation by preventing other transactions from interfering.
- Ideal for high-conflict environments where many transactions may try to access the same data simultaneously.
- Drawbacks:
- Can lead to lock contention, where multiple transactions wait for the same resource, reducing performance.
- Deadlocks can occur if transactions hold locks while waiting for others, creating a cycle.
- Example:
- When updating a bank account balance, the database locks the account row. Other transactions trying to access this row must wait until the first transaction finishes and releases the lock.
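SQLite takes a database-wide write lock, which makes the waiting behavior easy to see: a second writer attempting to begin a write transaction is blocked until the first commits (the short timeout below just turns the wait into a visible error):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")
c1 = sqlite3.connect(path, timeout=0.1, isolation_level=None)
c2 = sqlite3.connect(path, timeout=0.1, isolation_level=None)
c1.execute("CREATE TABLE account(id INTEGER PRIMARY KEY, balance INTEGER)")
c1.execute("INSERT INTO account VALUES (1, 100)")

c1.execute("BEGIN IMMEDIATE")  # take the write lock up front (pessimistic)
c1.execute("UPDATE account SET balance = balance - 10 WHERE id = 1")

locked = False
try:
    c2.execute("BEGIN IMMEDIATE")  # second writer must wait for the lock
except sqlite3.OperationalError:
    locked = True  # "database is locked" once the 0.1 s timeout expires

c1.execute("COMMIT")  # release the lock; c2 could now proceed
print(locked)  # True
```

Row-level locking in servers like PostgreSQL or SQL Server follows the same principle at a finer granularity: the conflicting writer waits, it does not see or clobber in-flight changes.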
- Optimistic Concurrency Control
Instead of locking, this approach assumes that conflicts are rare. Transactions execute without restrictions, and any conflicts are detected at the commit phase. If a conflict is found, the transaction is retried.
- Benefits:
- No locks mean better performance in low-conflict environments.
- Reduces the overhead of lock management.
- Drawbacks:
- Conflicts detected late in the transaction can require expensive rollbacks and retries.
- Not suitable for high-conflict scenarios where retries would frequently occur.
- Example:
- A transaction reads a product’s stock level as 10 and decrements it to 9. If another transaction updates the stock to 8 during the process, the conflict is detected at commit, and one transaction retries.
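A common way to implement optimistic control is a version (or timestamp) column checked at write time. A sketch with sqlite3, where the `WHERE version = ?` clause detects conflicts (the schema and helper function are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product(id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO product VALUES (1, 10, 0)")

def decrement_stock(conn, product_id):
    """Read-modify-write that only succeeds if the row is unchanged since the read."""
    stock, version = conn.execute(
        "SELECT stock, version FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    cur = conn.execute(
        "UPDATE product SET stock = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (stock - 1, product_id, version),
    )
    return cur.rowcount == 1  # False signals a conflict; the caller retries

first_try = decrement_stock(conn, 1)  # succeeds, version 0 -> 1
# A stale writer that still believes the version is 0 is rejected, not lost:
stale_rows = conn.execute(
    "UPDATE product SET stock = 8, version = 1 WHERE id = 1 AND version = 0"
).rowcount
stock_now = conn.execute("SELECT stock FROM product WHERE id = 1").fetchone()[0]
print(first_try, stale_rows, stock_now)  # True 0 9
```

No locks are held between the read and the write; the version check at commit time is what converts a would-be lost update into a retryable conflict.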
Database-Specific Implementations
- PostgreSQL:
Uses Multi-Version Concurrency Control (MVCC) to manage concurrency. Instead of locking rows, it creates new versions of rows when updates are made. Transactions see a consistent snapshot of the database, improving performance and avoiding many locking issues.
- MySQL and SQL Server:
Use undo logs to maintain old versions of data for managing conflicts. Locking mechanisms ensure that critical resources are accessed safely, especially under higher isolation levels like Serializable.
Summary
Understanding and implementing Atomicity, Consistency, Isolation, and Durability is crucial for building robust database systems. Each property addresses specific challenges:
- Atomicity prevents incomplete operations from corrupting data.
- Consistency ensures data integrity and reliable reads.
- Isolation protects transactions from concurrency issues.
- Durability guarantees that committed data persists even in adverse conditions.
By balancing these principles with system requirements, database architects can design systems that are not only performant but also maintain high data integrity. Whether working with relational databases or NoSQL systems, mastering these concepts is foundational for any developer or database administrator.