Iceberg has ACID Transactions: A Quick Guide to What that Means
If databases didn’t have ACID transactions, we would all still be using spreadsheets.
A database transaction is a single unit of work, made up of one or more statements.
ACID transactions have the properties of atomicity, consistency, isolation, and durability.
Iceberg enables ACID transactions on files in your data lake, which is a key component of a data lakehouse architecture.
Here’s an explanation of ACID transactions and how Iceberg offers ACID properties.
ACID Transactions
Database systems use ACID transactions to keep data correct when operations run concurrently or fail partway through.
Atomicity: All statements in a transaction succeed as a single unit, or none of them take effect.
Consistency: A transaction will not violate any rules of the database (e.g. primary keys stay unique if your database enforces this).
Isolation: Concurrent transactions do not interfere with each other. Each one behaves as if it were running alone.
Durability: A committed transaction will persist, even after a crash or restart.
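To make atomicity and consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the CHECK rule, and the transfer helper are all hypothetical names invented for this illustration, not part of any particular system.

```python
import sqlite3

# In-memory database with one rule: a balance can never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK rule was violated, so the credit above was rolled back too.
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, 1, 2, 30)       # both updates apply together
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so nothing applies
```

The second transfer credits account 2 before the debit fails, yet the rollback undoes that credit. That is atomicity protecting a consistency rule.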
ACID Transactions in Iceberg
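Atomicity: Every write in Iceberg creates a new snapshot. A snapshot is the complete state of the table at a point in time, represented by the list of all data files that make up the table at that moment. The new snapshot captures all of the changes made by the write. Iceberg only updates a table when it replaces the current snapshot with a new snapshot, and that replacement is a single atomic step, so readers see either all of a write or none of it.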
Consistency: Iceberg supports schema evolution, so schema changes do not disrupt existing data. Readers always see a consistent view of the table, and writers can evolve the schema without rewriting the table.
Iceberg, like data warehouses such as Snowflake, does not enforce PRIMARY KEY constraints.
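Isolation: Iceberg uses optimistic concurrency to support concurrent writes. This means each writer assumes no other writer is operating and checks for conflicts only when it commits. If another writer committed first, the commit fails and the writer retries against the new current snapshot. Because each commit is an atomic snapshot swap, Iceberg can guarantee serializable isolation: the result is the same as if all writes had been applied one after another.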
Durability: The data resides in durable storage systems like cloud object stores. Once a new snapshot is committed, the data and state of the table will persist.
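The snapshot swap and the optimistic retry described above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Iceberg's actual API: Snapshot, ToyCatalog, compare_and_swap, and append_files are hypothetical names, and a lock stands in for whatever atomic operation a real catalog provides.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable table state: an id plus every data file in the table."""
    snapshot_id: int
    files: tuple


class ToyCatalog:
    """Hypothetical catalog that holds one pointer to the current snapshot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = Snapshot(0, ())

    def current(self):
        return self._current

    def compare_and_swap(self, expected, new):
        """Swap the pointer only if nobody committed since `expected` was read."""
        with self._lock:
            if self._current is not expected:
                return False  # another writer won the race
            self._current = new
            return True


def append_files(catalog, new_files, max_retries=5):
    """Optimistic write: build a new snapshot, try to commit, retry on conflict."""
    for _ in range(max_retries):
        base = catalog.current()
        candidate = Snapshot(base.snapshot_id + 1, base.files + tuple(new_files))
        if catalog.compare_and_swap(base, candidate):
            return candidate
        # Conflict: loop around and rebuild on top of the latest snapshot.
    raise RuntimeError("commit failed after retries")


catalog = ToyCatalog()
reader_view = catalog.current()  # a reader pins the snapshot it started with
stale_base = catalog.current()   # a second writer reads the same base

append_files(catalog, ["a.parquet"])  # first writer commits

# The second writer's commit is rejected because its base is now stale.
stale = Snapshot(stale_base.snapshot_id + 1, stale_base.files + ("b.parquet",))
rejected = not catalog.compare_and_swap(stale_base, stale)

append_files(catalog, ["b.parquet"])  # retrying on the new base succeeds
```

The reader that pinned the original snapshot still sees an empty table, while the catalog now points at a snapshot containing both files. No reader ever observes a half-applied write.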
With ACID transactions in Iceberg, we can safely run concurrent reads and writes on our data lakes.