A Log-Structured Merge-Tree (LSM Tree) is a high-performance storage engine data structure that optimizes for high-throughput write operations by batching and sequentially writing data to disk. Its core mechanism involves two primary components: a memory-resident memtable and a hierarchy of sorted, immutable files on disk called SSTables (Sorted String Tables).
- Write Path: All incoming writes (inserts, updates, deletes) are first appended to a write-ahead log (WAL) for durability, then applied to the in-memory memtable, which is typically a balanced tree like a skip list.
- Flush: When the memtable reaches a size threshold, it is frozen, and its sorted contents are written to disk as a new SSTable at Level 0 (L0).
- Compaction: A background process called compaction merges these SSTables from one level to the next, combining data, discarding overwritten values, and deleting tombstoned entries. This creates larger, sorted SSTables at higher levels (L1, L2, etc.).
This design transforms random writes into sequential disk I/O, making it exceptionally fast for write-heavy workloads common in logging, time-series data, and agentic memory systems where state updates are frequent.