Seek and Range Query Performance · TidesDB v6.1.0 vs RocksDB v10.7.5
by Alex Gaetano Padula
After optimizing TidesDB’s seek and range query performance, I want to share the architectural decisions that led to some remarkable results. The benchmarks show TidesDB achieving 4.17x faster random seeks and 4.74x faster sequential seeks compared to RocksDB. This isn’t luck or cherry-picked workloads - it’s the result of specific design choices around block caching, index structures, and memory management.
You can download the raw benchmark report here
You can find the benchtool source code here and run your own benchmarks!
Test Environment
Hardware
- Intel Core i7-11700K (8 cores, 16 threads) @ 4.9GHz
- 48GB DDR4
- Western Digital 500GB WD Blue 3D NAND Internal PC SSD (SATA)
- Ubuntu 23.04 x86_64 6.2.0-39-generic
Software Versions
- TidesDB v6.1.0
- RocksDB v10.7.5
- GCC with -O3 optimization
Test Configuration
- Sync Mode · DISABLED (maximum performance)
- Threads · 8 concurrent threads
- Key Size · 16 bytes (unless specified)
- Value Size · 100 bytes (unless specified)
The Seek Performance Problem
LSM-trees are traditionally optimized for writes, not point lookups. When you seek to a specific key, you potentially need to:
- Check bloom filters across multiple SSTables
- Load block indices from disk
- Binary search the index to find the right block
- Read and decompress the block
- Binary search within the block for the key
Each disk I/O adds milliseconds. With dozens of SSTables across multiple levels, seeks can become prohibitively expensive. RocksDB addresses this with block caching and table caching, but there’s room for improvement.
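To make those costs concrete, here is a minimal sketch of the lookup cascade. Every type and helper in it is a hypothetical stand-in (neither engine's actual API); it only shows where each step can touch disk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an LSM point lookup. These types and
 * helpers are illustrative stand-ins, not TidesDB's or RocksDB's API. */
typedef struct sstable
{
    struct sstable *next; /* walked newest to oldest */
} sstable_t;

extern bool bloom_may_contain(const sstable_t *sst, const void *key, size_t klen);
extern long index_find_block(const sstable_t *sst, const void *key, size_t klen);
extern bool block_search(const sstable_t *sst, long pos, const void *key, size_t klen,
                         void **val, size_t *vlen);

bool lsm_point_lookup(sstable_t *newest, const void *key, size_t klen,
                      void **val, size_t *vlen)
{
    for (sstable_t *sst = newest; sst != NULL; sst = sst->next)
    {
        if (!bloom_may_contain(sst, key, klen)) /* cheap, in memory */
            continue;
        long pos = index_find_block(sst, key, klen); /* disk I/O if index not resident */
        if (pos < 0)
            continue;
        if (block_search(sst, pos, key, klen, val, vlen)) /* read + decompress + search */
            return true;
    }
    return false; /* key not in any SSTable */
}
```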
The TidesDB Approach · Cache Everything That Matters
I made a fundamental architectural decision: keep SSTable metadata in memory aggressively. Not just temporarily, but until it’s provably safe to free. This includes:
- Block indices · sparse sampled indices with prefix compression
- Bloom filters · 1% false positive rate, ~10 bits per key
- Min/max keys · enables range filtering without disk access
- File handles · kept open until memory pressure requires closing
The trade-off is explicit - use more memory to eliminate disk I/O on the critical path.
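To illustrate what "resident metadata" means in practice, here is a hypothetical sketch of what such a per-SSTable structure could hold; the field names and layout are assumptions for illustration, not TidesDB's actual definitions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-SSTable metadata kept resident after open.
 * Field names are illustrative, not TidesDB's actual layout. */
typedef struct
{
    /* Sparse sampled block index with 16-byte prefix-compressed keys */
    struct
    {
        uint8_t min_prefix[16];
        uint64_t file_pos;
    } *index;
    size_t index_len;

    uint8_t *bloom_bits;  /* ~10 bits/key, 1% false positive rate */
    size_t bloom_len;

    uint8_t min_key[16];  /* range filtering without disk access */
    uint8_t max_key[16];

    FILE *fh;             /* kept open until memory pressure forces a close */
} sst_meta_t;
```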
Random Seek Performance · 4.17x Advantage
The Numbers
TidesDB (5M operations, 8 threads)
- Throughput · 3.73M ops/sec
- Median latency · 2μs
- P99 latency · 5μs
- CPU utilization · 469%
RocksDB (5M operations, 8 threads)
- Throughput · 893K ops/sec
- Median latency · 8μs
- P99 latency · 17μs
- CPU utilization · 635%
Why the 4.17x Difference?
1. Cache-First Seek Path
When TidesDB seeks to a key, the first operation is a cache lookup:
```c
tidesdb_ref_counted_block_t *rc_block = NULL;
tidesdb_klog_block_t *kb = tidesdb_cache_block_get(
    sst->db, cf_name, sst->klog_path, cursor->current_pos, &rc_block);
```

If the block is cached (which it often is for hot data), we get:
- Zero disk I/O
- Zero decompression overhead
- Direct memory access to the deserialized block
- Atomic reference counting prevents use-after-free
The cache hit path is pure memory operations. No syscalls, no decompression, no deserialization.
2. Block Index in Memory
Even on cache misses, TidesDB’s block indices are already in memory. The seek path:
- Check bloom filter (memory) → 99% of non-existent keys eliminated
- Binary search block index (memory) → exact block position
- Seek to block position (single disk I/O)
- Read and cache block
RocksDB’s approach requires loading the index from disk on cold starts, adding I/O overhead.
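Step 2 is an ordinary binary search over an array that is already resident. A minimal sketch, assuming a hypothetical entry layout with 16-byte key prefixes (the real structure may differ):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory index entry: 16-byte key prefix + block offset. */
typedef struct
{
    uint8_t min_prefix[16];
    uint64_t file_pos;
} index_entry_t;

/* Find the last block whose min key is <= the search key.
 * Pure memory operations: no syscalls, no I/O. */
int64_t index_find_pos(const index_entry_t *idx, size_t n,
                       const uint8_t key_prefix[16])
{
    size_t lo = 0, hi = n;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (memcmp(idx[mid].min_prefix, key_prefix, 16) <= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1; /* key sorts before every block */
    return (int64_t)idx[lo - 1].file_pos; /* next step: one pread() from here */
}
```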
3. Reference-Counted Block Ownership
TidesDB uses atomic reference counting for cached blocks:

```c
typedef struct
{
    tidesdb_klog_block_t *block;
    atomic_int ref_count;
    size_t block_memory;
} tidesdb_ref_counted_block_t;
```

When a seek operation uses a cached block:
- Increment refcount atomically
- Store reference in merge source
- Block stays alive until all references released
- No locks, no contention, no race conditions
This enables safe concurrent access without serialization overhead.
4. Partitioned CLOCK Cache
TidesDB’s block cache uses:
- 2 partitions per CPU core (reduces contention)
- CLOCK eviction (second-chance algorithm)
- Lock-free atomic operations
- Zero-copy API (no memcpy on hits)
With 8 cores, that’s 16 independent cache partitions. Threads rarely contend for the same partition lock.
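In outline, such a cache can be laid out as below; the slot and partition fields are illustrative assumptions, not TidesDB's actual definitions. Hits only touch one slot's atomics, which is why the hot path stays lock-free.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative partitioned cache layout; names are assumptions. */
typedef struct
{
    void *block;          /* deserialized block, handed out zero-copy */
    atomic_int ref_bit;   /* CLOCK second-chance bit, set on every hit */
    atomic_int ref_count; /* readers currently holding this block */
} cache_slot_t;

typedef struct
{
    cache_slot_t *slots;
    size_t nslots;
    size_t clock_hand;    /* next eviction candidate in this partition */
    /* per-partition lock: taken for inserts/evictions, never on hits */
} cache_partition_t;

typedef struct
{
    cache_partition_t *parts; /* 2 per core, e.g. 16 partitions on 8 cores */
    size_t nparts;
} block_cache_t;
```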
Sequential Seek Performance · 4.74x Advantage
The Numbers
TidesDB
- Throughput · 8.87M ops/sec
- Median latency · 1μs
- P99 latency · 2μs
RocksDB
- Throughput · 1.87M ops/sec
- Median latency · 3μs
- P99 latency · 7μs
Why Sequential Seeks Are Even Faster
Sequential access patterns hit the same blocks repeatedly. Once a block is cached:
- First seek in block · cache miss, load block (2-3μs)
- Next 1000+ seeks in same block · cache hits (<1μs each)
The 8.87M ops/sec throughput means sub-microsecond latency for cached seeks. This is approaching memory bandwidth limits, not storage limits.
With TidesDB’s block-level caching, sequential seeks become pure memory operations after the first block load.
Zipfian Seek Performance · 5.25x Advantage
The Numbers
TidesDB
- Throughput · 3.56M ops/sec
- Median latency · 1μs
- Database size · 10.13 MB (650K unique keys)
RocksDB
- Throughput · 679K ops/sec
- Median latency · 11μs
- Database size · 37.20 MB (656K unique keys)
Why Zipfian Workloads Favor TidesDB
Zipfian distributions hit a small set of “hot” keys repeatedly (80/20 rule). TidesDB’s aggressive caching means:
- Hot keys’ blocks stay in cache
- Bloom filters eliminate cold key checks
- DCA compacts obsolete versions aggressively
- Database size shrinks (10MB vs 37MB)
The 5.25x advantage comes from cache affinity. Hot data stays hot, cold data gets compacted away.
Range Query Performance · Competitive But Not Dominant
Small Ranges (100 keys, Random Access)
TidesDB
- Throughput · 776K ops/sec
- Median latency · 0μs (sub-microsecond)
- P99 latency · 38μs
- CPU utilization · 463%
RocksDB
- Throughput · 322K ops/sec
- Median latency · 22μs
- P99 latency · 49μs
- CPU utilization · 713%
Result · TidesDB is 2.41x faster on small random ranges.
Large Ranges (1000 keys, Random Access)
TidesDB
- Throughput · 48.6K ops/sec
- Median latency · 137μs
- P99 latency · 279μs
- CPU utilization · 725%
RocksDB
- Throughput · 43.6K ops/sec
- Median latency · 154μs
- P99 latency · 439μs
- CPU utilization · 771%
Result · TidesDB is 1.11x faster on large random ranges (roughly competitive).
Sequential Ranges (100 keys)
TidesDB
- Throughput · 313K ops/sec
- Median latency · 15μs
- P99 latency · 67μs
RocksDB
- Throughput · 275K ops/sec
- Median latency · 20μs
- P99 latency · 55μs
Result · TidesDB is 1.14x faster on sequential ranges.
Why Range Queries Are Different
Unlike point seeks, range queries require:
- Seeking to the start key (benefits from TidesDB’s fast seeks)
- Iterating through N consecutive keys (both engines are similar)
- Reading and deserializing values (I/O bound for large ranges)
The 2.41x advantage on small ranges comes from TidesDB’s fast seek positioning. Once positioned, iteration speed is comparable between engines.
For large ranges (1000 keys), both engines spend most time iterating, not seeking. The 1.11x difference is within measurement variance—essentially a tie.
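A sketch of those two phases, using a hypothetical cursor API (the function names are assumptions, not TidesDB's actual interface):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct cursor cursor_t;
extern bool cursor_seek(cursor_t *c, const void *key, size_t klen);  /* phase 1: position */
extern bool cursor_next(cursor_t *c, void **key, size_t *klen,
                        void **val, size_t *vlen);                   /* phase 2: iterate */

size_t range_scan(cursor_t *c, const void *start_key, size_t start_len, size_t limit)
{
    size_t count = 0;
    void *k, *v;
    size_t klen, vlen;

    /* Seek cost is paid once and amortized over the whole range. */
    if (!cursor_seek(c, start_key, start_len))
        return 0;

    /* For large ranges, per-key value reads dominate in any LSM engine. */
    while (count < limit && cursor_next(c, &k, &klen, &v, &vlen))
        count++;

    return count;
}
```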
The Honest Assessment
Range queries are not TidesDB’s killer feature. The advantages are:
- Small ranges · 2.41x faster (seek overhead dominates)
- Large ranges · Roughly competitive (iteration dominates)
- Sequential ranges · Slightly faster (1.14x)
If your workload is range-scan heavy with large ranges, TidesDB won’t provide dramatic speedups. The real wins are in point seeks and small range queries where seek overhead matters.
The Memory Trade-Off Quantified
Random Seek Test (5M keys)
TidesDB memory usage
- Peak RSS · 851 MB
- Block cache · ~64 MB (default)
- SSTable metadata · ~787 MB (indices, bloom filters, structures)
RocksDB memory usage
- Peak RSS · 246 MB
- Block cache · ~200 MB (estimated)
- Metadata loaded on-demand
The trade-off · TidesDB uses 3.46x more memory for 4.17x faster seeks.
Is This Worth It?
For modern servers (128GB+ RAM):
- 851 MB is negligible
- Sub-microsecond seek latency is transformative
- Memory is cheap, latency is expensive
For memory-constrained environments:
- 3.46x memory overhead matters
- Containers with 2GB limits
- Hundreds of database instances per server
- Edge devices with limited RAM
The decision depends on your deployment constraints.
The Block Cache Architecture
Why CLOCK Over LRU?
TidesDB uses CLOCK eviction instead of LRU because:
- Better concurrency · no global LRU list to lock
- Simpler atomics · just a reference bit per entry
- Good-enough approximation · CLOCK is “LRU-ish” without the overhead
- Lock-free reads · cache hits don’t acquire locks
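The second-chance sweep itself is only a few lines. A minimal sketch over a trimmed-down slot (same idea as the partition sketch earlier; a real implementation would bound the loop):

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct
{
    atomic_int ref_bit;   /* set on every hit */
    atomic_int ref_count; /* readers currently holding the block */
} clock_slot_t;

/* Advance the clock hand until we find a slot that is neither recently
 * used nor currently referenced; that slot is the eviction victim. */
size_t clock_pick_victim(clock_slot_t *slots, size_t nslots, size_t *hand)
{
    for (;;) /* a real implementation would bound this sweep */
    {
        size_t i = *hand;
        *hand = (*hand + 1) % nslots;

        /* atomic_exchange clears the bit and reports if it was set:
         * if it was, the entry gets a second chance and we move on. */
        if (atomic_exchange(&slots[i].ref_bit, 0) == 0 &&
            atomic_load(&slots[i].ref_count) == 0)
            return i;
    }
}
```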
Partitioning Strategy
With 2 partitions per CPU core:
- 8 cores = 16 partitions
- Hash key → partition (deterministic)
- Each partition has independent lock
- Contention reduced by 16x
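The routing step is just a hash and a modulo. A sketch assuming FNV-1a as the hash, which is an assumption for illustration, not a verified implementation detail:

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a: simple, fast, and deterministic, so the same key always
 * lands in the same partition. */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;            /* FNV prime */
    }
    return h;
}

static size_t pick_partition(const void *key, size_t klen, size_t nparts)
{
    return (size_t)(fnv1a64(key, klen) % nparts);
}
```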
Reference Counting Lifecycle
```c
// Seek operation
rc_block = cache_get(...);         // Atomic increment
store_in_merge_source(rc_block);   // Transfer ownership
// ... use block ...
tidesdb_block_release(rc_block);   // Atomic decrement
```

Only when the refcount reaches zero is the block freed. This prevents:
- Use-after-free during concurrent seeks
- Race conditions during compaction
- Need for complex locking schemes
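The release side is where the "freed only at zero" guarantee lives. A minimal sketch mirroring the tidesdb_ref_counted_block_t shown earlier (the actual free path inside TidesDB may differ):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct
{
    void *block;          /* stand-in for tidesdb_klog_block_t* */
    atomic_int ref_count;
    size_t block_memory;
} rc_block_t;

/* atomic_fetch_sub returns the value *before* the decrement, so exactly
 * one thread (the one that takes it from 1 to 0) performs the free. */
void rc_block_release(rc_block_t *rc)
{
    if (atomic_fetch_sub(&rc->ref_count, 1) == 1)
    {
        free(rc->block);
        free(rc);
    }
}
```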
The Index Design
Sparse Sampling with Prefix Compression
TidesDB’s block indices use:
- Sampling ratio · 1:1 by default (every block sampled)
- Prefix length · 16 bytes for min/max keys
- Position encoding · Delta-encoded with varint compression
For 1M entries across 1000 blocks:
- Index size · ~50 KB (with compression)
- Lookup time · O(log n) binary search in memory
- Disk I/O · Zero (index is resident)
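As an illustration of "delta-encoded with varint compression", here is a sketch assuming LEB128-style varints; the exact on-disk encoding is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian base-128 varint; high bit = continuation. */
static const uint8_t *varint_decode(const uint8_t *p, uint64_t *out)
{
    uint64_t v = 0;
    int shift = 0;
    while (*p & 0x80)
    {
        v |= (uint64_t)(*p++ & 0x7f) << shift;
        shift += 7;
    }
    v |= (uint64_t)(*p++) << shift;
    *out = v;
    return p;
}

/* Positions are stored as deltas from the previous block, so the small,
 * regular gaps between blocks compress to one or two bytes each. */
static void decode_positions(const uint8_t *buf, size_t n, uint64_t *pos)
{
    uint64_t prev = 0;
    for (size_t i = 0; i < n; i++)
    {
        uint64_t delta;
        buf = varint_decode(buf, &delta);
        prev += delta;
        pos[i] = prev;
    }
}
```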
Why This Matters
Traditional LSM-trees load indices on-demand:
- Open SSTable file
- Read index block from end of file
- Parse and build in-memory structure
- Now you can seek
TidesDB loads indices once during SSTable open and keeps them resident. Every subsequent seek benefits.
The Bloom Filter Strategy
Configuration
- False positive rate · 1% (configurable)
- Bits per key · ~10 bits
- Memory cost · 1.25 MB per 1M keys
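These figures follow from the standard Bloom filter sizing formulas, which are textbook math rather than anything TidesDB-specific; a quick sanity check:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double p = 0.01; /* target false positive rate: 1% */

    /* Optimal bits per key: -ln(p) / (ln 2)^2 ~= 9.59, rounded up to ~10 */
    double bits_per_key = -log(p) / (log(2) * log(2));

    /* Optimal hash count: (bits/key) * ln 2 ~= 6.6, i.e. ~7 hash functions */
    double hashes = bits_per_key * log(2);

    /* At 10 bits/key: 10,000,000 bits / 8 = 1.25 MB per 1M keys */
    double mb_per_million_keys = 10.0 * 1e6 / 8.0 / 1e6;

    printf("bits/key: %.2f  hashes: %.2f  MB per 1M keys: %.2f\n",
           bits_per_key, hashes, mb_per_million_keys);
    return 0;
}
```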
Why Keep Bloom Filters in Memory?
For a database with 10 levels and 100 SSTables:
- Bloom filter memory · ~125 MB total
- Disk I/O eliminated · 99% of non-existent key checks
- Latency saved · 1-2ms per eliminated check
The memory cost is paid once. The I/O savings happen on every seek.
CPU Utilization Analysis
Random Seek Test
TidesDB · 469% CPU utilization
RocksDB · 635% CPU utilization
TidesDB achieves 4.17x higher throughput with 26% less CPU. This suggests:
- Lock-free architecture reduces contention
- Cache hits eliminate decompression overhead
- Memory-resident indices reduce syscall overhead
- Atomic operations scale better than mutexes
RocksDB’s higher CPU usage likely comes from:
- More cache misses → more decompression
- More index loads → more syscalls
- More lock contention → more context switches
What I Learned Optimizing Seeks
1. Cache Locality Trumps Everything
The difference between a cache hit (1μs) and cache miss (100μs+) is two orders of magnitude. Optimizing for cache hits matters more than optimizing cache misses.
2. Memory Is Cheaper Than You Think
Using 3.46x more memory for 4.17x faster seeks is an excellent trade on modern hardware. Don’t optimize for 2011 constraints.
3. Reference Counting Provides Safety Without Locks
Atomic refcounts eliminate entire classes of race conditions. The overhead is negligible compared to the safety gained.
4. Block-Level Caching Beats Key-Level Caching
Caching entire blocks (1000+ keys) means one cache entry serves hundreds of seeks. The memory efficiency is better than key-level caches.
5. Partitioned Caches Scale Linearly
With 16 partitions, 8 threads rarely contend. Lock-free reads + partitioned writes = near-linear scaling.
6. Range Queries Need Different Optimizations
Point seeks benefit from aggressive caching. Range queries benefit from sequential I/O and prefetching. The same architecture doesn’t optimize both equally.
7. Sub-Microsecond Latency Is Achievable
The 1μs median latency on sequential seeks shows that with proper caching, storage engine latency can approach memory latency. This opens up new use cases.
When TidesDB’s Seek Performance Matters
Use TidesDB when:
- Point lookups are frequent (OLTP workloads)
- Small range queries (<100 keys) are common
- Latency matters more than memory (user-facing services)
- Working set fits in memory (hot data caching)
- Concurrent access is high (many threads seeking)
- Memory is available (servers with 64GB+ RAM)
Use RocksDB when:
- Large range scans (1000+ keys) dominate your workload
- Memory is severely constrained (containers with 2GB limits)
- Working set exceeds available memory (cold data dominant)
- Predictable memory usage is required (strict limits)
- Operational maturity is critical (10+ years of production use)
The Path Forward
What Needs More Work
- Large range query optimization · prefetching and sequential I/O hints
- Cold start performance · how fast do caches warm up?
- Memory pressure behavior · what happens when cache is full?
- Very large datasets · does performance hold at 100M+ keys?
- Mixed workloads · seeks + writes + range queries simultaneously
Conclusion
The benchmark results show clear performance characteristics:
Point Seeks (TidesDB’s Strength)
- Random seeks · 4.17x faster (3.73M vs 893K ops/sec)
- Sequential seeks · 4.74x faster (8.87M vs 1.87M ops/sec)
- Zipfian seeks · 5.25x faster (3.56M vs 679K ops/sec)
Range Queries (Competitive)
- Small ranges (100 keys) · 2.41x faster
- Large ranges (1000 keys) · 1.11x faster (essentially tied)
- Sequential ranges · 1.14x faster
The architectural decisions that enable this:
- Aggressive metadata caching · indices and bloom filters stay resident
- Reference-counted blocks · safe concurrent access without locks
- Partitioned CLOCK cache · scales with CPU cores
- Cache-first seek path · eliminates disk I/O on hot data
- Block-level granularity · one cache entry serves many keys
The memory trade-off (3.46x more RAM) is acceptable for modern servers. The latency improvement (4-5x faster seeks) is transformative for latency-sensitive workloads.
TidesDB makes different trade-offs than RocksDB. If your workload is seek-heavy with memory available, TidesDB delivers substantial advantages. If your workload is range-scan-heavy with large ranges, the benefits are minimal.
Thanks for reading!