High-Performance Concurrency Control Mechanism for Main-Memory Databases

This paper describes the concurrency control mechanisms of the Hekaton storage engine of Microsoft SQL Server, one is optimistic and one is pessimistic. The authors also implemented a simple single-version locking concurrency control method optimized for in-memory databases, and report performance comparisons of the three.

The authors found that single-version locking is very fragile, in that it works well for short transactions and low contention, but its performance degrades rapidly under high contention, or if there is even one long running transaction. On the contrary, multi-version concurrency control method perform well under high contention and when there are long running transactions. The optimistic method consistently performs better than the pessimistic method.

The authors first experimented with homogeneous workload of short update transactions, which represents an extreme update-heavy scenario, and varied the multi-programming level of the system.

The pessimistic method, under low contention and read committed isolation level, has 30% lower performance than the optimistic method (because of extra writes for tracking dependencies and locks, which cause increased memory traffic. It takes pessimistic method 20% more cycles to execute the same number of instructions, and the additional control logic translates into 10% more instructions per transaction. Under high contention, the optimistic method performed the best throughput, with pessimistic method close behind. Single-verison locking achieved good throughput numbers initially but did not scale beyond 8 threads.

If we fix the contention level (low, 24 threads) and transaction workload type but vary the isolation level, optimistic MVCC’s performance degrades the fastest, followed by pessimistic MVCC and the single-version locking’s performance reduces only by 1.8%.

Screen Shot 2020-04-21 at 11.13.18 AM.png

Then the authors experimented with heterogeneous workload, fixing the multi-programming level and varying the percentage of short and long read-only transactions in the mix.

Starting with short read transactions, under low contention, single-version locking out-performed both MV until 90% read-only workload, at which point it was surpassed by both MV schemes. Under high contention MVCC schemes start to show more advantage as percentage of read-only transactions increases. When 80% of the transactions are read-only, two MVCC schemes achieve 63% and 73% high throughput than 1V.

If we mix in long-running read transactions instead, the advantage of MVCC schemes shows even more. 1V achieves twice the update throughput of the MV schemes when there is 0 long read transactions. However mixing in even one long read transaction, 1V’s update throughput drops by 75%, in contrast to only 5% drop of MV schemes. When 50% of the active transactions are long readers, MV has 80x high update throughput than 1V. In terms of throughput, MV schemes consistently outperform 1V as the gap widens as the number of long readers increases.

TATP benchmark simulates a telecommunications application. 80% of the transactions are queries, 16% of them are updates, 2% are inserts, and 2% are deletes. 1V achieved 4,220,119 txn/s, pessmisitic MV achieved 3,129,816 txn/s and optimistic MV achieved 3,121,494 txn/s.

I am only summarizing the performance results here, for detailed description of the two multi-version concurrency control implementations, I suggest that readers go to the original paper.

Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores

This paper from 2014 argued that DBMSs were not ready to take advantage of the on-chip parallelism offered by many-core machines. As the number of cores increases, the problem of concurrency control schemes becomes more challenging to deal with. The authors implemented the following 7 concurrency control schemes from two families:

  • Two-Phase Locking
    • 2PL with Deadlock Detection (DL_DETECT)
    • 2PL with Non-waiting Deadlock Prevention (NO_WAIT)
    • 2PL with Waiting Deadlock Prevention (WAIT_DIE)
  • Timestamp Ordering
    • Basic T/O (TIMESTAMP)
    • Multi-version Concurrency Control (MVCC)
    • Optimistic Concurrency Control (OCC)
    • T/O with Partition-level Locking (H-STORE)

and evaluated their performance and scalability under various OLTP workloads, using YCSB and TPC-C benchmarks. For details of the results please see the figures and Experimental Analysis section of the original paper. In summary, all of the seven current concurrency control schemes suffered from some kind of bottleneck when scaled to many cores, especially under contention. The different bottlenecks were summarized nicely in Table 2 of the paper.

Screen Shot 2020-04-13 at 11.32.26 AM.png

The authors thought that some of these bottlenecks could be addressed by new hardware support.

  • All T/O schemes suffer from the timestamp allocation bottleneck, which can be addressed by using synchronized clocks across CPU cores (currently only supported by Intel CPUs) or built-in hardware counter (no CPU currently supports this).
  • Memory-caused bottlenecks can be alleviated by hardware accelerators on the CPU that copies memory in the background (which eliminates the need to load all data through the CPU’s pipeline), and by using optimized memory allocation schemes.

Although distributed DBMSs can achieve better performance than a single-node DBMS, they suffer from a different bottleneck: the need of an Atomic commit protocol to support distributed transactions. The authors believe that a single many-core node with a large amount of DRAM might outperform a distributed DBMS for all but the largest OLTP applications.