Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores

This paper from 2014 argued that DBMSs were not ready to take advantage of the on-chip parallelism offered by many-core machines. As the number of cores increases, the problem of concurrency control schemes becomes more challenging to deal with. The authors implemented the following 7 concurrency control schemes from two families:

Two-Phase Locking
- 2PL with Deadlock Detection (DL_DETECT)
- 2PL with Non-waiting Deadlock Prevention (NO_WAIT)
- 2PL with Waiting Deadlock Prevention (WAIT_DIE)
Timestamp Ordering
- Basic T/O (TIMESTAMP)
- Multi-version Concurrency Control (MVCC)
- Optimistic Concurrency Control (OCC)
- T/O with Partition-level Locking (H-STORE)

and evaluated their performance and scalability under various OLTP workloads, using YCSB and TPC-C benchmarks. For details of the results please see the figures and Experimental Analysis section of the original paper. In summary, all of the seven current concurrency control schemes suffered from some kind of bottleneck when scaled to many cores, especially under contention. The different bottlenecks were summarized nicely in Table 2 of the paper.

Screen Shot 2020-04-13 at 11.32.26 AM.png

The authors thought that some of these bottlenecks could be addressed by new hardware support.

All T/O schemes suffer from the timestamp allocation bottleneck, which can be addressed by using synchronized clocks across CPU cores (currently only supported by Intel CPUs) or built-in hardware counter (no CPU currently supports this).
Memory-caused bottlenecks can be alleviated by hardware accelerators on the CPU that copies memory in the background (which eliminates the need to load all data through the CPU’s pipeline), and by using optimized memory allocation schemes.

Although distributed DBMSs can achieve better performance than a single-node DBMS, they suffer from a different bottleneck: the need of an Atomic commit protocol to support distributed transactions. The authors believe that a single many-core node with a large amount of DRAM might outperform a distributed DBMS for all but the largest OLTP applications.

Share this:

Leave a comment Cancel reply