Serializable Snapshot Isolation in PostgreSQL

This paper documents how the authors implemented a serializable snapshot isolation in PostgreSQL, and out performed strict-2PL on some workloads, while incurring low cost (less than 7%) relative to snapshot isolation.

PostgreSQL, like many other database systems, use the weakest ANSI SQL isolation level: READ COMMITTED, by default in order to get good performance. The highest isolation level that PostgreSQL provides, and what the user gets when she asks for SERIALIZABLE, is actually SNAPSHOT ISOLATION. Although SNAPSHOT ISOLATION avoids all three anomalies (dirty reads, non-repeatable reads, and phantom reads) defined in the ANSI standard it stills allows some anomalies that are not commonly known, two of which are simple write skew, and batch processing anomalies. A descriptions of these two anomalies is in section 2.1 of the paper.

Although there are techniques that can be applied to avoid these anomalies under snapshot isolation, the analysis required to identify them is often costly, especially when they inherently concerns the interactions between multiple transactions. In comparison, SERIALIZABLE avoids all problems because transactions can be treated as if they are running serially in complete isolation.

Instead of implementing serializability using Strict 2 Phase Locking (S2PL), the authors implemented Serializable Snapshot Isolation (SSI). I will very briefly explain SSI here. Curious readers can go read the original paper. In short, SSI runs transactions using snapshot isolation, but performs additional checks to determine whether anomalies can possibly happen. SSI is similar to other concurrency control protocols that uses serialization graph testing. But instead of detecting cycles, SSI looks for “dangerous structures” (at least two adjacent rw-antidependency edges) in a serialization graph. Using this method, SSI may give some false positives, but it is more efficient, because it is less restrictive: it allow some rw-conflicts as long as they do not for a “dangerous structure”. Therefore SSI offers better performance compared to S2PL.

To identify rw-antidependencies, SSI have transactions acquire special “SIREAD” locks on the data they read. These “SIREAD” locks do not block conflicting writes, but a conflict between write lock and a “SIREAD” lock will be flagged as a rw-antidependency. Because conflicts can occur even after a reader commits, “SIREAD” locks need to persist until all concurrent transactions commit.

The authors also documented two optimizations, safe snapshot and deferrable transactions, that they implemented for read-only transactions in section 4 of the paper. Details of SSI implementation can be found in section 5.

SSI does have potential unbounded memory usage problem, because a transaction can hold a large number of locks, and cannot release them until all concurrent transactions commit. Other transaction states such as the rw-antidependencies used to check for dangerous structures may also be held for a long time. A single long-running transaction can prevent thousands of other transactions from being cleaned up. The solution was to bound the memory usage of SSI and gracefully degrade (with potentially higher false positive abort rate). Additionally, four techniques that can limit the memory usage were given in section 6.

The authors used three benchmarks to evaluate the performance of SSI: a modified TPC-C, the SIBENCH microbenchmark and the RUBiS web application benchmark. On the SIBENCH, SSI has an overhead of 10-20% for tracking read dependencies without the read-only optimizations. On the TPC-C benchmark with “credit check” transactions from TPC-C++ variant, SSI causes 5% slowdown relative to snapshot isolation with the read-only optimizations turned on. On the RUBiS benchmark, SSI archieved throughput of 422 req/s, compared to 435 req/s of snapshot isolation and 208 req/s of S2PL. Details of experimental setup and benchmarks are in section 8 of the paper.

In summary, SSI is a novel method of achieving serializability on top of snapshot isolation, without using S2PL. It achieved better performance than S2PL while paying a small cost relative to snapshot isolation.