Eliminating unscalable communication in transaction processing

Johnson, Ryan; Pandis, Ippokratis; Ailamaki, Anastasia

doi:10.1007/s00778-013-0312-3

Regular Paper
Published: 21 April 2013

Volume 23, pages 1–23, (2014)

The VLDB Journal

Ryan Johnson¹,
Ippokratis Pandis² &
Anastasia Ailamaki³


Abstract

Multicore hardware demands software parallelism. Transaction processing workloads typically exhibit high concurrency and thus provide ample opportunities for parallel execution. Unfortunately, because of the characteristics of the application, transaction processing systems must moderate and coordinate communication between independent agents, since it is notoriously difficult to implement high-performing transaction processing systems that incur no communication whatsoever. As a result, transaction processing systems cannot always convert abundant, even embarrassing, request-level parallelism into execution parallelism due to communication bottlenecks. Transaction processing system designers must therefore find ways to achieve scalability while still allowing communication to occur. To this end, we identify three forms of communication in the system (unbounded, fixed, and cooperative) and argue that only the first type poses a fundamental threat to scalability. The other two types tend not to impose obstacles to scalability, though they may reduce single-thread performance. We argue that proper analysis of communication patterns in any software system is a powerful tool for improving the system’s scalability. Then, we present and evaluate, under a common framework, techniques that attack significant sources of unbounded communication during transaction processing, and we sketch a solution for those that remain. The solutions we present affect fundamental services of any transaction processing engine, such as locking, logging, physical page accesses, and buffer pool frame accesses. They either reduce such communication through caching, downgrade it to a less-threatening type, or eliminate it completely through system design. We find that the last technique, revisiting the transaction processing architecture, is the most effective. The final design cuts unbounded communication by roughly an order of magnitude compared with the baseline, while exhibiting better scalability on multicore machines.


Notes

The situation has improved markedly since 2009. Developers of the various engines have focused on improving their scalability, sometimes reporting to the authors that they used techniques summarized in this paper.
Distributed transaction processing systems use distributed logging to avoid unnecessary partition crossings and for fault tolerance purposes, e.g., [19, 49], but scale poorly when transactions span multiple nodes. Many distributed systems, including Rdb/VMS [50], actually maintain a single shared log, usually at a dedicated node.
In a well-partitioned system, exclusive locks seldom move between nodes, and caching them is extremely effective.
Because delete operations do not update pages on disk, a page which was deleted and then re-allocated just before a crash may still announce its old ownership during recovery.
Libraries such as libnuma support socket-aware allocation and migration of memory regions.
The machines used in Sect. 8 do not suffer significant inter-socket communication latencies.
http://tatpbenchmark.sourceforge.net/.
http://www.tpc.org/tpcb.
http://www.tpc.org/tpcc.
Two of the other transactions in TPC-C are read-only, while the last one, Delivery, causes logical contention and low concurrency.
The current public release of Shore-MT contains all these features.
See, e.g., http://blogs.msdn.com/b/psssql/archive/2009/01/28/hot-it-works-sql-server-superlatch-ing-sub-latches.aspx.
See, e.g., http://www.oracle.com/technetwork/database/clustering/overview and http://www.postgresql.org/docs/9.0/static/wal-async-commit.html.

References

1. Achyutuni, K.J., Omiecinski, E., Navathe, S.B.: Two techniques for on-line index modification in shared nothing parallel databases. In: SIGMOD, pp. 125–136 (1996)
2. Ailamaki, A., DeWitt, D.J., Hill, M.D.: Walking four machines by the shore. In: CAECW (2001)
3. Ailamaki, A., DeWitt, D.J., Hill, M.D., Wood, D.A.: DBMSs on a modern processor: where does time go? In: VLDB, pp. 266–277 (1999)
4. Albutiu, M.C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB 5(10), 1064–1075 (2012)
5. Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: AFIPS, pp. 483–485 (1967)
6. Aspnes, J., Herlihy, M., Shavit, N.: Counting networks. J. ACM 41(5), 1020–1048 (1994)
7. Attiya, H., Bar-Noy, A., Dolev, D.: Sharing memory robustly in message-passing systems. J. ACM 42(1), 124–142 (1995)
8. Barroso, L.A., Gharachorloo, K., Bugnion, E.: Memory system characterization of commercial workloads. In: ISCA, pp. 3–14 (1998)
9. Baumann, A., et al.: The multikernel: a new OS architecture for scalable multicore systems. In: SOSP, pp. 29–44 (2009)
10. Bayer, R., McCreight, E.M.: Organization and maintenance of large ordered indices. In: SIGFIDET, pp. 107–141 (1970)
11. Bernstein, P.A., Goodman, N.: Multiversion concurrency control: theory and algorithms. ACM TODS 8(4), 465–483 (1983)
12. Bienia, C., Kumar, S., Singh, J.P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: PACT, pp. 72–81 (2008)
13. Carey, M.J., et al.: Shoring up persistent applications. In: SIGMOD, pp. 383–394 (1994)
14. Chang, F., et al.: Bigtable: A distributed storage system for structured data. In: OSDI, p. 15 (2006)
15. Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM TODS 32(3), 116–127 (2007)
16. Cieslewicz, J., Ross, K.A.: Adaptive aggregation on chip multiprocessors. In: VLDB, pp. 339–350 (2007)
17. Clark, K.L., McCabe, F.G.: Go! a multi-paradigm programming language for implementing multi-threaded agents. Ann. Math. Artif. Intell. 41, 171–206 (2004)
18. Curino, C., Jones, E., Zhang, Y., Madden, S.: Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 48–57 (2010)
19. Daniels, D.S., Spector, A.Z., Thompson, D.S.: Distributed logging for transaction processing. SIGMOD Rec. 16, 82–96 (1987)
20. Davis, J.D., Laudon, J., Olukotun, K.: Maximizing CMP throughput with mediocre cores. In: PACT, pp. 51–62 (2005)
21. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI, p. 10 (2004)
22. DeCandia, G., et al.: Dynamo: Amazon’s highly available key-value store. SIGOPS OSR 41(6), 205–220 (2007)
23. DeWitt, D.J., et al.: The Gamma database machine project. IEEE TKDE 2(1), 44–62 (1990)
24. Dragojevic, A., Guerraoui, R., Kapalka, M.: Dividing transactional memories by zero. In: TRANSACT (2008)
25. Graefe, G.: Hierarchical locking in B-tree indexes. In: BTW, pp. 18–42 (2007)
26. Gray, J., Helland, P., O’Neil, P., Shasha, D.: The dangers of replication and a solution. In: SIGMOD, pp. 173–182 (1996)
27. Gray, J., Reuter, A.: Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco (1992)
28. Hardavellas, N., et al.: Database servers on chip multiprocessors: limitations and opportunities. In: CIDR, pp. 79–87 (2007)
29. Harizopoulos, S., Abadi, D.J., Madden, S., Stonebraker, M.: OLTP through the looking glass, and what we found there. In: SIGMOD, pp. 981–992 (2008)
30. Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. In: SIGMOD, pp. 383–394 (2005)
31. Helland, P.: Life beyond distributed transactions: an apostate’s opinion. In: CIDR, pp. 132–141 (2007)
32. Helland, P., et al.: Group commit timers and high volume transaction systems. In: HPTS, pp. 301–329 (1989)
33. Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)
34. Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21(2), 289–300 (1993)
35. Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41, 33–38 (2008)
36. Hunt, G.C., Larus, J.R.: Singularity: rethinking the software stack. SIGOPS OSR 41(2), 37–49 (2007)
37. Jaluta, I., Sippu, S., Soisalon-Soininen, E.: B-tree concurrency control and recovery in page-server database systems. ACM TODS 31, 82–132 (2006)
38. Johnson, R., Pandis, I., Hardavellas, N., Ailamaki, A., Falsafi, B.: Shore-MT: a scalable storage manager for the multicore era. In: EDBT, pp. 24–35 (2009)
39. Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. PVLDB 3, 681–692 (2010)
40. Jones, E., Abadi, D.J., Madden, S.: Low overhead concurrency control for partitioned main memory databases. In: SIGMOD, pp. 603–614 (2010)
41. Jorwekar, S., Fekete, A., Ramamritham, K., Sudarshan, S.: Automating the detection of snapshot isolation anomalies. In: VLDB, pp. 1263–1274 (2007)
42. Kemper, A., Neumann, T.: HyPer: a hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. In: ICDE, pp. 195–206 (2011)
43. Kimura, H., Graefe, G., Kuno, H.: Efficient locking techniques for databases on modern hardware. In: ADMS (2012)
44. Kung, H.T., Robinson, J.T.: On optimistic methods for concurrency control. ACM TODS 6(2), 213–226 (1981)
45. Larson, P.A., et al.: High-performance concurrency control mechanisms for main-memory databases. PVLDB 5(4), 298–309 (2011)
46. Lauer, H.C., Needham, R.M.: On the duality of operating system structures. SIGOPS OSR 13, 3–19 (1979)
47. Lee, J., Kim, K., Cha, S.K.: Differential logging: a commutative and associative logging scheme for highly parallel main memory database. In: ICDE, pp. 173–184 (2001)
48. Lee, M.L., Kitsuregawa, M., Ooi, B.C., Tan, K.L., Mondal, A.: Towards self-tuning data placement in parallel database systems. In: SIGMOD, pp. 225–236 (2000)
49. Lomet, D.: Recovery for shared disk systems using multiple redo logs. Technical report CRL-90-4 (1990)
50. Lomet, D., Anderson, R., Rengarajan, T.K., Spiro, P.: How the Rdb/VMS data sharing system became fast. Technical report CRL-92-4 (1992)
51. Magnusson, P.S., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: ISPP, pp. 165–171 (1994)
52. Mellor-Crummey, J.M., Scott, M.L.: Scalable reader-writer synchronization for shared-memory multiprocessors. SIGPLAN Not. 26(7), 106–113 (1991)
53. Mohan, C.: ARIES/KVL: a key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes. In: VLDB, pp. 392–405 (1990)
54. Mohan, C., Levine, F.: ARIES/IM: an efficient and high concurrency index management method using write-ahead logging. In: SIGMOD, pp. 371–380 (1992)
55. Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free FIFO queues. In: SPAA, pp. 253–262 (2005)
56. Pandis, I., Johnson, R., Hardavellas, N., Ailamaki, A.: Data-oriented transaction execution. PVLDB 3(1), 928–939 (2010)
57. Pandis, I., Tözün, P., Johnson, R., Ailamaki, A.: PLP: page latch-free shared-everything OLTP. PVLDB 4(10), 610–621 (2011)
58. Porobic, D., Pandis, I., Branco, M., Tözün, P., Ailamaki, A.: OLTP on hardware islands. PVLDB 5(11), 1447–1458 (2012)
59. Ranganathan, P., Gharachorloo, K., Adve, S.V., Barroso, L.A.: Performance of database workloads on shared-memory systems with out-of-order processors. In: ASPLOS, pp. 307–318 (1998)
60. Rao, J., Ross, K.A.: Cache conscious indexing for decision-support in main memory. In: VLDB, pp. 78–89 (1999)
61. Rao, J., Ross, K.A.: Making B+-trees cache conscious in main memory. In: SIGMOD, pp. 475–486 (2000)
62. Shavit, N., Touitou, D.: Software transactional memory. In: PODC, pp. 204–213 (1995)
63. Smith, A.J.: Sequentiality and prefetching in database systems. ACM TODS 3, 223–247 (1978)
64. Soisalon-Soininen, E., Ylönen, T.: Partial strictness in two-phase locking. In: ICDT, pp. 139–147 (1995)
65. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: SIGCOMM, pp. 149–160 (2001)
66. Stonebraker, M.: The case for shared nothing. IEEE Database Eng. Bull. 9, 4–9 (1986)
67. Stonebraker, M.: Stonebraker on nosql and enterprises. Commun. ACM 54, 10–11 (2011)
68. Stonebraker, M., et al.: The end of an architectural era: (it’s time for a complete rewrite). In: VLDB, pp. 1150–1160 (2007)
69. Thomson, A., Abadi, D.J.: The case for determinism in database systems. PVLDB 3, 70–80 (2010)
70. Tözün, P., Pandis, I., Johnson, R., Ailamaki, A.: Scalable and dynamically balanced shared-everything OLTP with physiological partitioning. VLDB J 1–25 (2012)
71. Vogels, W.: Eventually consistent. Commun. ACM 52, 40–44 (2009)
72. Welsh, M., Culler, D., Brewer, E.: SEDA: an architecture for well-conditioned, scalable internet services. In: SOSP, pp. 230–243 (2001)
73. Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ISCA, pp. 24–36 (1995)

Acknowledgments

The authors are deeply grateful to all the members of Carnegie Mellon’s StagedDB/CMP team and EPFL’s DIAS laboratory who made this work possible through their research efforts, helpful feedback, and encouragement. We are especially indebted to Pinar Tözün, who helped us with an additional set of experiments. We would also like to thank the many anonymous reviewers whose thoughtful and constructive remarks helped improve both this paper and the research papers summarized here. This research was supported by an IBM PhD fellowship; grants and equipment from Intel and Sun; a Sloan research fellowship; an IBM faculty partnership award; NSF grants CCR-0205544, CCR-0509356, IIS-0133686, and IIS-0713409; an ESF EurYI award; and Swiss National Foundation funds.

Author information

Authors and Affiliations

Department of Computer Science, University of Toronto, Toronto, ON, Canada
Ryan Johnson
IBM Almaden Research Center, San Jose, CA, USA
Ippokratis Pandis
School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, VD, Switzerland
Anastasia Ailamaki

Corresponding author

Correspondence to Ryan Johnson.

About this article

Cite this article

Johnson, R., Pandis, I. & Ailamaki, A. Eliminating unscalable communication in transaction processing. The VLDB Journal 23, 1–23 (2014). https://doi.org/10.1007/s00778-013-0312-3

Received: 13 February 2012
Revised: 12 March 2013
Accepted: 15 March 2013
Published: 21 April 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s00778-013-0312-3
