Eliminating unscalable communication in transaction processing

  • Regular Paper
The VLDB Journal

Abstract

Multicore hardware demands software parallelism. Transaction processing workloads typically exhibit high concurrency and thus provide ample opportunities for parallel execution. Unfortunately, because of the characteristics of the application, transaction processing systems must moderate and coordinate communication between independent agents, since it is notoriously difficult to implement high-performing transaction processing systems that incur no communication whatsoever. As a result, transaction processing systems cannot always convert abundant, even embarrassing, request-level parallelism into execution parallelism due to communication bottlenecks. Transaction processing system designers must therefore find ways to achieve scalability while still allowing communication to occur. To this end, we identify three forms of communication in the system (unbounded, fixed, and cooperative) and argue that only the first type poses a fundamental threat to scalability. The other two types tend not to impose obstacles to scalability, though they may reduce single-thread performance. We argue that proper analysis of communication patterns in any software system is a powerful tool for improving the system’s scalability. Then, we present and evaluate, under a common framework, techniques that attack significant sources of unbounded communication during transaction processing, and we sketch a solution for those that remain. The solutions we present affect fundamental services of any transaction processing engine, such as locking, logging, physical page accesses, and buffer pool frame accesses. They either reduce such communication through caching, downgrade it to a less-threatening type, or eliminate it completely through system design. We find that the last technique, revisiting the transaction processing architecture, is the most effective. The final design cuts unbounded communication by roughly an order of magnitude compared with the baseline, while exhibiting better scalability on multicore machines.
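
As a rough illustration of the three remedies named in the abstract (reduce through caching, downgrade, or eliminate through design), the toy sketch below counts how often worker threads touch one shared lock under three arrangements. It is not taken from the article: the class `SharedCounter`, the three functions, and the counter workload are hypothetical stand-ins chosen only to make the idea concrete, and the "acquisitions" tally is a crude proxy for communication, not a measurement of contention.

```python
import threading


class SharedCounter:
    """A single point of communication: one lock guarding one value (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0
        self.acquisitions = 0  # how many times any worker took the shared lock

    def add(self, n):
        with self._lock:
            self.value += n
            self.acquisitions += 1


def _run(workers, body):
    # Start one thread per worker, pass it its id, and wait for all to finish.
    threads = [threading.Thread(target=body, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_every_update(workers, updates):
    """Every update by every worker goes through the shared lock."""
    counter = SharedCounter()

    def body(_worker_id):
        for _ in range(updates):
            counter.add(1)

    _run(workers, body)
    return counter.value, counter.acquisitions


def cached_then_published(workers, updates):
    """Each worker accumulates privately and touches the shared lock once."""
    counter = SharedCounter()

    def body(_worker_id):
        local = 0
        for _ in range(updates):
            local += 1
        counter.add(local)

    _run(workers, body)
    return counter.value, counter.acquisitions


def partitioned(workers, updates):
    """Each worker owns its own slot; no shared lock is taken at all."""
    slots = [0] * workers

    def body(worker_id):
        for _ in range(updates):
            slots[worker_id] += 1  # only this worker ever writes this slot

    _run(workers, body)
    return sum(slots), 0
```

With 4 workers and 100 updates each, all three variants produce a total of 400, but the shared lock is taken 400 times, 4 times, and 0 times respectively. Only in the first variant does traffic through the shared point grow with both the number of workers and the amount of work each performs.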

Notes

  1. The situation has improved markedly since 2009. Developers of the various engines have focused on improving their scalability, sometimes reporting to the authors that they used techniques summarized in this paper.

  2. Distributed transaction processing systems use distributed logging to avoid unnecessary partition crossings and for fault tolerance purposes, e.g., [19, 49], but scale poorly when transactions span multiple nodes. Many distributed systems, including Rdb/VMS [50], actually maintain a single shared log, usually at a dedicated node.

  3. In a well-partitioned system, exclusive locks seldom move between nodes, and caching them is extremely effective.

  4. Because delete operations do not update pages on disk, a page which was deleted and then re-allocated just before a crash may still announce its old ownership during recovery.

  5. Libraries such as libnuma support socket-aware allocation and migration of memory regions.

  6. The machines used in Sect. 8 do not suffer significant inter-socket communication latencies.

  7. http://tatpbenchmark.sourceforge.net/.

  8. http://www.tpc.org/tpcb.

  9. http://www.tpc.org/tpcc.

  10. Two of the other transactions in TPC-C are read-only, while the last one, Delivery, causes logical contention and low concurrency.

  11. The current public release of Shore-MT contains all these features.

  12. See, e.g., http://blogs.msdn.com/b/psssql/archive/2009/01/28/hot-it-works-sql-server-superlatch-ing-sub-latches.aspx.

  13. See, e.g., http://www.oracle.com/technetwork/database/clustering/overview and http://www.postgresql.org/docs/9.0/static/wal-async-commit.html.

References

  1. Achyutuni, K.J., Omiecinski, E., Navathe, S.B.: Two techniques for on-line index modification in shared nothing parallel databases. In: SIGMOD, pp. 125–136 (1996)

  2. Ailamaki, A., DeWitt, D.J., Hill, M.D.: Walking four machines by the shore. In: CAECW (2001)

  3. Ailamaki, A., DeWitt, D.J., Hill, M.D., Wood, D.A.: DBMSs on a modern processor: where does time go? In: VLDB, pp. 266–277 (1999)

  4. Albutiu, M.C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB 5(10), 1064–1075 (2012)

  5. Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: AFIPS, pp. 483–485 (1967)

  6. Aspnes, J., Herlihy, M., Shavit, N.: Counting networks. J. ACM 41(5), 1020–1048 (1994)

  7. Attiya, H., Bar-Noy, A., Dolev, D.: Sharing memory robustly in message-passing systems. J. ACM 42(1), 124–142 (1995)

  8. Barroso, L.A., Gharachorloo, K., Bugnion, E.: Memory system characterization of commercial workloads. In: ISCA, pp. 3–14 (1998)

  9. Baumann, A., et al.: The multikernel: a new OS architecture for scalable multicore systems. In: SOSP, pp. 29–44 (2009)

  10. Bayer, R., McCreight, E.M.: Organization and maintenance of large ordered indices. In: SIGFIDET, pp. 107–141 (1970)

  11. Bernstein, P.A., Goodman, N.: Multiversion concurrency control—theory and algorithms. ACM TODS 8(4), 465–483 (1983)

  12. Bienia, C., Kumar, S., Singh, J.P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: PACT, pp. 72–81 (2008)

  13. Carey, M.J., et al.: Shoring up persistent applications. In: SIGMOD, pp. 383–394 (1994)

  14. Chang, F., et al.: Bigtable: A distributed storage system for structured data. In: OSDI, p. 15 (2006)

  15. Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM TODS 32(3), 116–127 (2007)

  16. Cieslewicz, J., Ross, K.A.: Adaptive aggregation on chip multiprocessors. In: VLDB, pp. 339–350 (2007)

  17. Clark, K.L., McCabe, F.G.: Go! a multi-paradigm programming language for implementing multi-threaded agents. Ann. Math. Artif. Intell. 41, 171–206 (2004)

  18. Curino, C., Jones, E., Zhang, Y., Madden, S.: Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 48–57 (2010)

  19. Daniels, D.S., Spector, A.Z., Thompson, D.S.: Distributed logging for transaction processing. SIGMOD Rec. 16, 82–96 (1987)

  20. Davis, J.D., Laudon, J., Olukotun, K.: Maximizing CMP throughput with mediocre cores. In: PACT, pp. 51–62 (2005)

  21. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI, p. 10 (2004)

  22. DeCandia, G., et al.: Dynamo: Amazon’s highly available key-value store. SIGOPS OSR 41(6), 205–220 (2007)

  23. Dewitt, D.J., et al.: The Gamma database machine project. IEEE TKDE 2(1), 44–62 (1990)

  24. Dragojevic, A., Guerraoui, R., Kapalka, M.: Dividing transactional memories by zero. In: TRANSACT (2008)

  25. Graefe, G.: Hierarchical locking in B-tree indexes. In: BTW, pp. 18–42 (2007)

  26. Gray, J., Helland, P., O’Neil, P., Shasha, D.: The dangers of replication and a solution. In: SIGMOD, pp. 173–182 (1996)

  27. Gray, J., Reuter, A.: Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco (1992)

  28. Hardavellas, N., et al.: Database servers on chip multiprocessors: limitations and opportunities. In: CIDR, pp. 79–87 (2007)

  29. Harizopoulos, S., Abadi, D.J., Madden, S., Stonebraker, M.: OLTP through the looking glass, and what we found there. In: SIGMOD, pp. 981–992 (2008)

  30. Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. In: SIGMOD, pp. 383–394 (2005)

  31. Helland, P.: Life beyond distributed transactions: an apostate’s opinion. In: CIDR, pp. 132–141 (2007)

  32. Helland, P., et al.: Group commit timers and high volume transaction systems. In: HPTS, pp. 301–329 (1989)

  33. Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)

  34. Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21(2), 289–300 (1993)

  35. Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41, 33–38 (2008)

  36. Hunt, G.C., Larus, J.R.: Singularity: rethinking the software stack. SIGOPS OSR 41(2), 37–49 (2007)

  37. Jaluta, I., Sippu, S., Soisalon-Soininen, E.: B-tree concurrency control and recovery in page-server database systems. ACM TODS 31, 82–132 (2006)

  38. Johnson, R., Pandis, I., Hardavellas, N., Ailamaki, A., Falsafi, B.: Shore-MT: a scalable storage manager for the multicore era. In: EDBT, pp. 24–35 (2009)

  39. Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. PVLDB 3, 681–692 (2010)

  40. Jones, E., Abadi, D.J., Madden, S.: Low overhead concurrency control for partitioned main memory databases. In: SIGMOD, pp. 603–614 (2010)

  41. Jorwekar, S., Fekete, A., Ramamritham, K., Sudarshan, S.: Automating the detection of snapshot isolation anomalies. In: VLDB, pp. 1263–1274 (2007)

  42. Kemper, A., Neumann, T.: HyPer: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: ICDE, pp. 195–206 (2011)

  43. Kimura, H., Graefe, G., Kuno, H.: Efficient locking techniques for databases on modern hardware. In: ADMS (2012)

  44. Kung, H.T., Robinson, J.T.: On optimistic methods for concurrency control. ACM TODS 6(2), 213–226 (1981)

  45. Larson, P.A., et al.: High-performance concurrency control mechanisms for main-memory databases. PVLDB 5(4), 298–309 (2011)

  46. Lauer, H.C., Needham, R.M.: On the duality of operating system structures. SIGOPS OSR 13, 3–19 (1979)

  47. Lee, J., Kim, K., Cha, S.K.: Differential logging: a commutative and associative logging scheme for highly parallel main memory database. In: ICDE, pp. 173–184 (2001)

  48. Lee, M.L., Kitsuregawa, M., Ooi, B.C., Tan, K.L., Mondal, A.: Towards self-tuning data placement in parallel database systems. In: SIGMOD, pp. 225–236 (2000)

  49. Lomet, D.: Recovery for shared disk systems using multiple redo logs. Technical report CRL-90-4 (1990)

  50. Lomet, D., Anderson, R., Rengarajan, T.K., Spiro, P.: How the Rdb/VMS data sharing system became fast. Technical report CRL-92-4 (1992)

  51. Magnusson, P.S., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: ISPP, pp. 165–171 (1994)

  52. Mellor-Crummey, J.M., Scott, M.L.: Scalable reader-writer synchronization for shared-memory multiprocessors. SIGPLAN Not. 26(7), 106–113 (1991)

  53. Mohan, C.: ARIES/KVL: a key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes. In: VLDB, pp. 392–405 (1990)

  54. Mohan, C., Levine, F.: ARIES/IM: an efficient and high concurrency index management method using write-ahead logging. In: SIGMOD, pp. 371–380 (1992)

  55. Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free FIFO queues. In: SPAA, pp. 253–262 (2005)

  56. Pandis, I., Johnson, R., Hardavellas, N., Ailamaki, A.: Data-oriented transaction execution. PVLDB 3(1), 928–939 (2010)

  57. Pandis, I., Tözün, P., Johnson, R., Ailamaki, A.: PLP: page latch-free shared-everything OLTP. PVLDB 4(10), 610–621 (2011)

  58. Porobic, D., Pandis, I., Branco, M., Tözün, P., Ailamaki, A.: OLTP on hardware islands. PVLDB 5(11), 1447–1458 (2012)

  59. Ranganathan, P., Gharachorloo, K., Adve, S.V., Barroso, L.A.: Performance of database workloads on shared-memory systems with out-of-order processors. In: ASPLOS, pp. 307–318 (1998)

  60. Rao, J., Ross, K.A.: Cache conscious indexing for decision-support in main memory. In: VLDB, pp. 78–89 (1999)

  61. Rao, J., Ross, K.A.: Making B+-trees cache conscious in main memory. In: SIGMOD, pp. 475–486 (2000)

  62. Shavit, N., Touitou, D.: Software transactional memory. In: PODC, pp. 204–213 (1995)

  63. Smith, A.J.: Sequentiality and prefetching in database systems. ACM TODS 3, 223–247 (1978)

  64. Soisalon-Soininen, E., Ylönen, T.: Partial strictness in two-phase locking. In: ICDT, pp. 139–147 (1995)

  65. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: SIGCOMM, pp. 149–160 (2001)

  66. Stonebraker, M.: The case for shared nothing. IEEE Database Eng. Bull. 9, 4–9 (1986)

  67. Stonebraker, M.: Stonebraker on NoSQL and enterprises. Commun. ACM 54, 10–11 (2011)

  68. Stonebraker, M., et al.: The end of an architectural era: (it’s time for a complete rewrite). In: VLDB, pp. 1150–1160 (2007)

  69. Thomson, A., Abadi, D.J.: The case for determinism in database systems. PVLDB 3, 70–80 (2010)

  70. Tözün, P., Pandis, I., Johnson, R., Ailamaki, A.: Scalable and dynamically balanced shared-everything OLTP with physiological partitioning. VLDB J., 1–25 (2012)

  71. Vogels, W.: Eventually consistent. Commun. ACM 52, 40–44 (2009)

  72. Welsh, M., Culler, D., Brewer, E.: SEDA: an architecture for well-conditioned, scalable internet services. In: SOSP, pp. 230–243 (2001)

  73. Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ISCA, pp. 24–36 (1995)

Acknowledgments

The authors are deeply grateful to all the members of Carnegie Mellon’s StagedDB/CMP team and EPFL’s DIAS laboratory who made this work possible through their research efforts, helpful feedback, and encouragement. We are especially indebted to Pinar Tözün, who helped us with an additional set of experiments. We would also like to thank the many anonymous reviewers whose thoughtful and constructive remarks helped improve both this paper and the research papers summarized here. This research was supported by an IBM PhD fellowship; grants and equipment from Intel and Sun; a Sloan research fellowship; an IBM faculty partnership award; NSF grants CCR-0205544, CCR-0509356, IIS-0133686, and IIS-0713409; an ESF EurYI award; and Swiss National Foundation funds.

Author information

Corresponding author

Correspondence to Ryan Johnson.

About this article

Cite this article

Johnson, R., Pandis, I. & Ailamaki, A. Eliminating unscalable communication in transaction processing. The VLDB Journal 23, 1–23 (2014). https://doi.org/10.1007/s00778-013-0312-3

  • DOI: https://doi.org/10.1007/s00778-013-0312-3
