chapter

The end of an architectural era: it's time for a complete rewrite

Published: 01 December 2018

Abstract

In previous papers [SC05, SBC+07], some of us predicted the end of "one size fits all" as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence showing that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets.
Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C.
We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow's requirements, not continue to push code lines and architectures designed for yesterday's needs.
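The performance claims above are stated in orders of magnitude: a speedup of k orders of magnitude is a throughput ratio of 10^k, so "1-2 orders of magnitude" means roughly 10x to 100x. The short Python sketch below shows that conversion; the helper name `orders_of_magnitude` and the throughput numbers are hypothetical illustrations, not figures or code from the paper.

```python
import math


def orders_of_magnitude(baseline_tps: float, candidate_tps: float) -> float:
    """Express candidate_tps / baseline_tps as a power of ten.

    1.0 means ten times the baseline throughput; 2.0 means one hundred times.
    """
    if baseline_tps <= 0 or candidate_tps <= 0:
        raise ValueError("throughput values must be positive")
    return math.log10(candidate_tps / baseline_tps)


# Hypothetical throughputs in transactions per second (NOT measurements from the paper).
baseline = 1_000.0
for candidate in (10_000.0, 80_000.0, 100_000.0):
    ratio = candidate / baseline
    print(f"{ratio:.0f}x faster = {orders_of_magnitude(baseline, candidate):.2f} orders of magnitude")
```

With these made-up inputs, an 80x ratio works out to about 1.9 orders of magnitude, which illustrates how a result can be described as "nearly" two orders of magnitude.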

References

[1]
A. Arasu, S. Babu, and J. Widom. "The CQL Continuous Query Language: Semantic Foundations and Query Execution." The VLDB Journal, 15(2), June 2006.
[2]
R. Agrawal, M. J. Carey, and M. Livny. "Concurrency control performance modeling: alternatives and implications." ACM Trans. Database Syst. 12(4), Nov. 1987.
[3]
D. Abadi, A. Marcus, S. Madden, and K. Hollenbach. "Scalable Semantic Web Data Management Using Vertical Partitioning." In Proc. VLDB, 2007.
[4]
ANTs Software. ANTs Data Server-Technical White Paper, http://www.ants.com, 2007.
[5]
P. A. Bernstein, D. Shipman, and J. B. Rothnie. "Concurrency Control in a System for Distributed Databases (SDD-1)." ACM Trans. Database Syst. 5(1), March 1980.
[6]
P. A. Boncz. "Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications." Ph.D. Thesis, Universiteit van Amsterdam, Amsterdam, The Netherlands, May 2002.
[7]
C. J. Date. "An Architecture for High-Level Language Database Extensions." In Proc. SIGMOD, 1976.
[8]
C. J. Date. "A critique of the SQL database language." In SIGMOD Record 14(3): 8-54, Nov. 1984.
[9]
D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. Hsiao, and R. Rasmussen. "The Gamma Database Machine Project." IEEE Transactions on Knowledge and Data Engineering 2(1): 44-62, March 1990.
[10]
P. Helland. "Life beyond Distributed Transactions: an Apostate's Opinion." In Proc. CIDR, 2007.
[11]
M. Herlihy and J. E. Moss. "Transactional memory: architectural support for lock-free data structures." In Proc. ISCA, 1993.
[12]
H. T. Kung and J. T. Robinson. "On optimistic methods for concurrency control." ACM Trans. Database Syst. 6(2): 213-226, June 1981.
[13]
E. Lau and S. Madden. "An Integrated Approach to Recovery and High Availability in an Updatable, Distributed Data Warehouse." In Proc. VLDB, 2006.
[14]
C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. "ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging." ACM Trans. Database Syst. 17(1): 94-162, March 1992.
[15]
D. L. Mills. "On the Accuracy and Stability of Clocks Synchronized by the Network Time Protocol in the Internet System." SIGCOMM Comput. Commun. Rev. 20(1): 65-75, Dec. 1989.
[16]
F. Manola and E. Miller (eds). RDF Primer. W3C Specification, February 10, 2004. http://www.w3.org/TR/REC-rdf-primer-20040210
[17]
J. Rao and K. A. Ross. "Cache Conscious Indexing for Decision-Support in Main Memory." In Proc. VLDB, 1999.
[18]
J. Rao and K. A. Ross. "Making B+-trees cache conscious in main memory." In SIGMOD Record, 29(2): 475-486, June 2000.
[19]
L. A. Rowe and K. A. Shoens. "Data Abstractions, Views and Updates in RIGEL." In Proc. SIGMOD, 1979.
[20]
Randall Rustin (Ed.) Proceedings of 1974 ACM-SIGMOD Workshop on Data Description, Access and Control, Ann Arbor, Michigan, May 1-3, 1974, 2 Volumes.
[21]
M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. "C-Store: A Column-oriented DBMS." In Proc. VLDB, 2005.
[22]
M. Stonebraker, C. Bear, U. Cetintemel, M. Cherniack, T. Ge, N. Hachem, S. Harizopoulos, J. Lifter, J. Rogers, and S. Zdonik. "One Size Fits All? Part 2: Benchmarking Results." In Proc. CIDR, 2007.
[23]
M. Stonebraker and U. Cetintemel. "'One Size Fits All': An Idea Whose Time Has Come and Gone." In Proc. ICDE, 2005.
[24]
J. W. Schmidt et al. "Pascal/R Report." U. Hamburg, Fachbereich Informatik, Report 66, Jan. 1980.
[25]
The Transaction Processing Performance Council. TPC-C Benchmark (Revision 5.8.0), 2006. http://www.tpc.org/tpcc/spec/tpcc_current.pdf
[26]
D. Abadi, Y. Ahmad, M. Balazinska, U. Çetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. 2005. The design of the Borealis stream processing engine. Proc. of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR'05), Asilomar, CA, January.
[27]
Z. Abedjan, L. Golab, and F. Naumann. August 2015. Profiling relational data: a survey. The VLDB Journal, 24(4): 557-581.
[28]
ACM. 2015a. Announcement: Michael Stonebraker, Pioneer in Database Systems Architecture, Receives 2014 ACM Turing Award. http://amturing.acm.org/award_winners/stonebraker_1172121.cfm. Accessed February 5, 2018.
[29]
ACM. March 2015b. Press Release: MIT's Stonebraker Brought Relational Database Systems from Concept to Commercial Success, Set the Research Agenda for the Multibillion-Dollar Database Field for Decades. http://sigmodrecord.org/publications/sigmodRecord/1503/pdfs/04_announcements_Stonebraker.pdf. Accessed February 5, 2018.
[30]
ACM. 2016. A.M. Turing Award Citation and Biography. http://amturing.acm.org/award_winners/stonebraker_1172121.cfm. Accessed September 24, 2018.
[31]
Y. Ahmad, B. Berg, U. Çetintemel, M. Humphrey, J. Hwang, A. Jhingran, A. Maskey, O. Papaemmanouil, A. Rasin, N. Tatbul, W. Xing, Y. Xing, and S. Zdonik. June 2005. Distributed operation in the Borealis Stream Processing Engine. Demonstration, ACM SIGMOD International Conference on Management of Data (SIGMOD'05). Baltimore, MD. Best Demonstration Award.
[32]
M. M. Astrahan, M.W. Blasgen, D. D. Chamberlin, K. P. Eswaran, J. N. Gray, P. P. Griffiths, W. F. King, R. A. Lorie, P. R. McJones, J. W. Mehl, G. R. Putzolu, I. L. Traiger, B. W. Wade, and V. Watson. 1976. System R: relational approach to database management. ACM Transactions on Database Systems, 1(2): 97-137.
[33]
P. Bailis, E. Gan, S. Madden, D. Narayanan, K. Rong, and S. Suri. 2017. Macrobase: Prioritizing attention in fast data. Proc. of the 2017 ACM International Conference on Management of Data. ACM.
[34]
Berkeley Software Distribution. n.d. In Wikipedia. http://en.wikipedia.org/wiki/Berkeley_Software_Distribution. Last accessed March 1, 2018.
[35]
G. Beskales, I.F. Ilyas, L. Golab, and A. Galiullin. 2013. On the relative trust between inconsistent data and inaccurate constraints. Proc. of the IEEE International Conference on Data Engineering, ICDE 2013, pp. 541-552. Australia.
[36]
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, R. C. Whaley. 2017. ScaLAPACK Users' Guide. Society for Industrial and Applied Mathematics http://netlib.org/scalapack/slug/index.html. Last accessed December 31, 2017.
[37]
D. Bitton, D. J. DeWitt, and C. Turbyfill. 1983. Benchmarking database systems--a systematic approach. Computer Sciences Technical Report #526, University of Wisconsin. http://minds.wisconsin.edu/handle/1793/58490.
[38]
P. A. Boncz, M. L. Kersten, and S. Manegold. December 2008. Breaking the memory wall in MonetDB. Communications of the ACM, 51(12): 77-85.
[39]
M. L. Brodie. June 2015. Understanding data science: an emerging discipline for data-intensive discovery. In S. Cutt, editor, Getting Data Right: Tackling the Challenges of Big Data Volume and Variety. O'Reilly Media, Sebastopol, CA.
[40]
Brown University, Department of Computer Science. Fall 2002. Next generation stream-based applications. Conduit Magazine, 11(2). https://cs.brown.edu/about/conduit/conduit_v11n2.pdf. Last accessed May 14, 2018.
[41]
BSD licenses. n.d. In Wikipedia. http://en.wikipedia.org/wiki/BSD_licenses. Last accessed March 1, 2018.
[42]
M. Cafarella and C. Ré. April 2018. The last decade of database research and its blindingly bright future. or Database Research: A love song. DAWN Project, Stanford University. http://dawn.cs.stanford.edu/2018/04/11/db-community/.
[43]
M. J. Carey, D. J. DeWitt, M. J. Franklin, N. E. Hall, M. L. McAuliffe, J. F. Naughton, D. T. Schuh, M. H. Solomon, C. K. Tan, O. G. Tsatalos, S. J. White, and M. J. Zwilling. 1994. Shoring up persistent applications. Proc. of the 1994 ACM SIGMOD international conference on Management of data (SIGMOD '94), pp. 383-394.
[44]
M. J. Carey, D. J. DeWitt, M. J. Franklin, N. E. Hall, M. L. McAuliffe, J. F. Naughton, D. T. Schuh, M. H. Solomon, C. K. Tan, O. G. Tsatalos, S. J. White, and M. J. Zwilling. 1994. Shoring up persistent applications. In Proc. of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD '94), pp. 383-394.
[45]
M. J. Carey, L. M. Haas, P. M. Schwarz, M. Arya, W. E. Cody, R. Fagin, M. Flickner, A. W. Luniewski, W. Niblack, and D. Petkovic. 1995. Towards heterogeneous multimedia information systems: The garlic approach. In Research Issues in Data Engineering, 1995: Distributed Object Management, Proceedings, pp. 124-131. IEEE.
[46]
CERN. http://home.cern/about/computing. Last accessed December 31, 2017.
[47]
D. D. Chamberlin and R. F. Boyce. 1974. SEQUEL: A structured English query language. In Proc. of the 1974 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control (SIGFIDET '74), pp. 249-264. ACM, New York.
[48]
D. D. Chamberlin, M. M. Astrahan, K. P. Eswaran, P. P. Griffiths, R. A. Lorie, J. W. Mehl, P. Reisner, and B. W. Wade. 1976. SEQUEL 2: a unified approach to data definition, manipulation, and control. IBM Journal of Research and Development, 20(6): 560-575.
[49]
S. Chandrasekaran, O. Cooper, A. Deshpande, M.J. Franklin, J.M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. 2003. TelegraphCQ: Continuous dataflow processing for an uncertain world. Proc. of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD '03), pp. 668-668. ACM, New York.
[50]
J. Chen, D.J. DeWitt, F. Tian, and Y. Wang. 2000. NiagaraCQ: A scalable continuous query system for Internet databases. Proc. of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD '00), pp. 379-390. ACM, New York.
[51]
M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Çetintemel, Y. Xing, and S. Zdonik. 2003. Scalable distributed stream processing. Proc. of the First Biennial Conference on Innovative Database Systems (CIDR'03), Asilomar, CA, January.
[52]
C. M. Christensen. 1997. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Harvard Business School Press, Boston, MA.
[53]
X. Chu, I. F. Ilyas, and P. Papotti. 2013a. Holistic data cleaning: Putting violations into context. Proc. of the IEEE International Conference on Data Engineering, ICDE 2013, pp. 458-469. Australia.
[54]
X. Chu, I. F. Ilyas, and P. Papotti. 2013b. Discovering denial constraints. Proc. of the VLDB Endowment, PVLDB 6(13): 1498-1509.
[55]
X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye. 2015. Katara: A data cleaning system powered by knowledge bases and crowdsourcing. In Proc. of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15), pp. 1247-1261. ACM, New York.
[56]
P. J. A. Cock, C. J. Fields, N. Goto, M. L. Heuer, and P. M. Rice. 2009. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 38(6): 1767-1771.
[57]
E. F. Codd. June 1970. A relational model of data for large shared data banks. Communications of the ACM, 13(6): 377-387.
[58]
M. Collins. 2016. Thomson Reuters uses Tamr to deliver better connected content at a fraction of the time and cost of legacy approaches. Tamr blog, July 28. https://www.tamr.com/video/thomson-reuters-uses-tamr-deliver-better-connected-content-fraction-time-cost-legacy-approaches/. Last accessed January 24, 2018.
[59]
G. Copeland and D. Maier. 1984. Making smalltalk a database system. Proc. of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD '84), pp. 316-325. ACM, New York.
[60]
C. Cranor, T. Johnson, V. Shkapenyuk, and O. Spatscheck. 2003. Gigascope: A stream database for network applications. Proc. of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD '03), pp. 647-651. ACM, New York.
[61]
A. Crotty, A. Galakatos, K. Dursun, T. Kraska, U. Cetintemel, and S. Zdonik. 2015. Tupleware: "Big Data, Big Analytics, Small Clusters." CIDR.
[62]
M. Dallachiesa, A. Ebaid, A. Eldawy, A. Elmagarmid, I. F. Ilyas, M. Ouzzani, and N. Tang. 2013. NADEEF: a commodity data cleaning system. Proc. of the 2013 ACM SIGMOD Conference on Management of Data, pp. 541-552. New York.
[63]
T. Dasu and J. M. Loh. 2012. Statistical distortion: Consequences of data cleaning. PVLDB, 5(11): 1674-1683.
[64]
C. J. Date and E. F. Codd. 1975. The relational and network approaches: Comparison of the application programming interfaces. In Proc. of the 1974 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control: Data Models: Data-Structure-Set Versus Relational (SIGFIDET '74), pp. 83-113. ACM, New York.
[65]
D. J. DeWitt. 1979a. DIRECT: a multiprocessor organization for supporting relational database management systems. IEEE Transactions on Computers, 28(6): 395-406.
[66]
D. J. DeWitt. 1979b. Query execution in DIRECT. In Proc. of the 1979 ACM SIGMOD International Conference on Management of Data (SIGMOD '79), pp. 13-22. ACM, New York.
[67]
D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. 1986. GAMMA--a high performance dataflow database machine. Proc. of the 12th International Conference on Very Large Data Bases (VLDB '86), W. W. Chu, G. Gardarin, S. Ohsuga, and Y. Kambayashi, editors, pp. 228-237. Morgan Kaufmann Publishers Inc., San Francisco, CA.
[68]
D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H.-I. Hsiao, and R. Rasmussen. March 1990. The Gamma database machine project. IEEE Transactions on Knowledge and Data Engineering, 2(1): 44-62.
[69]
D. DeWitt and J. Gray. June 1992. Parallel database systems: the future of high performance database systems. Communications of the ACM, 35(6): 85-98.
[70]
D. J. DeWitt, A. Halverson, R. Nehme, S. Shankar, J. Aguilar-Saborit, A. Avanes, M. Flasza, and J. Gramling. 2013. Split query processing in polybase. Proc. of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13), pp. 1255-1266. ACM, New York.
[71]
C. Diaconu, C. Freedman, E. Ismert, P-A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. 2013. Hekaton: SQL server's memory-optimized OLTP engine. In Proc. of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13), pp. 1243-1254. ACM, New York.
[72]
K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger. November 1976. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11): 624-633.
[73]
W. Fan, J. Li, S. Ma, N. Tang, and W. Yu. April 2012. Towards certain fixes with editing rules and master data. The VLDB Journal, 21(2): 213-238.
[74]
D. Fogg. September 1982. Implementation of domain abstraction in the relational database system INGRES. Master of Science Report, Dept. of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA.
[75]
T. Flory, A. Robbin, and M. David. May 1988. Creating SIPP longitudinal analysis files using a relational database management system. CDE Working Paper No. 88-32, Institute for Research on Poverty, University of Wisconsin-Madison, Madison, WI.
[76]
V. Gadepally, J. Kepner, W. Arcand, D. Bestor, B. Bergeron, C. Byun, L. Edwards, M. Hubbell, P. Michaleas, J. Mullen, A. Prout, A. Rosa, C. Yee, and A. Reuther. 2015. D4M: Bringing associative arrays to database engines. High Performance Extreme Computing Conference (HPEC). IEEE, 2015.
[77]
V. Gadepally, K. O'Brien, A. Dziedzic, A. Elmore, J. Kepner, S. Madden, T. Mattson, J. Rogers, Z. She, and M. Stonebraker. September 2017. BigDAWG Version 0.1. IEEE High Performance Extreme.
[78]
J. Gantz and D. Reinsel. 2013. The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East--United States, IDC, February.
[79]
L. Gerhardt, C. H. Faham, and Y. Yao. 2015. Accelerating scientific analysis with SciDB. Journal of Physics: Conference Series, 664(7).
[80]
B. Grad. 2007. Oral history of Michael Stonebraker, Transcription. Recorded: August 23, 2007. Computer History Museum, Moultonborough, NH. http://archive.computerhistory.org/resources/access/text/2012/12/102635858-05-01-acc.pdf. Last accessed April 8, 2018.
[81]
A. Guttman. 1984. R-trees: a dynamic index structure for spatial searching. In Proc. of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD '84), pp. 47-57. ACM, New York.
[82]
L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh. 1989. Extensible query processing in starburst. In Proc. of the 1989 ACM SIGMOD International Conference on Management of Data (SIGMOD '89), pp. 377-388. ACM, New York.
[83]
D. Halperin, V. Teixeira de Almeida, L. L. Choo, S. Chu, P. Koutris, D. Moritz, J. Ortiz, V. Ruamviboonsuk, J. Wang, and A. Whitaker. 2014. Demonstration of the Myria big data management service. Proc. of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14), pp. 881-884. ACM, New York.
[84]
B. Haynes, A. Cheung, and M. Balazinska. 2016. PipeGen: Data pipe generator for hybrid analytics. Proc. of the Seventh ACM Symposium on Cloud Computing (SoCC '16), M. K. Aguilera, B. Cooper, and Y. Diao, editors, pp. 470-483. ACM, New York.
[85]
M. A. Hearst. 2009. Search user interfaces. Cambridge University Press, New York.
[86]
J. M. Hellerstein, J. F. Naughton, and A. Pfeffer. 1995. Generalized search trees for database systems. In Proc. of the 21st International Conference on Very Large Data Bases (VLDB '95), pp. 562-573. Morgan Kaufmann Publishers Inc., San Francisco, CA. http://dl.acm.org/citation.cfm?id=645921.673145.
[87]
J. M. Hellerstein, E. Koutsoupias, D. P. Miranker, C. H. Papadimitriou, and V. Samoladas. 2002. On a model of indexability and its bounds for range queries. Journal of the ACM (JACM), 49(1): 35-55.
[88]
IBM. 1997. Special Issue on IBM's S/390 Parallel Sysplex Cluster. IBM Systems Journal, 36(2).
[89]
S. Idreos, F. Groffen, N. Nes, S. Manegold, S. K. Mullender, and M. L. Kersten. 2012. MonetDB: two decades of research in column-oriented database architectures. IEEE Data Engineering Bulletin, 35(1): 40-45.
[90]
N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, and S. Zdonik. 2008. Towards a streaming SQL standard. Proc. VLDB Endowment, pp. 1379-1390. August 1-2.
[91]
A. E. W. Johnson, T. J. Pollard, L. Shen, L. H. Lehman, M. Feng, M. Ghassemi, B. E. Moody, P. Szolovits, L. A. G. Celi, and R. G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3: 160035
[92]
V. Josifovski, P. Schwarz, L. Haas, and E. Lin. 2002. Garlic: a new flavor of federated query processing for DB2. In Proc. of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD '02), pp. 524-532. ACM, New York.
[93]
J. W. Josten, C. Mohan, I. Narang, and J. Z. Teng. 1997. DB2's use of the coupling facility for data sharing. IBM Systems Journal, 36(2): 327-351.
[94]
S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer. 2011. Wrangler: Interactive visual specification of data transformation scripts. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems (CHI '11), pp. 3363-3372. ACM, New York.
[95]
R. Katz, editor. June 1982. Special issue on design data management. IEEE Database Engineering Newsletter, 5(2).
[96]
J. Kepner, V. Gadepally, D. Hutchison, H. Jensen, T. Mattson, S. Samsi, and A. Reuther. 2016. Associative array model of SQL, NoSQL, and NewSQL Databases. IEEE High Performance Extreme Computing Conference (HPEC) 2016, Waltham, MA, September 13-15.
[97]
V. Kevin and M. Whitney. 1974. Relational data management implementation techniques. In Proc. of the 1974 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control (SIGFIDET '74), pp. 321-350. ACM, New York.
[98]
Z. Khayyat, I.F. Ilyas, A. Jindal, S. Madden, M. Ouzzani, P. Papotti, J.-A. Quiané-Ruiz, N. Tang, and S. Yin. 2015. Bigdansing: A system for big data cleansing. In Proc. of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15), pp. 1215-1230. ACM, New York.
[99]
R. Kimball and M. Ross. 2013. The Data Warehouse Toolkit. John Wiley & Sons, Inc. https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/. Last accessed March 2, 2018.
[100]
M. Kornacker, C. Mohan, and J.M. Hellerstein. 1997. Concurrency and recovery in generalized search trees. In Proc. of the 1997 ACM SIGMOD International Conference on Management of Data (SIGMOD '97), pp. 62-72. ACM, New York.
[101]
A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandiver, L. Doshi, and C. Bear. August 2012. The Vertica Analytic Database: C-Store 7 years later. Proc. VLDB Endowment, 5(12): 1790-1801.
[102]
L. Lamport. 2001. Paxos Made Simple. http://lamport.azurewebsites.net/pubs/paxos-simple.pdf. Last accessed December 31, 2017.
[103]
D. Laney. 2001. 3D data management: controlling data volume, variety and velocity. META Group Research, February 6. https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Last accessed April 22, 2018.
[104]
P-A. Larson, C. Clinciu, E.N. Hanson, A. Oks, S.L. Price, S. Rangarajan, A. Surna, and Q. Zhou. 2011. SQL server column store indexes. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD '11), pp. 1177-1184. ACM, New York.
[105]
J. LeFevre, J. Sankaranarayanan, H. Hacigumus, J. Tatemura, N. Polyzotis, and M. J. Carey. 2014. MISO: Souping up big data query processing with a multistore system. Proc. of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14), pp. 1591-1602. ACM, New York.
[106]
B. G. Lindsay. 1987. A retrospective of R*: a distributed database management system. In Proc. of the IEEE, 75(5): 668-673.
[107]
B. Liskov and S.N. Zilles. 1974. Programming with abstract data types. SIGPLAN Notices, 9(4): 50-59.
[108]
S. Marcin and A. Csillaghy. 2016. Running scientific algorithms as array database operators: Bringing the processing power to the data. 2016 IEEE International Conference on Big Data. pp. 3187-3193.
[109]
T. Mattson, V. Gadepally, Z. She, A. Dziedzic, and J. Parkhurst. 2017. Demonstrating the BigDAWG polystore system for ocean metagenomic analysis. CIDR'17 Chaminade, CA. http://cidrdb.org/cidr2017/papers/p120-mattson-cidr17.pdf.
[110]
J. Meehan, C. Aslantas, S. Zdonik, N. Tatbul, and J. Du. 2017. Data ingestion for the connected world. Conference on Innovative Data Systems Research (CIDR'17), Chaminade, CA, January.
[111]
A. Metaxides, W. B. Helgeson, R. E. Seth, G. C. Bryson, M. A. Coane, D. G. Dodd, C. P. Earnest, R. W. Engles, L. N. Harper, P. A. Hartley, D. J. Hopkin, J. D. Joyce, S. C. Knapp, J. R. Lucking, J. M. Muro, M. P. Persily, M. A. Ramm, J. F. Russell, R. F. Schubert, J. R. Sidlo, M. M. Smith, and G. T. Werner. April 1971. Data Base Task Group Report to the CODASYL Programming Language Committee. ACM, New York.
[112]
C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. 1992. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems, 17(1), 94-162.
[113]
R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma. 2003. Query processing, approximation, and resource management in a data stream management system. Proc. of the First Biennial Conference on Innovative Data Systems Research (CIDR), January.
[114]
A. Oloso, K-S Kuo, T. Clune, P. Brown, A. Poliakov, H. Yu. 2016. Implementing connected component labeling as a user defined operator for SciDB. Proc. of 2016 IEEE International Conference on Big Data (Big Data). Washington, DC.
[115]
M. A. Olson. 1993. The design and implementation of the inversion file system. USENIX Winter. http://www.usenix.org/conference/usenix-winter-1993-conference/presentation/design-and-implementation-inversion-file-syste. Last accessed January 22, 2018.
[116]
J. C. Ong. 1982. Implementation of abstract data types in the relational database system INGRES, Master of Science Report, Dept. of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, September 1982.
[117]
A. Palmer. 2013. Culture matters: Facebook CIO talks about how well Vertica, Facebook people mesh. Koa Labs Blog, December 20. http://koablog.wordpress.com/2013/12/20/culture-matters-facebook-cio-talks-about-how-well-vertica-facebook-people-mesh. Last accessed March 14, 2018.
[118]
A. Palmer. 2015a. The simple truth: happy people, healthy company. Tamr Blog, March 23. http://www.tamr.com/the-simple-truth-happy-people-healthy-company/. Last accessed March 14, 2018.
[119]
A. Palmer. 2015b. Where the red book meets the unicorn, Xconomy, June 22. http://www.xconomy.com/boston/2015/06/22/where-the-red-book-meets-the-unicorn/ Last accessed March 14, 2018.
[120]
A. Pavlo and M. Aslett. September 2016. What's really new with NewSQL? ACM SIGMOD Record, 45(2): 45-55.
[121]
G. Press. 2016. Cleaning big data: most time-consuming, least enjoyable data science task, survey says. Forbes, May 23. https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#79e14e326f63.
[122]
N. Prokoshyna, J. Szlichta, F. Chiang, R. J. Miller, and D. Srivastava. 2015. Combining quantitative and logical data cleaning. PVLDB, 9(4): 300-311.
[123]
E. Ryvkina, A. S. Maskey, M. Cherniack, and S. Zdonik. 2006. Revision processing in a stream processing engine: a high-level design. Proc. of the 22nd International Conference on Data Engineering (ICDE'06), pp. 141-. Atlanta, GA, April. IEEE Computer Society, Washington, DC.
[124]
C. Saracco and D. Haderle. 2013. The history and growth of IBM's DB2. IEEE Annals of the History of Computing, 35(2): 54-66.
[125]
N. Savage. May 2015. Forging relationships. Communications of the ACM, 58(6): 22-23.
[126]
M. C. Schatz and B. Langmead. 2013. The DNA data deluge. IEEE Spectrum Magazine. https://spectrum.ieee.org/biomedical/devices/the-dna-data-deluge.
[127]
Z. She, S. Ravishankar, and J. Duggan. 2016. BigDAWG polystore query optimization through semantic equivalences. High Performance Extreme Computing Conference (HPEC). IEEE, 2016.
[128]
SIGFIDET panel discussion. 1974. In Proc. of the 1974 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control: Data Models: Data-Structure-Set Versus Relational (SIGFIDET '74), pp. 121-144. ACM, New York.
[129]
R. Snodgrass. December 1982. Monitoring distributed systems: a relational approach. Ph.D. Dissertation, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA.
[130]
A. Szalay. June 2008. The Sloan digital sky survey and beyond. ACM SIGMOD Record, 37(2): 61-66.
[131]
Tamr. 2017. Tamr awarded patent for enterprise-scale data unification system. Tamr blog. February 9 2017. https://www.tamr.com/tamr-awarded-patent-enterprise-scale-data-unification-system-2/. Last accessed January 24, 2018.
[132]
R. Tan, R. Chirkova, V. Gadepally, and T. Mattson. 2017. Enabling query processing across heterogeneous data models: A survey. IEEE Big Data Workshop: Methods to Manage Heterogeneous Big Data and Polystore Databases, Boston, MA.
[133]
N. Tatbul and S. Zdonik. 2006. Window-aware Load Shedding for Aggregation Queries over Data Streams. In Proc. of the 32nd International Conference on Very Large Databases (VLDB'06), Seoul, Korea.
[134]
N. Tatbul, U. Çetintemel, and S. Zdonik. 2007. "Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing." International Conference on Very Large Data Bases (VLDB'07), Vienna, Austria.
[135]
R. P. van de Riet. 1986. Expert database systems. In Future Generation Computer Systems, 2(3): 191-199.
[136]
M. Vartak, S. Rahman, S. Madden, A. Parameswaran, and N. Polyzotis. September 2015. Seedb: Efficient data-driven visualization recommendations to support visual analytics. PVLDB, 8(13): 2182-2193.
[137]
B. Wallace. June 9, 1986. Data base tool links to remote sites. Network World. http://books.google.com/books?id=aBwEAAAAMBAJ&pg=PA49&lpg=PA49&dq=ingres+star&source=bl&ots=FSMIR4thMj&sig=S1fzaaOT5CHRq4cwbLFEQp4UYCs&hl=en&sa=X&ved=0ahUKEwjJ1J_NttvZAhUG82MKHco2CfAQ6AEIYzAP#v=onepage&q=ingres%20star&f=false. Last accessed March 14, 2018.
[138]
J. Wang and N. J. Tang. 2014. Towards dependable data repairing with fixing rules. In Proc. of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14), pp. 457-468. ACM, New York.
[139]
E. Wong and K. Youssefi. September 1976. Decomposition--a strategy for query processing. ACM Transactions on Database Systems, 1(3): 223-241.
[140]
E. Wu and S. Madden. 2013. Scorpion: Explaining away outliers in aggregate queries. PVLDB, 6(8): 553-564.
[141]
Y. Xing, S. Zdonik, and J.-H. Hwang. April 2005. Dynamic load distribution in the Borealis Stream Processor. Proc. of the 21st International Conference on Data Engineering (ICDE'05), Tokyo, Japan.


Published In

ACM Books
Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker
December 2018
725 pages
ISBN:9781947487192
DOI:10.1145/3226595

Publisher

Association for Computing Machinery and Morgan & Claypool


