Machine Learning Meets Databases

Günnemann, Stephan

doi:10.1007/s13222-017-0247-8

Kurz erklärt (Briefly Explained)
Published: 31 January 2017

Volume 17, pages 77–83 (2017)
Datenbank-Spektrum

Stephan Günnemann (ORCID: orcid.org/0000-0001-7772-5059)

Abstract

Machine Learning has become highly popular due to several success stories in data-driven applications. Prominent examples include object detection in images, speech recognition, and text translation. According to Gartner’s 2016 Hype Cycle for Emerging Technologies, Machine Learning is currently at its peak of inflated expectations, with several other application domains trying to exploit Machine Learning technology. Since data-driven applications are a fundamental cornerstone of the database community as well, it is natural to ask how these fields relate to each other. This article therefore provides a brief introduction to the field of Machine Learning, discusses its interplay with other fields such as Data Mining and Databases, and gives an overview of recent data management systems that integrate Machine Learning functionality.

Notes

Not every problem at hand needs to be tackled by Machine Learning. For example, detecting people’s resumes on the Web via Machine Learning has not been shown to be advantageous over manually designing an algorithm to discover resumes [15]: “Since everyone who has looked at or written a resume has a pretty good idea of what resumes contain, there was no mystery about what makes a Web page a resume.”
Often these concepts are not well separated. The most prominent example is the k‑means clustering model, where the default algorithm to solve it (Lloyd’s algorithm) is itself often called k‑means.
Usually the whole process is called Knowledge Discovery, while phase 4 is referred to as Data Mining.
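
The distinction drawn in the second note, between the k‑means model and Lloyd’s algorithm, can be illustrated with a minimal sketch. It is not taken from the article: the function names, the toy data, and the initialization by random sampling are assumptions chosen for illustration. The objective function plays the role of the model (what counts as a good clustering), while Lloyd’s algorithm is one heuristic for minimizing it.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_objective(points: List[Point], centers: List[Point]) -> float:
    """The k-means *model*: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's *algorithm*: one heuristic for minimizing the objective above."""
    rng = random.Random(seed)
    # Assumed initialization: pick k of the data points at random.
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: nothing changes any more
            break
        centers = new_centers
    return centers


# Hypothetical toy data: two small groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = lloyd(points, k=2)
print(kmeans_objective(points, centers))
```

Because the objective is a separate function, the same model could be optimized by a different procedure (for example, another initialization or update scheme) and the resulting solutions compared by their objective values.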

References

[1] Aref M, ten Cate B, Green TJ, Kimelfeld B, Olteanu D, Pasalic E, Veldhuizen TL, Washburn G (2015) Design and implementation of the LogicBlox system. In: SIGMOD, pp 1371–1382
[2] Bishop CM (2006) Pattern Recognition and Machine Learning. Springer, New York
[3] Böhm M, Burdick DR, Evfimievski AV, Reinwald B, Reiss FR, Sen P, Tatikonda S, Tian Y (2014) SystemML’s optimizer: Plan generation for large-scale machine learning programs. IEEE Data Eng Bull 37(3):52–62
[4] Cai Z, Vagena Z, Perez LL, Arumugam S, Haas PJ, Jermaine CM (2013) Simulation of database-valued Markov chains using SimSQL. In: SIGMOD, pp 637–648
[5] Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K (2015) Apache Flink™: Stream and batch processing in a single engine. IEEE Data Eng Bull 38(4):28–38
[6] Chaudhuri S, Narasayya VR (2007) Self-tuning database systems: A decade of progress. In: VLDB, pp 3–14
[7] Das S, Li F, Narasayya VR, König AC (2016) Automated demand-driven resource scaling in relational database-as-a-service. In: SIGMOD, pp 1923–1934
[8] Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Senior A, Tucker P, Yang K, Le QV et al (2012) Large scale distributed deep networks. In: NIPS, pp 1223–1231
[9] Elnaffar S, Martin TP, Horman R (2002) Automatically classifying database workloads. In: CIKM, pp 622–624
[10] Ganapathi A, Kuno HA, Dayal U, Wiener JL, Fox A, Jordan MI, Patterson DA (2009) Predicting multiple metrics for queries: Better decisions enabled by machine learning. In: ICDE, pp 592–603
[11] Hellerstein JM, Ré C, Schoppmann F, Wang DZ, Fratkin E, Gorajek A, Ng KS, Welton C, Feng X, Li K, Kumar A (2012) The MADlib analytics library or MAD skills, the SQL. PVLDB 5(12):1700–1711
[12] Holze M, Ritter N (2008) Autonomic databases: Detection of workload shifts with n‑gram-models. In: ADBIS, pp 127–142
[13] Kraska T, Talwalkar A, Duchi JC, Griffith R, Franklin MJ, Jordan MI (2013) MLbase: A distributed machine-learning system. In: CIDR
[14] Kunft A, Alexandrov A, Katsifodimos A, Markl V (2016) Bridging the gap: towards optimization across linear and relational algebra. In: Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR@SIGMOD, pp 1–4
[15] Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive datasets. Cambridge University Press, Cambridge
[16] Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su B-Y (2014) Scaling distributed machine learning with the parameter server. In: OSDI, pp 583–598
[17] Mozafari B, Curino C, Jindal A, Madden S (2013) Performance and resource modeling in highly-concurrent OLTP workloads. In: SIGMOD, pp 301–312
[18] Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge
[19] Passing L, Then M, Hubig N, Lang H, Schreier M, Günnemann S, Kemper A, Neumann T (2017) SQL- and operator-centric data analytics in relational main-memory databases. In: EDBT
[20] Pavlo A et al (2017) Self-driving database management systems. In: CIDR
[21] Recht B, Ré C, Wright S, Niu F (2011) Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In: NIPS, pp 693–701
[22] Roy N, Dubey A, Gokhale AS (2011) Efficient autoscaling in the cloud using predictive models for workload forecasting. In: CLOUD, pp 500–507
[23] Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3(3):210–229
[24] Sapia C (2000) PROMISE: predicting query behavior to enable predictive caching strategies for OLAP systems. In: DaWaK, pp 224–233
[25] Schelter S, Palumbo A, Quinn S, Marthi S, Musselman A (2016) Samsara: Declarative machine learning on distributed dataflow systems. In: Machine Learning Systems workshop at NIPS
[26] Shearer C (2000) The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13–22
[27] Tamayo P et al (2005) Oracle data mining – data mining in the database environment. In: The Data Mining and Knowledge Discovery Handbook, pp 1315–1329
[28] Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: Cluster computing with working sets. In: HotCloud, pp 1–7

Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Stephan Günnemann

Corresponding author

Correspondence to Stephan Günnemann.

About this article

Cite this article

Günnemann, S. Machine Learning Meets Databases. Datenbank Spektrum 17, 77–83 (2017). https://doi.org/10.1007/s13222-017-0247-8

Published: 31 January 2017
Issue Date: March 2017
DOI: https://doi.org/10.1007/s13222-017-0247-8
