Machine Learning Meets Databases

  • Kurz erklärt (Briefly Explained)
Datenbank-Spektrum

Abstract

Machine Learning has become highly popular due to several success stories in data-driven applications. Prominent examples include object detection in images, speech recognition, and text translation. According to Gartner’s 2016 Hype Cycle for Emerging Technologies, Machine Learning is currently at the peak of inflated expectations, and many other application domains are trying to exploit Machine Learning technology. Since data-driven applications are a cornerstone of the database community as well, it is natural to ask how these fields relate to each other. In this article, we therefore provide a brief introduction to the field of Machine Learning, discuss its interplay with other fields such as Data Mining and Databases, and give an overview of recent data management systems that integrate Machine Learning functionality.

Notes

  1. Not every problem at hand needs to be tackled by Machine Learning. For example, detecting people’s resumes on the Web via Machine Learning has not been shown to be advantageous over manually designing an algorithm to discover resumes [15]: “Since everyone who has looked at or written a resume has a pretty good idea of what resumes contain, there was no mystery about what makes a Web page a resume.”

  2. Often these concepts are not well separated. The most prominent example is the k‑means clustering model, where the default algorithm to solve it (Lloyd’s algorithm) is itself often called k‑means.

  3. Usually the whole process is called Knowledge Discovery, while phase 4 is referred to as Data Mining.
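
The distinction drawn in note 2 can be made concrete with a minimal sketch. It assumes plain Python lists of numeric tuples as input; the function names `kmeans_objective` and `lloyd` are hypothetical and chosen only for illustration. The first function states the k‑means model (the objective to be minimized), while the second is Lloyd's algorithm, one common heuristic for minimizing it.

```python
import random


def kmeans_objective(points, centers, labels):
    """The k-means *model*: sum of squared distances of points to their assigned centers."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(point, centers[label]))
        for point, label in zip(points, labels)
    )


def lloyd(points, k, iterations=100, seed=0):
    """Lloyd's *algorithm*: alternate assignment and update steps (iterations >= 1)."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct input points (an illustrative choice).
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers would not change either
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = lloyd(points, k=2)
print(labels, kmeans_objective(points, centers, labels))
```

Any other optimizer could be substituted for `lloyd` without changing `kmeans_objective`, which is why the two concepts are worth keeping apart.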

References

  1. Aref M, ten Cate B, Green TJ, Kimelfeld B, Olteanu D, Pasalic E, Veldhuizen TL, Washburn G (2015) Design and implementation of the LogicBlox system. In: SIGMOD, pp 1371–1382

  2. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York

  3. Böhm M, Burdick DR, Evfimievski AV, Reinwald B, Reiss FR, Sen P, Tatikonda S, Tian Y (2014) SystemML’s optimizer: Plan generation for large-scale machine learning programs. IEEE Data Eng Bull 37(3):52–62

  4. Cai Z, Vagena Z, Perez LL, Arumugam S, Haas PJ, Jermaine CM (2013) Simulation of database-valued Markov chains using SimSQL. In: SIGMOD, pp 637–648

  5. Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K (2015) Apache Flink™: Stream and batch processing in a single engine. IEEE Data Eng Bull 38(4):28–38

  6. Chaudhuri S, Narasayya VR (2007) Self-tuning database systems: A decade of progress. In: VLDB, pp 3–14

  7. Das S, Li F, Narasayya VR, König AC (2016) Automated demand-driven resource scaling in relational database-as-a-service. In: SIGMOD, pp 1923–1934

  8. Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Senior A, Tucker P, Yang K, Le QV et al (2012) Large scale distributed deep networks. In: NIPS, pp 1223–1231

  9. Elnaffar S, Martin TP, Horman R (2002) Automatically classifying database workloads. In: CIKM, pp 622–624

  10. Ganapathi A, Kuno HA, Dayal U, Wiener JL, Fox A, Jordan MI, Patterson DA (2009) Predicting multiple metrics for queries: Better decisions enabled by machine learning. In: ICDE, pp 592–603

  11. Hellerstein JM, Ré C, Schoppmann F, Wang DZ, Fratkin E, Gorajek A, Ng KS, Welton C, Feng X, Li K, Kumar A (2012) The MADlib analytics library or MAD skills, the SQL. PVLDB 5(12):1700–1711

  12. Holze M, Ritter N (2008) Autonomic databases: Detection of workload shifts with n‑gram-models. In: ADBIS, pp 127–142

  13. Kraska T, Talwalkar A, Duchi JC, Griffith R, Franklin MJ, Jordan MI (2013) MLbase: A distributed machine-learning system. In: CIDR

  14. Kunft A, Alexandrov A, Katsifodimos A, Markl V (2016) Bridging the gap: towards optimization across linear and relational algebra. In: Workshop on Algorithms and Systems for MapReduce and Beyond, BeyondMR@SIGMOD, pp 1–4

  15. Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive datasets. Cambridge University Press, Cambridge

  16. Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su B-Y (2014) Scaling distributed machine learning with the parameter server. In: OSDI, pp 583–598

  17. Mozafari B, Curino C, Jindal A, Madden S (2013) Performance and resource modeling in highly-concurrent OLTP workloads. In: SIGMOD, pp 301–312

  18. Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge

  19. Passing L, Then M, Hubig N, Lang H, Schreier M, Günnemann S, Kemper A, Neumann T (2017) SQL- and operator-centric data analytics in relational main-memory databases. In: EDBT

  20. Pavlo A et al (2017) Self-driving database management systems. In: CIDR

  21. Recht B, Ré C, Wright S, Niu F (2011) Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In: NIPS, pp 693–701

  22. Roy N, Dubey A, Gokhale AS (2011) Efficient autoscaling in the cloud using predictive models for workload forecasting. In: CLOUD, pp 500–507

  23. Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3(3):210–229

  24. Sapia C (2000) PROMISE: predicting query behavior to enable predictive caching strategies for OLAP systems. In: DaWaK, pp 224–233

  25. Schelter S, Palumbo A, Quinn S, Marthi S, Musselman A (2016) Samsara: Declarative machine learning on distributed dataflow systems. In: Machine Learning Systems workshop at NIPS

  26. Shearer C (2000) The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13–22

  27. Tamayo P et al (2005) Oracle data mining – data mining in the database environment. In: The Data Mining and Knowledge Discovery Handbook, pp 1315–1329

  28. Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: Cluster computing with working sets. In: HotCloud, pp 1–7

Author information

Corresponding author

Correspondence to Stephan Günnemann.

About this article

Cite this article

Günnemann, S. Machine Learning Meets Databases. Datenbank Spektrum 17, 77–83 (2017). https://doi.org/10.1007/s13222-017-0247-8

  • DOI: https://doi.org/10.1007/s13222-017-0247-8