tutorial

Database systems research on data mining

Authors:
Carlos Ordonez

University of Houston, Houston, TX, USA

Javier García-García

Universidad Nacional Autónoma de México, Mexico City, Mexico

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, June 2010, Pages 1253–1254. https://doi.org/10.1145/1807167.1807335

Published: 6 June 2010

ABSTRACT

Data mining remains an important research area in database systems. We review the processing alternatives, storage mechanisms, algorithms, data structures, and optimizations that enable data mining on large data sets, focusing on the computation of well-known multidimensional statistical and machine learning models. We pay particular attention to SQL and MapReduce as two competing technologies for large-scale processing. We conclude with a summary of major solved problems and open research issues.
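
As an illustration of the kind of computation the abstract refers to, the following is a minimal sketch, not code from the tutorial. It assumes that a model's inputs can be reduced to per-dimension summaries (a row count n, linear sums L, and sums of squares Q) gathered in a single pass, and that partial summaries from separate partitions can be merged, which is the general shape of a MapReduce job. The names `summarize`, `combine`, and `mean_and_variance` are hypothetical. The same summaries could plausibly be obtained in SQL with one aggregate query using COUNT(*), SUM(x), and SUM(x*x) per column.

```python
from functools import reduce


def summarize(rows, d):
    """Map step (hypothetical): scan one partition once and return (n, L, Q)."""
    n = 0
    L = [0.0] * d  # per-dimension linear sum
    Q = [0.0] * d  # per-dimension sum of squares
    for x in rows:
        n += 1
        for j in range(d):
            L[j] += x[j]
            Q[j] += x[j] * x[j]
    return n, L, Q


def combine(a, b):
    """Reduce step (hypothetical): merge two partial summaries."""
    return (
        a[0] + b[0],
        [p + q for p, q in zip(a[1], b[1])],
        [p + q for p, q in zip(a[2], b[2])],
    )


def mean_and_variance(stats):
    """Derive per-dimension mean and population variance from (n, L, Q)."""
    n, L, Q = stats
    mean = [l / n for l in L]
    var = [q / n - m * m for q, m in zip(Q, mean)]
    return mean, var


# Toy data split into two partitions, standing in for blocks of a large table.
partitions = [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0)]]
stats = reduce(combine, (summarize(p, 2) for p in partitions))
mean, var = mean_and_variance(stats)
print(stats[0], mean, var)
```

Because the summaries are additive, each partition is read only once and the merge order does not matter, which is what makes the same computation expressible either as a SQL aggregate or as a map and reduce pair.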

Index Terms

Database systems research on data mining
1. Information systems
  1. Information systems applications
    1. Data mining

Published in
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167
General Chair: Ahmed Elmagarmid, Purdue University, USA
Program Chair: Divyakant Agrawal, University of California at Santa Barbara, USA
Copyright © 2010 held by the owner/author(s).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dbms
mapreduce
sql
statistical model
udf
Qualifiers
- tutorial
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%