Seamless Integration of Data Mining with DBMS and Applications

Lu, Hongjun

doi:10.1007/3-540-45357-1_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Data mining has been widely recognized as a powerful tool for exploring added value from data accumulated in the daily operations of an organization. A large number of data mining algorithms have been developed during the past decade. Those algorithms can be roughly divided into two groups. The first group of techniques, such as classification, clustering, prediction and deviation analysis, has been studied for a long time in machine learning, statistics, and other fields. The second group of techniques, such as association rule mining, mining in spatial-temporal databases and mining from the Web, addresses problems related to large amounts of data. Most classical algorithms in the first group assume that the data to be mined is somehow available in memory. Although initial effort in data mining has concentrated on making those algorithms scalable with respect to large volumes of data, most of those scalable algorithms, even those developed by database researchers, are still stand-alone. It is often assumed that data is available in the desired form, without considering the fact that most organizations store their data in databases managed by database management systems (DBMS). As such, most data mining algorithms can only be loosely coupled with the data infrastructure of an organization and are difficult to infuse into existing mission-critical applications. Seamlessly integrating data mining techniques with database applications and database management systems remains an open problem.

In this paper, we propose to tackle the problem of seamless integration of data mining with DBMS and applications from three directions. First, with the recent development of database technology, most database management systems have extended their functionality in data analysis. Such capability should be fully exploited to develop DBMS-aware data mining algorithms. Ideally, data mining algorithms can be fully implemented using DBMS-supported functions so that they become database applications themselves. Second, the major difficulties in integrating data mining with applications are algorithm selection and parameter setting. Reducing or eliminating mining parameters as much as possible and developing automatic or semi-automatic mining algorithm selection techniques will greatly increase the application friendliness of data mining systems. Lastly, standardizing the interface among databases, data mining algorithms and applications can also facilitate the integration to a certain extent.
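As a rough illustration of the first direction, the sketch below pushes one mining step, support counting for item pairs as used in association rule mining, entirely into the DBMS as a single SQL query. It is not taken from the paper: the `basket` table, its `tid` and `item` columns, the `frequent_pairs` helper and the sample rows are all hypothetical, and SQLite (through Python's standard `sqlite3` module) merely stands in for a full DBMS.

```python
import sqlite3


def frequent_pairs(conn, min_support):
    """Return (item_a, item_b, support) for item pairs that occur together
    in at least min_support transactions. The counting is done by the DBMS."""
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM basket AS a
        JOIN basket AS b
          ON a.tid = b.tid AND a.item < b.item  -- each unordered pair once
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY support DESC, a.item, b.item
    """
    return conn.execute(query, (min_support,)).fetchall()


# Hypothetical transaction data: one row per (transaction id, item).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE basket (tid INTEGER, item TEXT, PRIMARY KEY (tid, item))"
)
conn.executemany(
    "INSERT INTO basket VALUES (?, ?)",
    [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "milk"), (2, "eggs"),
        (3, "milk"), (3, "eggs"),
        (4, "bread"), (4, "eggs"), (4, "milk"),
    ],
)

for item_a, item_b, support in frequent_pairs(conn, min_support=3):
    print(item_a, item_b, support)
```

In this sketch the data never leaves the database: the join, grouping and support threshold are all evaluated by the query engine, which is one possible reading of an algorithm becoming a database application itself. The `min_support` argument is also an example of the kind of mining parameter that the second direction aims to reduce or eliminate.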

Author information

Authors and Affiliations

The Hong Kong University of Science and Technology, Hong Kong China
Hongjun Lu

Editor information

Editors and Affiliations

Dept. of Computer Science and Information Systems, The University of Hong Kong, Pokfulam, Hong Kong China
David Cheung
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Graham J. Williams
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong China
Qing Li

About this paper

Cite this paper

Lu, H. (2001). Seamless Integration of Data Mining with DBMS and Applications. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_3

DOI: https://doi.org/10.1007/3-540-45357-1_3
Published: 11 April 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive
