Identifying relevant databases for multidatabase mining

Liu, Huan; Lu, Hongjun; Yao, Jun

doi:10.1007/3-540-64383-4_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1394)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Various tools and systems for knowledge discovery and data mining have been developed and are available for applications. However, when practitioners face a large number of databases, an immediate question is where to start mining. In this paper, breaking away from the conventional data mining assumption that many databases are joined into one, we argue that the first step in multidatabase mining is to identify the databases most likely to be relevant to an application; without this step, the mining process can be lengthy, aimless and ineffective. We therefore propose a relevance measure for identifying relevant databases for mining tasks whose objective is to find patterns or regularities about certain attributes. An efficient implementation for identifying relevant databases is described. Experiments are conducted to validate the measure's performance and to show its promising applications.

Author information

Authors and Affiliations

Department of Information Systems and Computer Science, National University of Singapore, 119260, Kent Ridge, Singapore
Huan Liu, Hongjun Lu & Jun Yao

Editor information

Editors and Affiliations

School of Computer Science and Software Engineering, Monash University, 900 Dandenong Road, Caulfield East, Victoria, 3145, Australia
Xindong Wu
Department of Computer Science, The University of Melbourne, Parkville, Victoria, 3052, Australia
Ramamohanarao Kotagiri
School of Computer Science and Engineering, Monash University, Clayton, Victoria, 3168, Australia
Kevin B. Korb

Cite this paper

Liu, H., Lu, H., Yao, J. (1998). Identifying relevant databases for multidatabase mining. In: Wu, X., Kotagiri, R., Korb, K.B. (eds) Research and Development in Knowledge Discovery and Data Mining. PAKDD 1998. Lecture Notes in Computer Science, vol 1394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64383-4_18

DOI: https://doi.org/10.1007/3-540-64383-4_18
Published: 25 August 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64383-8
Online ISBN: 978-3-540-69768-8
eBook Packages: Springer Book Archive
