Exploring the Power of Heuristics and Links in Multi-relational Data Mining

Yin, Xiaoxin; Han, Jiawei

doi:10.1007/978-3-540-68123-6_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4994)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

Relational databases are the most popular repository for structured data and are thus one of the richest sources of knowledge in the world. Because relational data are complex, designing efficient and scalable data mining approaches for relational databases is a challenging task. In this paper we discuss two methodologies to address this issue. The first methodology uses heuristics to guide the data mining procedure, in order to avoid aimless, exhaustive search in relational databases. The second methodology assigns certain properties to each object in the database and lets different objects interact with each other along the links. Experiments show that both approaches achieve high efficiency and accuracy in real applications.
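As a rough illustration of the second methodology, the sketch below gives every object a numeric property and lets objects interact along links: in each round, an object's value becomes a weighted mix of its own initial value and the mean of its neighbors' current values. This is a generic propagation scheme assumed for illustration, not the specific algorithm used in the paper; the function name `propagate`, the mixing weight `alpha`, and the toy link structure are all hypothetical.

```python
from typing import Dict, List


def propagate(
    links: Dict[str, List[str]],
    initial: Dict[str, float],
    alpha: float = 0.5,
    iterations: int = 50,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Hypothetical sketch of link-based interaction (not the paper's algorithm).

    Each object keeps its initial property value and, in every round, blends it
    with the mean of its linked neighbors' current values. With 0 <= alpha < 1
    each round is a contraction, so the values converge.
    """
    values = dict(initial)
    for _ in range(iterations):
        new_values: Dict[str, float] = {}
        for obj, own in initial.items():
            neighbors = links.get(obj, [])
            if neighbors:
                # Objects "interact" by pulling in their neighbors' current values.
                neighbor_mean = sum(values[n] for n in neighbors) / len(neighbors)
                new_values[obj] = (1 - alpha) * own + alpha * neighbor_mean
            else:
                # Isolated objects keep their own property unchanged.
                new_values[obj] = own
        # Stop early once no value moves by more than the tolerance.
        delta = max((abs(new_values[o] - values[o]) for o in values), default=0.0)
        values = new_values
        if delta < tol:
            break
    return values


if __name__ == "__main__":
    # Toy undirected chain a - b - c; only "a" starts with a nonzero property.
    toy_links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    result = propagate(toy_links, {"a": 1.0, "b": 0.0, "c": 0.0}, alpha=0.5)
    print({k: round(v, 3) for k, v in result.items()})
```

On this toy chain with alpha = 0.5, the values settle near a ≈ 0.583, b ≈ 0.167 and c ≈ 0.083: the property spreads along the links and weakens with distance from its source.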

The work was supported in part by the U.S. National Science Foundation NSF IIS-05-13678 and NSF BDI-05-15813. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies.

Authors and Affiliations

Xiaoxin Yin: Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
Jiawei Han: University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Editor information

Aijun An, Stan Matwin, Zbigniew W. Raś, Dominik Ślęzak

About this paper

Cite this paper

Yin, X., Han, J. (2008). Exploring the Power of Heuristics and Links in Multi-relational Data Mining. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_2

DOI: https://doi.org/10.1007/978-3-540-68123-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68122-9
Online ISBN: 978-3-540-68123-6
eBook Packages: Computer Science, Computer Science (R0)