CrossClus: user-guided multi-relational clustering

Yin, Xiaoxin; Han, Jiawei; Yu, Philip S.

doi:10.1007/s10618-007-0072-z

Published: 06 July 2007

Volume 15, pages 321–348, (2007)

Data Mining and Knowledge Discovery

Xiaoxin Yin¹,
Jiawei Han¹ &
Philip S. Yu²

Abstract

Most structured data in real-life applications are stored in relational databases containing multiple semantically linked relations. Unlike clustering in a single table, clustering objects in a relational database usually involves a large number of features conveying very different semantic information, and using all features indiscriminately is unlikely to generate meaningful results. Because the user knows her clustering goal, we propose a new approach called CrossClus, which performs multi-relational clustering under the user’s guidance. Unlike semi-supervised clustering, which requires the user to provide a training set, we minimize the user’s effort by using a very simple form of user guidance. The user is only required to select one or a small set of features that are pertinent to the clustering goal, and CrossClus searches for other pertinent features in multiple relations. Each feature is evaluated by whether it clusters objects in a way similar to the user-specified features. We design efficient and accurate approaches for both feature selection and object clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of CrossClus.
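The idea of scoring a feature by how similarly it groups objects compared with a user-selected feature can be illustrated with a small sketch. The abstract does not give the actual measure, so everything below is an assumption made for illustration: each feature is taken to be a single categorical column already joined to the target objects, two objects "agree" on a feature when they share its value, and two features are compared by the cosine of their pairwise-agreement vectors. The function names (`pairwise_agreement`, `feature_similarity`, `rank_features`) and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_agreement(values):
    """One entry per unordered object pair: 1.0 if both objects share the value."""
    return [1.0 if a == b else 0.0 for a, b in combinations(values, 2)]


def feature_similarity(f, g):
    """Cosine between the pairwise-agreement vectors of two features.

    Assumed measure for illustration only; not taken from the abstract.
    """
    u, v = pairwise_agreement(f), pairwise_agreement(g)
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_features(guide, candidates):
    """Order candidate feature names by similarity to the user-selected feature."""
    return sorted(
        candidates,
        key=lambda name: feature_similarity(guide, candidates[name]),
        reverse=True,
    )


# Hypothetical toy data: four objects, one user-selected feature, two candidates.
guide = ["DB", "DB", "AI", "AI"]
candidates = {
    "hometown": ["X", "Y", "X", "Y"],           # groups objects differently
    "venue": ["VLDB", "VLDB", "ICML", "ICML"],  # groups objects the same way
}
ranking = rank_features(guide, candidates)
```

In this toy setting, `venue` ranks above `hometown` because it splits the four objects into the same two groups as the user-selected feature, while `hometown` cuts across them. A multi-relational setting would additionally need to propagate feature values across joins, which this sketch leaves out.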

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Xiaoxin Yin & Jiawei Han
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Philip S. Yu

Corresponding author

Correspondence to Xiaoxin Yin.

Additional information

Responsible editor: Eamonn Keogh.

The work was supported in part by the U.S. National Science Foundation NSF IIS-03-13678 and NSF BDI-05-15813, and an IBM Faculty Award. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect views of the funding agencies.

About this article

Cite this article

Yin, X., Han, J. & Yu, P.S. CrossClus: user-guided multi-relational clustering. Data Min Knowl Disc 15, 321–348 (2007). https://doi.org/10.1007/s10618-007-0072-z

Received: 29 April 2006
Accepted: 29 March 2007
Published: 06 July 2007
Issue Date: December 2007
DOI: https://doi.org/10.1007/s10618-007-0072-z
