User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations

Cao, Jianping; Wang, Senzhang; Qiao, Fengcai; Wang, Hui; Wang, Feiyue; Yu, Philip S.

doi:10.1007/978-3-319-31753-3_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9651)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

One of the key challenges in clustering large attributed graphs is how to select representative attributes. Previous studies introduced user-guided clustering methods that let a user select samples based on his or her knowledge. However, because any one person's knowledge is limited, a single user may pick out only the samples that he or she is familiar with while ignoring the others, so the selected samples are often biased. To address this issue, we propose a framework that allows multiple individuals to select samples for a specific clustering. With the wider knowledge of multiple users, the selected samples can be more relevant to the target cluster. The challenges of this study are twofold. First, because user-selected samples are usually sparse and the graph can be large, it is non-trivial to effectively combine the different annotations given by multiple users. Second, it is difficult to design a scalable approach that can cluster large graphs with millions of nodes. We propose CGMA (Clustering Graphs with Multiple Annotations) to address these challenges. CGMA combines the crowd’s consensus opinions in an unbiased way and performs effective clustering with low time complexity. We demonstrate the effectiveness and efficiency of the proposed approach on real-world graphs by comparing it with existing attributed graph clustering approaches.
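
To make the aggregation problem concrete, the following is a minimal sketch and not CGMA itself. It assumes each user's annotation is a small set of node IDs believed to belong to the target cluster, and it combines them by plain vote counting. The abstract does not specify how CGMA forms its consensus, so the function name `consensus_seeds`, the `min_votes` threshold, and the voting rule are all illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def consensus_seeds(annotations: Iterable[Set[int]], min_votes: int = 2) -> List[int]:
    """Return node IDs selected by at least `min_votes` users, sorted by ID.

    Hypothetical helper: a simple vote count, not the CGMA consensus step.
    """
    votes: Counter = Counter()
    for selected in annotations:
        # Each user contributes at most one vote per node.
        votes.update(selected)
    return sorted(node for node, count in votes.items() if count >= min_votes)


# Three users annotate sparse, partly overlapping node sets (made-up IDs).
annotations = [{1, 2, 3}, {2, 3, 7}, {3, 9}]
print(consensus_seeds(annotations))  # nodes chosen by at least two users
```

Plain vote counting treats every annotator as equally reliable. It is shown only to illustrate why sparse, partly overlapping selections need some aggregation rule before clustering.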

J. Cao—This work is supported by NSFC grants: 71331008, 61105124 and 61303017.

Author information

Authors and Affiliations

College of Information Systems and Management, National University of Defense Technology, Changsha, Hunan, China
Jianping Cao, Fengcai Qiao, Hui Wang & Feiyue Wang
State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
Senzhang Wang
Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
Philip S. Yu

Corresponding author

Correspondence to Senzhang Wang.

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, Victoria, Australia
James Bailey
The University of Texas at Dallas, Richardson, Texas, USA
Latifur Khan
Osaka University, Osaka, Japan
Takashi Washio
University of Auckland, Auckland, New Zealand
Gill Dobbie
Shenzhen University, Shenzhen, China
Joshua Zhexue Huang
Massey University, Auckland, New Zealand
Ruili Wang

About this paper

Cite this paper

Cao, J., Wang, S., Qiao, F., Wang, H., Wang, F., Yu, P.S. (2016). User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham. https://doi.org/10.1007/978-3-319-31753-3_11

DOI: https://doi.org/10.1007/978-3-319-31753-3_11
Published: 12 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31752-6
Online ISBN: 978-3-319-31753-3
eBook Packages: Computer Science, Computer Science (R0)
