User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9651)

Abstract

One of the key challenges in large attributed graph clustering is how to select representative attributes. Previous studies introduced user-guided clustering methods that let a user select samples based on his or her knowledge. However, because of limited knowledge, a single user may pick only the samples that he or she is familiar with and ignore the others, so the selected samples are often biased. We propose a framework that addresses this issue by allowing multiple individuals to select samples for a specific clustering. With the wider knowledge of multiple users, the selected samples can be more relevant to the target cluster. The challenges of this study are twofold. First, because user-selected samples are usually sparse and the graph can be large, it is non-trivial to effectively combine the different annotations given by multiple users. Second, it is difficult to design a scalable approach for clustering large graphs with millions of nodes. We propose CGMA (Clustering Graphs with Multiple Annotations) to address these challenges. CGMA combines the crowd’s consensus opinions in an unbiased way and performs effective clustering with low time complexity. We demonstrate the effectiveness and efficiency of the proposed approach on real-world graphs by comparing it with existing attributed graph clustering approaches.

J. Cao—This work is supported by NSFC grants: 71331008, 61105124 and 61303017.

Corresponding author

Correspondence to Senzhang Wang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Cao, J., Wang, S., Qiao, F., Wang, H., Wang, F., Yu, P.S. (2016). User-Guided Large Attributed Graph Clustering with Multiple Sparse Annotations. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham. https://doi.org/10.1007/978-3-319-31753-3_11

  • DOI: https://doi.org/10.1007/978-3-319-31753-3_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31752-6

  • Online ISBN: 978-3-319-31753-3

  • eBook Packages: Computer Science, Computer Science (R0)
