research-article

Hyperincident connected components of tagging networks

Published: 01 September 2009

Abstract

Data created by social bookmarking systems can be described as 3-partite 3-uniform hypergraphs connecting documents, users, and tags (tagging networks), so that the toolbox of complex network analysis can be applied to examine their properties. One of the most basic tools, the analysis of connected components, however, cannot be applied meaningfully: tagging networks tend to be almost entirely connected. We therefore propose a generalization of connected components, m-hyperincident connected components. We show that decomposing tagging networks into 2-hyperincident connected components yields a characteristic component distribution with a salient giant component that can be found across various datasets. This pattern changes if the underlying formation process changes, for example, if the hypergraph is constructed from search logs, or if the tagging data is contaminated by spam: it turns out that the second- through 129th-largest components of the spam-labeled Bibsonomy dataset are inhabited exclusively by spam users. Based on these findings, we propose an unsupervised method for spam detection.
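The abstract does not spell out the definition of m-hyperincidence, so the following is only a minimal sketch under an assumed reading: two hyperedges (user, tag, document triples) are m-hyperincident if they share at least m vertices, and the m-hyperincident connected components are the classes of the transitive closure of that relation. Under this assumption, m = 1 reduces to ordinary connectivity, while m = 2 requires consecutive hyperedges to agree on two of their three vertices. The function name `hyperincident_components` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def hyperincident_components(edges, m):
    """Group hyperedges into m-hyperincident connected components.

    Assumption (illustrative): each hyperedge is a (user, tag, document)
    triple, and two hyperedges are m-hyperincident when they share at least
    m vertices. Because the three vertex classes are disjoint, sharing a
    vertex means agreeing in the same position of the triple.
    Returns lists of edge indices, largest component first.
    """
    parent = list(range(len(edges)))

    def find(i):
        # Union-find lookup with path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    arity = len(edges[0]) if edges else 0
    # Agreeing in at least m positions is equivalent to agreeing on some
    # m-subset of positions, so bucket edges by every m-subset key instead
    # of comparing all pairs of hyperedges.
    for positions in combinations(range(arity), m):
        first_in_bucket = {}
        for idx, edge in enumerate(edges):
            key = tuple(edge[p] for p in positions)
            if key in first_in_bucket:
                union(first_in_bucket[key], idx)
            else:
                first_in_bucket[key] = idx

    groups = defaultdict(list)
    for idx in range(len(edges)):
        groups[find(idx)].append(idx)
    # Sort by size to expose the component-size distribution.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy tagging data: (user, tag, document).
edges = [
    ("alice", "python", "doc1"),
    ("alice", "python", "doc2"),  # shares user and tag with edge 0
    ("bob", "python", "doc2"),    # shares tag and document with edge 1
    ("carol", "python", "doc3"),  # shares only the tag "python"
    ("spam1", "cheap", "ad1"),
    ("spam1", "cheap", "ad2"),    # shares user and tag with edge 4
]

print(hyperincident_components(edges, 1))  # [[0, 1, 2, 3], [4, 5]]
print(hyperincident_components(edges, 2))  # [[0, 1, 2], [4, 5], [3]]
```

On this toy data, ordinary (m = 1) connectivity joins every legitimate post through the single shared tag, whereas m = 2 detaches carol's post, which overlaps with the others in only one vertex. This mirrors, on a tiny scale, why a stricter overlap requirement can break an almost fully connected tagging network into a more informative component distribution.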

        Published in

          ACM SIGWEB Newsletter, Volume 2009, Issue Autumn
          Sep. 2009
          41 pages
          ISSN: 1931-1745
          EISSN: 1931-1435
          DOI: 10.1145/1592394

          Copyright © 2009 held by the owner/author(s).

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States
