research-article

Hyperincident connected components of tagging networks

Published: 01 September 2009

Abstract

Data created by social bookmarking systems can be described as 3-partite 3-uniform hypergraphs connecting documents, users, and tags (tagging networks), so that the toolbox of complex network analysis can be applied to examine their properties. One of the most basic tools, the analysis of connected components, cannot be applied meaningfully, however, because tagging networks tend to be almost entirely connected. We therefore propose a generalization of connected components, m-hyperincident connected components. We show that decomposing tagging networks into 2-hyperincident connected components yields a characteristic component distribution with a salient giant component that can be found across various datasets. This pattern changes if the underlying formation process changes, for example, if the hypergraph is constructed from search logs, or if the tagging data is contaminated by spam: the second- to 129th-largest components of the spam-labeled BibSonomy dataset are inhabited exclusively by spam users. Based on these findings, we propose an unsupervised method for spam detection.
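
The abstract does not spell out the definition of an m-hyperincident connected component. The sketch below therefore rests on an assumed reading: two hyperedges (tag assignments) count as m-hyperincident when they share at least m nodes, and a component is a set of hyperedges linked by chains of such pairs. The function name `hyperincident_components`, the (user, tag, document) tuple layout, and the toy data are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def hyperincident_components(edges, m=2):
    """Group hyperedges into components (assumed definition:
    two hyperedges are linked when they share at least m nodes)."""
    parent = list(range(len(edges)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two edges share >= m nodes exactly when they share some m-subset,
    # so it is enough to link each edge to the first edge seen per subset.
    first_seen = {}
    for idx, edge in enumerate(edges):
        for positions in combinations(range(len(edge)), m):
            # Positions encode the partition (user, tag, document).
            key = (positions, tuple(edge[p] for p in positions))
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = {}
    for idx, edge in enumerate(edges):
        groups.setdefault(find(idx), []).append(edge)
    # Largest component first.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical tag assignments as (user, tag, document) triples.
edges = [
    ("u1", "t1", "d1"),
    ("u1", "t2", "d1"),  # shares u1 and d1 with the first triple
    ("u2", "t2", "d1"),  # shares t2 and d1 with the second triple
    ("u3", "t1", "d2"),  # shares only t1 with the first triple
    ("u4", "t9", "d9"),  # shares nothing
]

sizes_m2 = [len(c) for c in hyperincident_components(edges, m=2)]
sizes_m1 = [len(c) for c in hyperincident_components(edges, m=1)]
```

With m=1 this reduces to ordinary connected components, which is why the fourth triple joins the large component there but stays separate for m=2.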

Cited By

  • (2018) Accessing Information with Tags: Search and Ranking. Social Information Access, pages 310-343. DOI: 10.1007/978-3-319-90092-6_9. Online publication date: 3-May-2018
  • (2016) Cascading failure analysis in hyper-network based on the hypergraph. Acta Physica Sinica 65:8 (088901). DOI: 10.7498/aps.65.088901. Online publication date: 2016
  • (2014) Clustering on heterogeneous networks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4:3 (213-233). DOI: 10.1002/widm.1126. Online publication date: 28-Apr-2014

Published In

ACM SIGWEB Newsletter  Volume 2009, Issue Autumn
Sep. 2009
41 pages
ISSN:1931-1745
EISSN:1931-1435
DOI:10.1145/1592394
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
