Abstract
Data created by social bookmarking systems can be described as 3-partite 3-uniform hypergraphs connecting documents, users, and tags (tagging networks), so that the toolbox of complex network analysis can be applied to examine their properties. One of the most basic tools, the analysis of connected components, cannot be applied meaningfully, however, because tagging networks tend to be almost entirely connected. We therefore propose a generalization of connected components: m-hyperincident connected components. We show that decomposing tagging networks into 2-hyperincident connected components yields a characteristic component size distribution, with a salient giant component, that can be found across various datasets. This pattern changes if the underlying formation process changes, for example, if the hypergraph is constructed from search logs or if the tagging data is contaminated by spam. In the spam-labeled Bibsonomy dataset, the second- through 129th-largest components are inhabited exclusively by spam users. Based on these findings, we propose an unsupervised method for spam detection.
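The decomposition described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that two hyperedges (tag assignments) are m-hyperincident when they share at least m vertices, and that components are formed by chaining such hyperedges. The function names `hyperincident_components`, `to_hyperedge`, and `users_outside_giant`, the toy data, and the spam rule (flag users with no tag assignment in the giant component) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Sketch under an ASSUMED definition: two hyperedges are m-hyperincident
# if they share at least m vertices. Not the paper's reference code.


def hyperincident_components(hyperedges, m):
    """Return m-hyperincident connected components (lists of hyperedge
    indices), largest first."""
    parent = list(range(len(hyperedges)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two hyperedges share >= m vertices iff they share some m-subset,
    # so bucket hyperedges by every m-subset of their vertices.
    buckets = defaultdict(list)
    for idx, edge in enumerate(hyperedges):
        for subset in combinations(sorted(set(edge)), m):
            buckets[subset].append(idx)

    # Merge all hyperedges that landed in the same bucket.
    for members in buckets.values():
        root = find(members[0])
        for other in members[1:]:
            parent[find(other)] = root

    groups = defaultdict(list)
    for idx in range(len(hyperedges)):
        groups[find(idx)].append(idx)
    return sorted(groups.values(), key=len, reverse=True)


def to_hyperedge(user, tag, doc):
    # Prefix each vertex with its partition so a user and a tag with the
    # same string stay distinct vertices of the 3-partite hypergraph.
    return (("u", user), ("t", tag), ("d", doc))


def users_outside_giant(assignments, m=2):
    """Hypothetical spam heuristic: flag users none of whose tag
    assignments lie in the largest m-hyperincident component."""
    edges = [to_hyperedge(*a) for a in assignments]
    components = hyperincident_components(edges, m)
    giant = set(components[0]) if components else set()
    in_giant = {assignments[i][0] for i in giant}
    return {user for user, _, _ in assignments} - in_giant


# Toy (user, tag, document) assignments, invented for illustration.
assignments = [
    ("alice", "python", "doc1"),
    ("alice", "code", "doc1"),
    ("bob", "python", "doc1"),
    ("bob", "python", "doc2"),
    ("spammer", "cheap", "ad1"),
    ("spammer", "pills", "ad1"),
    ("spammer", "python", "ad2"),
]
edges = [to_hyperedge(*a) for a in assignments]
print([len(c) for c in hyperincident_components(edges, 1)])  # [7]
print([len(c) for c in hyperincident_components(edges, 2)])  # [4, 2, 1]
print(users_outside_giant(assignments))  # {'spammer'}
```

With m = 1 the toy network collapses into a single component, because one shared tag ("python") links the spam assignments to everything else; this mirrors the near-total connectivity noted above. With m = 2 the spam user's assignments fall outside the giant component. Bucketing by m-subsets avoids comparing every pair of hyperedges, which matters for 3-uniform hypergraphs where each hyperedge has only three 2-subsets.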
Recommendations
Hyperincident connected components of tagging networks
HT '09: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia
Tag spam creates large non-giant connected components
AIRWeb '09: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Spammers in social bookmarking systems try to mimic the bookmarking behaviour of real users to gain the attention of other users or search engines. Several methods have been proposed for the detection of such spam, including domain-specific features (like ...
Structure and enumeration of two-connected graphs with prescribed three-connected components
We adapt the classical 3-decomposition of any 2-connected graph to the case of simple graphs (no loops or multiple edges). By analogy with the block-cutpoint tree of a connected graph, we deduce from this decomposition a bicolored tree tc(g) associated ...