Analysis of Cluster Structure in Large-Scale English Wikipedia Category Networks

  • Conference paper
Advances in Intelligent Data Analysis XII (IDA 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8207)

Abstract

In this paper we propose a framework for analysing the structure of a large-scale social media network, a topic of significant recent interest. Our study is focused on the Wikipedia category network, where nodes correspond to Wikipedia categories and edges connect two nodes if the nodes share at least one common page within the Wikipedia network. Moreover, each edge is given a weight that corresponds to the number of pages shared between the two categories that it connects. We study the structure of category clusters within the three complete English Wikipedia category networks from 2010 to 2012. We observe that category clusters appear in the form of well-connected components that are naturally clustered together. For each dataset we obtain a graph, which we call the t-filtered category graph, by retaining just a single edge linking each pair of categories for which the weight of the edge exceeds some specified threshold t. Our framework exploits this graph structure and identifies connected components within the t-filtered category graph. We studied the large-scale structural properties of the three Wikipedia category networks using the proposed approach. We found that the number of categories, the number of clusters of size two, and the size of the largest cluster within the graph all appear to follow power laws in the threshold t. Furthermore, for each network we found the value of the threshold t for which increasing the threshold to t + 1 caused the “giant” largest cluster to diffuse into two or more smaller clusters of significant size and studied the semantics behind this diffusion.
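
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy `pages` mapping, and the use of a union-find structure to find connected components are all assumptions made for illustration. The sketch also assumes that a category with no retained edge is simply absent from the t-filtered category graph.

```python
from collections import Counter
from itertools import combinations


def category_edge_weights(page_categories):
    """Weight of edge (a, b) = number of pages that belong to both categories."""
    weights = Counter()
    for categories in page_categories.values():
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(categories)), 2):
            weights[(a, b)] += 1
    return weights


def t_filtered_clusters(weights, t):
    """Connected components of the graph that keeps only edges with weight > t.

    Assumption: categories left without any edge are dropped from the graph.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        if w > t:
            parent[find(a)] = find(b)

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    # Largest cluster first.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical toy data: page -> categories it belongs to.
pages = {
    "p1": ["A", "B"], "p2": ["A", "B"], "p3": ["A", "B"],
    "p4": ["C", "D"], "p5": ["C", "D"], "p6": ["C", "D"],
    "p7": ["B", "C"], "p8": ["B", "C"],
}
weights = category_edge_weights(pages)

for t in range(4):
    sizes = [len(c) for c in t_filtered_clusters(weights, t)]
    print(f"t={t}: cluster sizes {sizes}")
```

In this toy example the single cluster of four categories at t = 1 splits into two clusters of size two at t = 2, which mirrors on a small scale the diffusion of the largest cluster that the abstract describes.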

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klaysri, T., Fenner, T., Lachish, O., Levene, M., Papapetrou, P. (2013). Analysis of Cluster Structure in Large-Scale English Wikipedia Category Networks. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds) Advances in Intelligent Data Analysis XII. IDA 2013. Lecture Notes in Computer Science, vol 8207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41398-8_23

  • DOI: https://doi.org/10.1007/978-3-642-41398-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41397-1

  • Online ISBN: 978-3-642-41398-8

  • eBook Packages: Computer Science, Computer Science (R0)
