Mining latent relations in peer-production environments: a case study with Wikipedia article similarity and controversy

  • Original Article
  • Published in: Social Network Analysis and Mining

Abstract

As people participate actively in social networking and peer-production sites, additional, implicit relations emerge from their activities. Mining such latent relations, often described as the wisdom of crowds, is an important area of ongoing research, with both general and domain-specific techniques. In this paper, we propose a new similarity measure, which we call expert-based similarity, to discover semantic relations among Wikipedia articles from the co-editorship perspective. Different kinds of relations among entities may also reveal different information. To explore and demonstrate this premise, we carry out a case study that leverages multiple relations among Wikipedia articles. Specifically, we use expert-based similarity, together with other standard similarity measures, to discern the influence of several factors that are hypothesized to generate controversies in Wikipedia articles. In the context of Wikipedia-specific research, our case study helps to better differentiate the degree of impact of some of the possible causes of controversies.

Notes

  1. http://en.wikipedia.org/wiki/Wikipedia:About.

  2. http://en.wikipedia.org/wiki/Enterprise_wiki.

  3. Note for clarification: From our analysis, we noticed that the same contributor may contribute to many articles spread across different, unrelated categories (and in that sense, the focus is not limited), e.g., articles related to American football, Scientology and Biology, but within each specific category, the contributions are few and rather focused. The rest of the discussion pertains to contributions within one such category, namely “Religious Objects”, with which we carry out our case study.

  4. http://en.wikipedia.org/wiki/{Pathi,Nizhal_Thangal,Muthiri_kinaru,Ayyavazhi,Vakaippathi}.

  5. http://glaros.dtc.umn.edu/gkhome/views/cluto/.

  6. The bots in Wikipedia are automated or semi-automated tools designed by contributors to carry out some edits, for example, adding some content and some links, reverting vandalism or removing some images, to a specific class of articles. Bots must be harmless and useful and be approved by Wikipedia.

  7. http://en.wikipedia.org/wiki/Wikipedia:Edit_war.

  8. http://en.wikipedia.org/wiki/Michael_Jackson.

  9. http://en.wikipedia.org/wiki/Nuclear_power.

Acknowledgments

The work is funded by A-STAR Grant No: 072 134 0055.

Author information

Corresponding author

Correspondence to Chenliang Li.

About this article

Cite this article

Li, C., Datta, A. & Sun, A. Mining latent relations in peer-production environments: a case study with Wikipedia article similarity and controversy. Soc. Netw. Anal. Min. 2, 265–278 (2012). https://doi.org/10.1007/s13278-011-0037-5

  • DOI: https://doi.org/10.1007/s13278-011-0037-5
