WikiBrain: Democratizing computation on Wikipedia

Tutorial
Published: 27 August 2014
DOI: 10.1145/2641580.2641615

ABSTRACT

Wikipedia is known for serving humans' informational needs. Over the past decade, the encyclopedic knowledge encoded in Wikipedia has also powerfully served computer systems. Leading algorithms in artificial intelligence, natural language processing, data mining, geographic information science, and many other fields analyze the text and structure of articles to build computational models of the world.

Many software packages extract knowledge from Wikipedia. However, existing tools either (1) provide Wikipedia data, but not well-known Wikipedia-based algorithms or (2) narrowly focus on one such algorithm.

This paper presents the WikiBrain software framework, an extensible Java-based platform that democratizes access to a range of Wikipedia-based algorithms and technologies. WikiBrain provides simple access to the diverse Wikipedia data needed for semantic algorithms and technologies, ranging from page views to Wikidata. In a few lines of code, a developer can use WikiBrain to access Wikipedia data and state-of-the-art algorithms. WikiBrain also enables researchers to extend Wikipedia-based algorithms and evaluate their extensions. WikiBrain promotes a new vision of the Wikipedia software ecosystem: every researcher and developer should have access to state-of-the-art Wikipedia-based technologies.

          • Published in

            OpenSym '14: Proceedings of The International Symposium on Open Collaboration
            August 2014
            302 pages
            ISBN:9781450330169
            DOI:10.1145/2641580

            Copyright © 2014 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • tutorial
            • Research
            • Refereed limited

            Acceptance Rates

            OpenSym '14 paper acceptance rate: 29 of 64 submissions, 45%
            Overall acceptance rate: 108 of 195 submissions, 55%