ABSTRACT
We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens. We found that users actively sought information exclusive to unfamiliar language editions and strategically compared how language editions defined concepts. Finally, we briefly discuss how Omnipedia generalizes to other domains facing language barriers.
Index Terms
- Omnipedia: bridging the wikipedia language gap