ABSTRACT
This paper describes an application of machine translation technology to support collaboration in Wikipedia. Wikipedia hosts separate editions for hundreds of languages. While some content is specific to a single edition, some topics have pages in several of them. Similarly, while some users participate in only one Wikipedia, others play a bridging role between these sub-communities and help maintain similar pages in different Wikipedias. Because these bridging users are a minority, a support tool that stretches their effort further by indicating where it is needed could greatly benefit the community. An evaluation of the proposed approach suggests that such a tool could substantially reduce the effort involved in playing this bridging role on Wikipedia.
Index Terms
- Supporting collaboration in Wikipedia between language communities