Extracting Difference Information from Multilingual Wikipedia

Fujiwara, Yuya; Suzuki, Yu; Konishi, Yukio; Nadamoto, Akiyo

doi:10.1007/978-3-642-29253-8_42


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Included in the following conference series:

Asia-Pacific Web Conference


Abstract

Wikipedia articles on a particular topic are written in many languages. When we select two articles that are about the same topic but written in different languages, their contents are expected to be identical because of Wikipedia policy. In practice, however, the contents often differ, especially for topics related to culture. In this paper, we propose a system that extracts information that differs between the Japanese Wikipedia and the Wikipedia editions of other countries. An important technical problem is how to extract the Wikipedia articles to be compared. Articles on the same topic are written in different languages, and each language edition structures them in its own way. For example, “Cricket” is an important part of English culture, but the Japanese Wikipedia article on cricket is very brief: it consists of only a single page. In contrast, the English version is substantial and spans multiple pages. We must therefore consider which articles can reasonably be compared. We extract the comparison target articles based on the link graph and the article structure of Wikipedia. We implement the proposed method and evaluate the accuracy of the difference extraction methods.
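The abstract does not spell out how the link graph and the article structure are combined. The Python sketch below only illustrates the general idea of grouping several English pages into one comparison target for a single Japanese page. The function name `comparison_targets`, the back-link and title-match rules, and the toy link graph are all assumptions made for illustration, not the method of Fujiwara et al.

```python
from collections import deque


def comparison_targets(seed, links, max_depth=1):
    """Return the seed article plus linked articles treated as its sub-articles.

    Hypothetical heuristic, not the authors' method: a linked article counts as
    part of the seed's topic if it links back to the article that reached it
    (link-graph cue) and its title contains the seed title (structure cue).
    """
    targets = {seed}
    queue = deque([(seed, 0)])  # breadth-first search over (title, depth) pairs
    while queue:
        title, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in links.get(title, ()):
            links_back = title in links.get(neighbour, ())
            same_topic = seed.lower() in neighbour.lower()
            if neighbour not in targets and links_back and same_topic:
                targets.add(neighbour)
                queue.append((neighbour, depth + 1))
    return targets


# Toy, made-up link graph for an English edition (title -> outgoing links).
en_links = {
    "Cricket": ["History of cricket", "Laws of cricket", "England", "Test cricket"],
    "History of cricket": ["Cricket", "England"],
    "Laws of cricket": ["Cricket"],
    "England": ["London"],
    "Test cricket": ["International Cricket Council"],
}

print(sorted(comparison_targets("Cricket", en_links)))
# ['Cricket', 'History of cricket', 'Laws of cricket']
```

Under these assumptions, the three English pages returned here would together be compared with the single Japanese cricket page. A title match is only a crude stand-in for article structure.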



Author information

Authors and Affiliations

Konan University, 8-9-1 Okamoto, Higashi-Nada, Kobe, Hyogo, 658-8501, Japan
Yuya Fujiwara, Yukio Konishi & Akiyo Nadamoto
Nagoya University, Furo, Chikusa, Nagoya, Aichi, 464-8601, Japan
Yu Suzuki


Editor information

Editors and Affiliations

School of Computer Science, The University of Adelaide, Australia
Quan Z. Sheng
College of Information Science and Engineering, Northeastern University, 110819, Shenyang, China
Guoren Wang
Aarhus University, Denmark
Christian S. Jensen
Center for Applied Informatics, Victoria University, PO Box 14428, 8001, VIC, Australia
Guandong Xu


About this paper

Cite this paper

Fujiwara, Y., Suzuki, Y., Konishi, Y., Nadamoto, A. (2012). Extracting Difference Information from Multilingual Wikipedia. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_42


DOI: https://doi.org/10.1007/978-3-642-29253-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science, Computer Science (R0)
