A Comparative Evaluation of Cross-Lingual Text Annotation Techniques

Zhang, Lei; Rettinger, Achim; Färber, Michael; Tadić, Marko

doi:10.1007/978-3-642-40802-1_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8138)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

In this paper, we study the problem of extracting knowledge from textual documents written in different languages by annotating the text on the basis of a cross-lingual knowledge base, namely Wikipedia. Our contribution is twofold. First, we propose a novel framework for evaluating cross-lingual text annotation techniques, based on annotation of a parallel corpus to a hub-language in a cross-lingual knowledge base. Second, we investigate the performance of different cross-lingual text annotation techniques according to our proposed evaluation framework. We perform experiments for an empirical comparison of three approaches: (i) Cross-lingual Named Entity Annotation (CL-NEA), (ii) Cross-lingual Wikifier Annotation (CL-WIFI), and (iii) Cross-lingual Explicit Semantic Analysis (CL-ESA). Besides establishing an evaluation framework, our results show the differences between the three investigated approaches and demonstrate their advantages and disadvantages.

Author information

Authors and Affiliations

Institute AIFB, Karlsruhe Institute of Technology, Germany
Lei Zhang, Achim Rettinger & Michael Färber
Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
Marko Tadić

Editor information

Editors and Affiliations

Center for the Evaluation of Language and Communication Technologies (CELCT), via alla Cascata 56/c, 38123, Povo, Italy
Pamela Forner
HES-SO Valais, University of Applied Sciences Western Switzerland, Technopôle 3, 3960, Sierre, Switzerland
Henning Müller
Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València, Camino de Vera s/n, 46071, València, Spain
Roberto Paredes
Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València, Camino de Vera s/n, 46022, València, Spain
Paolo Rosso
Bauhaus-Universität Weimar, Bauhausstraße 11, 99423, Weimar, Germany
Benno Stein

Cite this paper

Zhang, L., Rettinger, A., Färber, M., Tadić, M. (2013). A Comparative Evaluation of Cross-Lingual Text Annotation Techniques. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, vol 8138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40802-1_16

DOI: https://doi.org/10.1007/978-3-642-40802-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40801-4
Online ISBN: 978-3-642-40802-1
eBook Packages: Computer Science, Computer Science (R0)
