Abstract
Bilingual Base Noun Phrase extraction is one of the key tasks of Natural Language Processing (NLP). The task is especially challenging for the English-Vietnamese language pair because Vietnamese lacks available language resources such as robust NLP tools and annotated training data. This paper presents a method that combines bilingual-dictionary-based, bilingual-corpus-based, and knowledge-based techniques to identify Base Noun Phrase correspondences in a pair of English-Vietnamese bilingual sentences. The method first identifies anchor points of each Base Noun Phrase in the English sentence and then performs the alignment based on these anchor points. It not only overcomes the lack of Vietnamese resources but also handles misalignment, null alignment, and overlapping and conflicting projections better than existing methods. The proposed technique can easily be applied to other language pairs. Experiments on 35,000 sentence pairs from an English-Vietnamese bilingual corpus show that the proposed method achieves an accuracy of 78.5%.
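The abstract describes the alignment only at a high level: find anchor points for an English Base Noun Phrase, then align around them. The following is a minimal sketch of that general idea under our own assumptions; it is not the paper's algorithm. It assumes the English Base Noun Phrase is already chunked, takes its last word as the anchor (English Base Noun Phrases are head-final), looks up translations in a toy bilingual dictionary, and grows a contiguous Vietnamese span around the anchor's match to absorb nearby modifier translations. The names `match_translations` and `project_base_np`, the `max_gap` parameter, the tie-breaking rule, and the dictionary entries are all hypothetical.

```python
import unicodedata
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy dictionary: English word -> Vietnamese translations, each a tuple
# of syllables (Vietnamese is written with spaces between syllables).
Dictionary = Dict[str, List[Tuple[str, ...]]]
Span = Tuple[int, int]  # [start, end) token indices in the Vietnamese sentence


def _nfc(text: str) -> str:
    # Vietnamese diacritics may be composed or decomposed; normalize before comparing.
    return unicodedata.normalize("NFC", text.lower())


def match_translations(word: str, vi_tokens: Sequence[str], dictionary: Dictionary) -> List[Span]:
    """Return every span of vi_tokens equal to one of the dictionary translations of word."""
    tokens = [_nfc(t) for t in vi_tokens]
    spans: List[Span] = []
    for translation in dictionary.get(word.lower(), []):
        target = [_nfc(t) for t in translation]
        n = len(target)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                spans.append((i, i + n))
    return spans


def project_base_np(english_np: Sequence[str], vi_tokens: Sequence[str],
                    dictionary: Dictionary, max_gap: int = 1) -> Optional[Span]:
    """Project an English Base Noun Phrase onto a Vietnamese sentence (hypothetical sketch)."""
    # Assumption: the last English word is the head noun and serves as the anchor.
    head_spans = match_translations(english_np[-1], vi_tokens, dictionary)
    if not head_spans:
        return None  # null alignment: the anchor has no dictionary match
    modifier_spans = [s for w in english_np[:-1]
                      for s in match_translations(w, vi_tokens, dictionary)]
    best: Optional[Span] = None
    best_score: Tuple[int, int] = (-1, -1)
    for start, end in head_spans:
        changed = True
        while changed:  # grow the span until no modifier match lies within max_gap of it
            changed = False
            for s, e in modifier_spans:
                near = s <= end + max_gap and e >= start - max_gap
                inside = start <= s and e <= end
                if near and not inside:
                    start, end = min(start, s), max(end, e)
                    changed = True
        covered = sum(1 for s, e in modifier_spans if start <= s and e <= end)
        score = (covered, end - start)  # prefer more covered modifiers, then the longer span
        if score > best_score:
            best, best_score = (start, end), score
    return best


if __name__ == "__main__":
    vi = "tôi đọc quyển sách đỏ hôm qua".split()  # "I read the red book yesterday"
    toy_dict: Dictionary = {"book": [("sách",), ("quyển", "sách")], "red": [("đỏ",)]}
    span = project_base_np(["the", "red", "book"], vi, toy_dict)
    print(span, " ".join(vi[span[0]:span[1]]))  # (2, 5) quyển sách đỏ
```

The example shows why a span is projected rather than a word-by-word copy: the Vietnamese phrase puts the adjective after the noun, so English word order is not preserved. Returning `None` when the anchor has no dictionary match corresponds to the null-alignment case mentioned in the abstract; a real system would also need the bilingual-corpus-based and knowledge-based components that the paper combines with the dictionary.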
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, H.C. (2013). A Combination System for Identifying Base Noun Phrase Correspondences. In: Nguyen, N., Trawiński, B., Katarzyniak, R., Jo, GS. (eds) Advanced Methods for Computational Collective Intelligence. Studies in Computational Intelligence, vol 457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34300-1_2
DOI: https://doi.org/10.1007/978-3-642-34300-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34299-8
Online ISBN: 978-3-642-34300-1
eBook Packages: Engineering, Engineering (R0)