A Combination System for Identifying Base Noun Phrase Correspondences

  • Conference paper
Advanced Methods for Computational Collective Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 457)

Abstract

Bilingual Base Noun Phrase extraction is one of the key tasks of Natural Language Processing (NLP). The task is especially challenging for the English-Vietnamese pair because Vietnamese lacks language resources such as robust NLP tools and annotated training data. This paper presents a method that combines a bilingual dictionary, a bilingual corpus, and knowledge-based techniques to identify Base Noun Phrase correspondences in a pair of English-Vietnamese bilingual sentences. The method first identifies the anchor points of each Base Noun Phrase in the English sentence and then performs alignment based on these anchor points. It not only overcomes the lack of Vietnamese resources but also handles misalignment, null alignment, overlap, and conflicting projection better than existing methods. The proposed technique can be easily applied to other language pairs. Experiments on 35,000 sentence pairs from an English-Vietnamese bilingual corpus showed that the proposed method achieves an accuracy of 78.5%.
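
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of dictionary-driven anchor points, not the paper's actual method. It assumes the English Base Noun Phrase spans are already known, uses a toy hand-written dictionary (`TOY_DICT`, hypothetical entries), and projects each phrase onto the smallest Vietnamese token range that covers all of its anchors. The function name `project_base_nps` and the span convention are illustrative assumptions; overlap and conflict resolution are deliberately left out.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)

# Toy bilingual dictionary (hypothetical entries, for illustration only).
# Each English word maps to Vietnamese tokens that may translate it.
TOY_DICT: Dict[str, Set[str]] = {
    "student": {"sinh", "viên"},
    "a": {"một"},
    "new": {"mới"},
    "book": {"quyển", "sách"},
}


def project_base_nps(
    en_tokens: List[str],
    np_spans: List[Span],
    vi_tokens: List[str],
    dictionary: Dict[str, Set[str]],
) -> List[Optional[Span]]:
    """Project each English base NP span onto the Vietnamese sentence.

    For every English span, collect the dictionary translations of its
    words, mark the Vietnamese positions that match (the anchor points),
    and return the smallest range covering them. A span with no anchors
    yields None (a null alignment).
    """
    results: List[Optional[Span]] = []
    for start, end in np_spans:
        candidates: Set[str] = set()
        for word in en_tokens[start:end]:
            candidates |= dictionary.get(word.lower(), set())
        anchors = [
            i for i, tok in enumerate(vi_tokens) if tok.lower() in candidates
        ]
        if anchors:
            results.append((min(anchors), max(anchors) + 1))
        else:
            results.append(None)
    return results


en = ["The", "student", "reads", "a", "new", "book"]
vi = ["Sinh", "viên", "đọc", "một", "quyển", "sách", "mới"]
aligned = project_base_nps(en, [(0, 2), (3, 6)], vi, TOY_DICT)
for (s, e), target in zip([(0, 2), (3, 6)], aligned):
    vi_phrase = " ".join(vi[target[0]:target[1]]) if target else "(null)"
    print(" ".join(en[s:e]), "->", vi_phrase)
```

Taking the minimal covering range is the simplest possible projection rule. A fuller system would also have to decide what to do when two English phrases claim overlapping Vietnamese ranges, which are the overlap and conflict cases the abstract mentions.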

Author information

Corresponding author

Correspondence to Hieu Chi Nguyen.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, H.C. (2013). A Combination System for Identifying Base Noun Phrase Correspondences. In: Nguyen, N., Trawiński, B., Katarzyniak, R., Jo, GS. (eds) Advanced Methods for Computational Collective Intelligence. Studies in Computational Intelligence, vol 457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34300-1_2

  • DOI: https://doi.org/10.1007/978-3-642-34300-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34299-8

  • Online ISBN: 978-3-642-34300-1

  • eBook Packages: Engineering, Engineering (R0)
