DOI: 10.1145/3377049.3377124 (ICCA conference proceedings, short paper)

Improving Natural Language Parser Accuracy by Unknown Word Replacement

Published: 20 March 2020

ABSTRACT

Natural language parsers are the basis for deeper understanding of content written in natural language. Parsers have proven effective in many NLP tasks, such as machine translation, sentiment analysis, and document classification. Existing state-of-the-art parsers, such as Charniak [9], Collins [11], Stanford, and OpenNLP, achieve F-scores ranging from 85 to 92 percent. Parser accuracy is hampered to a major extent by unknown and unseen words. In this paper we present a novel method for improving accuracy by incorporating knowledge about unknown words from an external source. Experimental results show that our technique improves accuracy; the improvement depends on the number of known words present in the model during training. We show that we achieve above one percent improvement on some parsers.
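The replacement idea described above can be sketched in a few lines: before parsing, each token absent from the parser's training vocabulary is swapped for a synonym the parser has already seen. This is a minimal illustrative sketch, not the authors' implementation; the function name, the toy vocabulary, and the small synonym table (standing in for an external resource such as WordNet [3]) are all hypothetical.

```python
# Hedged sketch of unknown-word replacement before parsing.
# The synonym table is a toy stand-in for an external lexical
# resource such as WordNet; all names here are illustrative.

def replace_unknown_words(tokens, known_vocab, synonyms):
    """Replace each out-of-vocabulary token with a known synonym, if any."""
    result = []
    for tok in tokens:
        if tok in known_vocab:
            result.append(tok)
        else:
            # Prefer the first synonym the parser's model already knows;
            # if none is known, keep the original token unchanged.
            replacement = next(
                (s for s in synonyms.get(tok, []) if s in known_vocab), tok
            )
            result.append(replacement)
    return result

known_vocab = {"the", "doctor", "examined", "patient"}
synonyms = {"physician": ["medic", "doctor"]}  # toy stand-in for WordNet

print(replace_unknown_words(
    ["the", "physician", "examined", "the", "patient"],
    known_vocab, synonyms,
))
# ['the', 'doctor', 'examined', 'the', 'patient']
```

The rewritten sentence is then handed to the unmodified parser, so the approach is parser-agnostic and requires no retraining.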

References

  1. How many words are there in the English language? https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language/
  2. Updates to the OED. https://public.oed.com/updates/
  3. WordNet. https://wordnet.princeton.edu/
  4. Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996.
  5. James Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc., 1987.
  6. Daniel M. Bikel. Intricacies of Collins' parsing model. 2004.
  7. Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL '93), pages 31--37, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.
  8. Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the 14th National Conference on Artificial Intelligence, 1997.
  9. Eugene Charniak. A maximum-entropy-inspired parser. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL 2000), 2000.
  10. Danqi Chen and Christopher Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
  11. Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, 1999.
  12. Michael Collins. Head-driven statistical models for natural language parsing. 2003.
  13. Kyle D. Dent and Sharoda A. Paul. Through the Twitter glass: Detecting questions in micro-text. Analyzing Microtext, 11:05, 2011.
  14. Evangelos Dermatas and George Kokkinakis. Automatic stochastic tagging of natural language texts. Computational Linguistics, 21(2):137--163, June 1995.
  15. Jason Eisner and Giorgio Satta. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 457--464. Association for Computational Linguistics, 1999.
  16. F. Jelinek, J. Lafferty, D. Magerman, R. Mercer, A. Ratnaparkhi, and S. Roukos. Decision tree parsing using a hidden derivation model. In Proceedings of the Workshop on Human Language Technology (HLT '94), pages 272--277, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics.
  17. Daniel Jurafsky. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2000.
  18. Adam Kilgarriff and Christiane Fellbaum. WordNet: An Electronic Lexical Database. Language, 2000.
  19. Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), 2003.
  20. David M. Magerman. Learning grammatical structure using statistical decision-trees. In Lecture Notes in Computer Science, 1996.
  21. Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology, pages 114--119. Association for Computational Linguistics, 1994.
  22. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313--330, 1993.
  23. Hermann Ney. Dynamic programming parsing for context-free grammars in continuous speech recognition. IEEE Transactions on Signal Processing, 39(2):336--340, 1991.
  24. Joakim Nivre. Parsing with PCFGs. 2013.
  25. Steven Pinker. Language Learnability and Language Development, with New Commentary by the Author, volume 7. Harvard University Press, 2009.
  26. Sujith Ravi, Kevin Knight, and Radu Soricut. Automatic prediction of parser accuracy. 2008.
  27. R. Socher and C. Lin. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning, 2011.
  28. Andreas Stolcke. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2):165--201, 1995.
  29. R. Thompson and T. Booth. Applying probability measures to abstract languages. IEEE Transactions on Computers, 22:442--450, May 1973.
  30. R. Weischedel, R. Schwartz, J. Palmucci, M. Meteer, and L. Ramshaw. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, 1993.

Published in: ICCA 2020: Proceedings of the International Conference on Computing Advancements, January 2020, 517 pages. ISBN: 9781450377782. DOI: 10.1145/3377049.

      Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: short-paper; research; refereed limited