ABSTRACT
Natural language parsers are the basis for deeper understanding of content written in natural language. They have been shown to be effective in many NLP tasks, such as machine translation, sentiment analysis, and document classification. Existing state-of-the-art parsers, such as Charniak [9], Collins [11], Stanford, and OpenNLP, achieve F-scores ranging from 85 to 92 percent. Parser accuracy is hampered to a major extent by unknown and unseen words. In this paper we present a novel method for improving accuracy by incorporating knowledge about unknown words from an external source. Experimental results show that our technique improves accuracy, and the improvement depends on the number of known words present in the model during training. We achieve an improvement of over one percent on some parsers.
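The core idea of unknown word replacement can be sketched as a preprocessing step: before parsing, each out-of-vocabulary token is swapped for a synonym the parser's model already knows, drawn from an external lexical resource such as WordNet [3]. The sketch below is our illustrative reading of that idea, not the authors' implementation; the function name and the toy lexicon and synonym table (stand-ins for a treebank vocabulary and WordNet lookups) are hypothetical.

```python
# Illustrative sketch of unknown-word replacement prior to parsing.
# KNOWN_WORDS stands in for the vocabulary seen during parser training;
# SYNONYMS stands in for an external knowledge source such as WordNet.

KNOWN_WORDS = {"the", "dog", "ran", "quickly", "a", "cat", "sat"}

SYNONYMS = {
    "sprinted": ["ran", "raced"],
    "feline": ["cat"],
}

def replace_unknown_words(tokens, known=KNOWN_WORDS, synonyms=SYNONYMS):
    """Replace each out-of-vocabulary token with a known synonym, if any."""
    out = []
    for tok in tokens:
        if tok in known:
            out.append(tok)
            continue
        # Pick the first candidate substitute the parser's model already knows;
        # fall back to the original token when no known synonym exists.
        sub = next((s for s in synonyms.get(tok, []) if s in known), tok)
        out.append(sub)
    return out

print(replace_unknown_words(["the", "feline", "sprinted"]))
# → ['the', 'cat', 'ran']
```

In a full system the synonym lists would be populated from WordNet synsets at scale, and the rewritten sentence would then be handed to the unmodified parser.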
- How many words are there in the English language? https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language/.
- Updates to the OED. https://public.oed.com/updates//.
- WordNet. https://wordnet.princeton.edu/.
- Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996.
- James Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc., 1987.
- Daniel M. Bikel. Intricacies of Collins' parsing model, 2004.
- Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, ACL '93, pages 31--37, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.
- Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the 14th National Conference on Artificial Intelligence, 1997.
- Eugene Charniak. A maximum-entropy-inspired parser. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL 2000), 2000.
- Danqi Chen and Christopher Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
- Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, 1999.
- Michael Collins. Head-driven statistical models for natural language parsing, 2003.
- Kyle D. Dent and Sharoda A. Paul. Through the Twitter glass: Detecting questions in micro-text. In Analyzing Microtext: Papers from the 2011 AAAI Workshop, 2011.
- Evangelos Dermatas and George Kokkinakis. Automatic stochastic tagging of natural language texts. Computational Linguistics, 21(2):137--163, June 1995.
- Jason Eisner and Giorgio Satta. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 457--464. Association for Computational Linguistics, 1999.
- F. Jelinek, J. Lafferty, D. Magerman, R. Mercer, A. Ratnaparkhi, and S. Roukos. Decision tree parsing using a hidden derivation model. In Proceedings of the Workshop on Human Language Technology, HLT '94, pages 272--277, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics.
- Daniel Jurafsky. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Computational Linguistics, 2000.
- Adam Kilgarriff and Christiane Fellbaum. WordNet: An Electronic Lexical Database. Language, 2000.
- Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), 2003.
- David M. Magerman. Learning grammatical structure using statistical decision-trees. In Lecture Notes in Computer Science, 1996.
- Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology, pages 114--119. Association for Computational Linguistics, 1994.
- Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313--330, 1993.
- Hermann Ney. Dynamic programming parsing for context-free grammars in continuous speech recognition. IEEE Transactions on Signal Processing, 39(2):336--340, 1991.
- Joakim Nivre. Parsing with PCFGs. 2013.
- Steven Pinker. Language Learnability and Language Development, with New Commentary by the Author, volume 7. Harvard University Press, 2009.
- Sujith Ravi, Kevin Knight, and Radu Soricut. Automatic prediction of parser accuracy. Computational Linguistics, 2008.
- R. Socher and C. C. Lin. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning, 2011.
- Andreas Stolcke. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2):165--201, 1995.
- R. Thompson and T. Booth. Applying probability measures to abstract languages. IEEE Transactions on Computers, 22:442--450, May 1973.
- R. Weischedel, R. Schwartz, J. Palmucci, M. Meteer, and L. Ramshaw. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, 1993.
Index Terms
- Improving Natural Language Parser Accuracy by Unknown Word Replacement