DOI: 10.1145/3377049.3377124 (ICCA conference proceedings, short paper)

Improving Natural Language Parser Accuracy by Unknown Word Replacement

Published: 20 March 2020

ABSTRACT

Natural language parsers are the basis for deeper understanding of content written in natural language. Parsers have proven effective in many NLP tasks, such as machine translation, sentiment analysis, and document classification. Existing state-of-the-art parsers, such as Charniak [9], Collins [11], Stanford, and OpenNLP, achieve F-scores ranging from 85 to 92 percent. Parser accuracy is hampered to a major extent by unknown and unseen words. In this paper we present a novel method for improving accuracy by incorporating knowledge about unknown words from an external source. Experimental results show that our technique improves accuracy; the improvement depends on the number of known words present in the model during training. We show that we achieve above one percent improvement on some parsers.
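The replacement idea described above can be sketched in a few lines: before parsing, each token absent from the parser's training vocabulary is swapped for a synonym the parser has already seen. This is a minimal illustrative sketch, not the authors' implementation; the function name, the toy vocabulary, and the small synonym table (standing in for an external resource such as WordNet [3]) are all hypothetical.

```python
# Hedged sketch of unknown-word replacement before parsing.
# The synonym table is a toy stand-in for an external lexical
# resource such as WordNet; all names here are illustrative.

def replace_unknown_words(tokens, known_vocab, synonyms):
    """Replace each out-of-vocabulary token with a known synonym, if any."""
    result = []
    for tok in tokens:
        if tok in known_vocab:
            result.append(tok)
        else:
            # Prefer the first synonym the parser's model already knows;
            # if none is known, keep the original token unchanged.
            replacement = next(
                (s for s in synonyms.get(tok, []) if s in known_vocab), tok
            )
            result.append(replacement)
    return result

known_vocab = {"the", "doctor", "examined", "patient"}
synonyms = {"physician": ["medic", "doctor"]}  # toy stand-in for WordNet

print(replace_unknown_words(
    ["the", "physician", "examined", "the", "patient"],
    known_vocab, synonyms,
))
# ['the', 'doctor', 'examined', 'the', 'patient']
```

The rewritten sentence is then handed to the unmodified parser, so the approach is parser-agnostic and requires no retraining.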

References

  1. How many words are there in the English language? https://en.oxforddictionaries.com/explore/how-many-words-are-there-in-the-english-language/
  2. Updates to the OED. https://public.oed.com/updates/
  3. WordNet. https://wordnet.princeton.edu/
  4. Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1996.
  5. James Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc., 1987.
  6. Daniel M. Bikel. Intricacies of Collins' parsing model. 2004.
  7. Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL '93), pages 31--37, Stroudsburg, PA, USA, 1993. Association for Computational Linguistics.
  8. Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the 14th National Conference on Artificial Intelligence, 1997.
  9. Eugene Charniak. A maximum-entropy-inspired parser. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL 2000), 2000.
  10. Danqi Chen and Christopher Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
  11. Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, 1999.
  12. Michael Collins. Head-driven statistical models for natural language parsing. 2003.
  13. Kyle D. Dent and Sharoda A. Paul. Through the Twitter glass: Detecting questions in micro-text. Analyzing Microtext, 11:05, 2011.
  14. Evangelos Dermatas and George Kokkinakis. Automatic stochastic tagging of natural language texts. Computational Linguistics, 21(2):137--163, June 1995.
  15. Jason Eisner and Giorgio Satta. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 457--464. Association for Computational Linguistics, 1999.
  16. F. Jelinek, J. Lafferty, D. Magerman, R. Mercer, A. Ratnaparkhi, and S. Roukos. Decision tree parsing using a hidden derivation model. In Proceedings of the Workshop on Human Language Technology (HLT '94), pages 272--277, Stroudsburg, PA, USA, 1994. Association for Computational Linguistics.
  17. Daniel Jurafsky. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2000.
  18. Adam Kilgarriff and Christiane Fellbaum. WordNet: An Electronic Lexical Database. Language, 2000.
  19. Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15 (NIPS 2002), 2003.
  20. David M. Magerman. Learning grammatical structure using statistical decision-trees. In Lecture Notes in Computer Science, 1996.
  21. Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology, pages 114--119. Association for Computational Linguistics, 1994.
  22. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313--330, 1993.
  23. Hermann Ney. Dynamic programming parsing for context-free grammars in continuous speech recognition. IEEE Transactions on Signal Processing, 39(2):336--340, 1991.
  24. Joakim Nivre. Parsing with PCFGs. 2013.
  25. Steven Pinker. Language Learnability and Language Development, with New Commentary by the Author, volume 7. Harvard University Press, 2009.
  26. Sujith Ravi, Kevin Knight, and Radu Soricut. Automatic prediction of parser accuracy. 2008.
  27. R. Socher and C. Lin. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning, 2011.
  28. Andreas Stolcke. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2):165--201, 1995.
  29. R. Thompson and T. Booth. Applying probability measures to abstract languages. IEEE Transactions on Computers, 22:442--450, May 1973.
  30. R. Weischedel, R. Schwartz, J. Palmucci, M. Meteer, and L. Ramshaw. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, 1993.

Published in: ICCA 2020: Proceedings of the International Conference on Computing Advancements, January 2020, 517 pages. ISBN: 9781450377782. DOI: 10.1145/3377049.

      Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: short-paper; research; refereed limited