Abstract
Natural language parsing, one of the central tasks in natural language processing, is widely used in many AI fields. In this paper, we address the issue of parser performance evaluation, in particular its variation across datasets. We propose three simple statistical measures to characterize datasets and evaluate their correlation with parser performance. The results clearly show that parsers differ both in how much their performance varies and in how sensitive they are to these measures. The method can be used to guide the choice of natural language parsers for new domain applications, as well as the systematic combination of parsers for better parsing accuracy.
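The abstract does not name its three measures, so the following is only a minimal sketch of the general procedure: compute a statistic for each dataset, pair it with a parser's score on that dataset, and measure the correlation. It assumes two illustrative stand-in measures (`avg_sentence_length` and `oov_ratio` against a training vocabulary, both hypothetical choices rather than the paper's measures) and uses Pearson's r as the correlation statistic, which is also an assumption.

```python
import math
from typing import Sequence

# A dataset is a list of tokenized sentences (lists of strings).
Dataset = Sequence[Sequence[str]]


def avg_sentence_length(sentences: Dataset) -> float:
    """Mean tokens per sentence (hypothetical illustrative measure)."""
    if not sentences:
        raise ValueError("empty dataset")
    return sum(len(s) for s in sentences) / len(sentences)


def oov_ratio(sentences: Dataset, train_vocab: set[str]) -> float:
    """Fraction of tokens unseen in a training vocabulary (hypothetical measure)."""
    tokens = [tok for s in sentences for tok in s]
    if not tokens:
        raise ValueError("empty dataset")
    return sum(tok not in train_vocab for tok in tokens) / len(tokens)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("zero variance: correlation undefined")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not results from the paper.
    vocab = {"the", "parser", "works"}
    datasets = [
        [["the", "parser", "works"]],
        [["the", "parser", "fails", "here"]],
        [["a", "novel", "protein", "binds", "kinase"]],
    ]
    scores = [0.90, 0.80, 0.60]  # hypothetical per-dataset parser accuracies
    measures = [oov_ratio(d, vocab) for d in datasets]
    print("r =", pearson(measures, scores))
```

Repeating the correlation for each parser and each measure gives a per-parser sensitivity profile; a strongly negative r would suggest that the parser degrades as the measure grows.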
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Wang, R. (2009). Correlating Natural Language Parser Performance with Statistical Measures of the Text. In: Mertsching, B., Hund, M., Aziz, Z. (eds) KI 2009: Advances in Artificial Intelligence. KI 2009. Lecture Notes in Computer Science, vol 5803. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04617-9_28
DOI: https://doi.org/10.1007/978-3-642-04617-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04616-2
Online ISBN: 978-3-642-04617-9
eBook Packages: Computer Science (R0)