A novel method for performance evaluation of text chunking

Maiti, Suchismita; Garain, Utpal; Dhar, Arnab; De, Sankar

doi:10.1007/s10579-013-9250-3

Project Notes
Published: 13 August 2013

Volume 49, pages 215–226 (2015)
Language Resources and Evaluation

Abstract

Evaluation of text chunking is revisited. The proposed method analyzes the errors made by a chunker and formulates an evaluation strategy that brings out the strengths and weaknesses of a chunker better than the existing precision, recall, and F-score based methods or their variants do. A tree-matching algorithm of linear time complexity is designed, analyzed, and illustrated with examples. The correctness of the algorithm is checked using a chunker and a set of test sentences.
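
For context, the precision, recall and F score baseline that the abstract refers to can be sketched as follows. This is only an illustration of conventional exact-match chunk scoring over BIO tags, not the tree-matching algorithm proposed in the article, whose details are not given on this page. The names `extract_chunks` and `chunk_prf`, the BIO tag scheme, and the example sentence tags are all assumptions made for the sketch.

```python
from typing import List, Set, Tuple

# A chunk is (label, start token index, end token index exclusive).
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect labelled spans from a BIO tag sequence (assumed tag scheme)."""
    chunks: Set[Chunk] = set()
    label, start = None, 0
    # A trailing "O" acts as a sentinel that closes the last open chunk.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                chunks.add((label, start, i))
            if tag.startswith(("B-", "I-")):
                label, start = tag[2:], i
            else:
                label = None
    return chunks


def chunk_prf(gold_tags: List[str], pred_tags: List[str]) -> Tuple[float, float, float]:
    """Exact-match chunk precision, recall and F1 (hypothetical helper)."""
    gold, pred = extract_chunks(gold_tags), extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: the predicted tags split the final noun phrase in two.
gold = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "I-NP"]
pred = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "B-NP"]
print(chunk_prf(gold, pred))
```

In this invented example a single boundary mistake counts once against recall (the gold chunk is missed) and twice against precision (two spurious chunks are produced), and the resulting scores do not indicate what kind of error occurred.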

Acknowledgments

The authors sincerely thank the anonymous reviewers of this paper. We also express our gratitude to one of the reviewers, who appreciated our work and pointed out the need to revisit chunking in the context of noisy text (SMS, tweets, blogs, email, etc.) analysis.

Author information

Authors and Affiliations

Department of Information Technology, National Institute of Technology (NIT), Durgapur, WB, India
Suchismita Maiti
Indian Statistical Institute, 203, B.T. Road, Kolkata, 700108, India
Utpal Garain
Department of Computer Science and Engineering, Indian Institute of Technology (IIT), Kharagpur, WB, India
Arnab Dhar
Gupta College of Technological Sciences, Asansol, 713301, WB, India
Sankar De

Corresponding author

Correspondence to Utpal Garain.

About this article

Cite this article

Maiti, S., Garain, U., Dhar, A. et al. A novel method for performance evaluation of text chunking. Lang Resources & Evaluation 49, 215–226 (2015). https://doi.org/10.1007/s10579-013-9250-3

Published: 13 August 2013
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10579-013-9250-3
