Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora

  • Conference paper
Text, Speech and Dialogue (TSD 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1902)

Abstract

In this paper we apply the ensemble approach to the identification of incorrectly annotated items (noise) in a training set. In a controlled experiment, memory-based, decision tree-based and transformation-based classifiers are used as a filter to detect and remove noise deliberately introduced into a manually tagged corpus. The results indicate that the method can be successfully applied to automatically detect errors in a corpus.
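
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three classifiers below are simple stand-ins (a word lookup, a context lookup, and a 1-nearest-neighbour matcher) rather than the memory-based, decision tree-based and transformation-based learners used in the paper. The feature layout `(previous word, word, next word)`, the fold scheme, and the `min_votes` voting threshold are all assumptions made for illustration. Each item is tagged by classifiers trained on the other folds, and it is flagged as suspected noise when enough of them disagree with its annotated tag.

```python
from collections import Counter, defaultdict

# Each item is ((prev_word, word, next_word), tag); this layout is hypothetical.

def train_word_lookup(train):
    """Stand-in classifier 1: most frequent tag seen for the word."""
    counts = defaultdict(Counter)
    for (_prev, word, _next), tag in train:
        counts[word][tag] += 1
    default = Counter(tag for _, tag in train).most_common(1)[0][0]
    def predict(x):
        if x[1] in counts:
            return counts[x[1]].most_common(1)[0][0]
        return default  # unseen word: fall back to the overall majority tag
    return predict

def train_context_lookup(train):
    """Stand-in classifier 2: most frequent tag for (prev_word, word)."""
    pair_counts = defaultdict(Counter)
    for (prev, word, _next), tag in train:
        pair_counts[(prev, word)][tag] += 1
    back_off = train_word_lookup(train)
    def predict(x):
        key = (x[0], x[1])
        if key in pair_counts:
            return pair_counts[key].most_common(1)[0][0]
        return back_off(x)  # unseen pair: back off to the word lookup
    return predict

def train_nearest_neighbour(train):
    """Stand-in classifier 3: 1-NN using the number of matching features."""
    def predict(x):
        nearest = max(train, key=lambda item: sum(a == b for a, b in zip(item[0], x)))
        return nearest[1]
    return predict

TRAINERS = [train_word_lookup, train_context_lookup, train_nearest_neighbour]

def ensemble_filter(data, trainers, k=3, min_votes=None):
    """Return indices of items that at least min_votes classifiers mislabel.

    min_votes defaults to len(trainers), i.e. every classifier must disagree
    with the annotated tag before the item is flagged.
    """
    if min_votes is None:
        min_votes = len(trainers)
    flagged = []
    for fold in range(k):
        # Train on everything outside the fold, then judge the held-out items.
        train = [data[i] for i in range(len(data)) if i % k != fold]
        models = [trainer(train) for trainer in trainers]
        for i in range(fold, len(data), k):
            features, tag = data[i]
            errors = sum(model(features) != tag for model in models)
            if errors >= min_votes:
                flagged.append(i)
    return sorted(flagged)

# Toy corpus with one deliberately corrupted tag at index 4.
clean = [
    (("the", "dog", "barks"), "NN"), (("a", "dog", "runs"), "NN"),
    (("the", "cat", "sleeps"), "NN"), (("dogs", "run", "fast"), "VB"),
    (("cats", "run", "away"), "VB"), (("we", "run", "home"), "VB"),
] * 2
data = list(clean)
data[4] = (data[4][0], "NN")  # injected noise: "run" mislabelled as a noun

print(ensemble_filter(data, TRAINERS, k=3))  # [4]
```

Requiring all classifiers to disagree keeps the filter conservative, so fewer correct items are discarded; lowering `min_votes` to a majority catches more noise at the cost of more false alarms.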

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berthelsen, H., Megyesi, B. (2000). Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science (LNAI), vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_5

  • DOI: https://doi.org/10.1007/3-540-45323-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41042-3

  • Online ISBN: 978-3-540-45323-9

  • eBook Packages: Springer Book Archive
