Abstract
Transformation-based learning (TBL) is a machine learning method, used in particular for sequential classification, invented by Eric Brill [Brill 1993b, 1995a]. It is widely used within computational linguistics and natural language processing, but surprisingly little in other areas.
TBL is a simple yet flexible paradigm, which achieves competitive or even state-of-the-art performance in several areas and does not overtrain easily. It is especially successful at catching local, fixed-distance dependencies and seamlessly exploits information from heterogeneous discrete feature types. The learned representation—an ordered list of transformation rules—is compact and efficient, with clear semantics. Individual rules are interpretable and often meaningful to humans.
The present article offers a survey of the most important theoretical work on TBL, addressing a perceived gap in the literature. Because the method should also be useful outside computational linguistics and natural language processing, a chief aim is to provide an informal but relatively comprehensive introduction that is also readable by people from other specialities.
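To make the abstract's "ordered list of transformation rules" concrete, the following is a minimal sketch of error-driven rule learning and rule application. It is not Brill's implementation: the names `Rule`, `apply_rule`, `apply_rules`, `gain`, and `learn` are hypothetical, only a single rule template is assumed (retag when the tag at a fixed offset has a given value), and the training data is a toy tag sequence invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True, order=True)
class Rule:
    """Hypothetical template: retag from_tag as to_tag when the tag at a
    fixed relative offset equals context_tag."""
    from_tag: str
    to_tag: str
    offset: int
    context_tag: str


def apply_rule(rule: Rule, tags: Sequence[str]) -> List[str]:
    """Apply one rule at every position; conditions are tested on the input tags."""
    out = list(tags)
    for i, tag in enumerate(tags):
        j = i + rule.offset
        if tag == rule.from_tag and 0 <= j < len(tags) and tags[j] == rule.context_tag:
            out[i] = rule.to_tag
    return out


def apply_rules(rules: Sequence[Rule], tags: Sequence[str]) -> List[str]:
    """Apply the learned rules strictly in the order they were learned."""
    for rule in rules:
        tags = apply_rule(rule, tags)
    return list(tags)


def gain(rule: Rule, current: Sequence[str], gold: Sequence[str]) -> int:
    """Net improvement: errors fixed minus correct tags broken."""
    new = apply_rule(rule, current)
    return sum((n == g) - (c == g) for n, c, g in zip(new, current, gold))


def learn(baseline: Sequence[str], gold: Sequence[str],
          offsets=(-1, 1), min_gain: int = 1) -> List[Rule]:
    """Greedily pick the rule with the highest net gain until none helps."""
    current, rules = list(baseline), []
    while True:
        # Candidate rules are instantiated only at positions that are still wrong.
        candidates = sorted({
            Rule(c, g, off, current[i + off])
            for i, (c, g) in enumerate(zip(current, gold)) if c != g
            for off in offsets if 0 <= i + off < len(current)
        })
        best = max(candidates, key=lambda r: gain(r, current, gold), default=None)
        if best is None or gain(best, current, gold) < min_gain:
            return rules
        rules.append(best)
        current = apply_rule(best, current)


# Toy data (an assumption for illustration): a baseline tagging with one error.
baseline = ["NN", "DT", "NN", "TO", "NN", "DT", "NN"]
gold = ["NN", "DT", "NN", "TO", "VB", "DT", "NN"]
rules = learn(baseline, gold)  # learns: NN -> VB if the previous tag is TO
```

The sketch selects rules by net error reduction on the training data, which is why the competing candidate "NN -> VB if the next tag is DT" is rejected here: it fixes one error but breaks a correct tag elsewhere. The learned list is then applied in order with `apply_rules(rules, baseline)`.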
Supplemental Material
Supplemental movie, appendix, image, and software files are available for download for "When Errors Become the Rule: Twenty Years with Transformation-Based Learning".
- Harold Abelson and Gerald J. Sussman. 1996. Structure and Interpretation of Computer Programs. MIT Press, Cambridge.
- John Aberdeen, John Burger, David Day, Lynette Hirschman, Patricia Robinson, and Marc Vilain. 1995. MITRE: description of the Alembic system used for MUC-6. In Proceedings of the 6th Conference on Message Understanding. Association for Computational Linguistics, 141--155.
- Chinatsu Aone and Kevin Hausman. 1996. Unsupervised learning of a rule-based Spanish part of speech tagger. In Proceedings of the 16th Conference on Computational Linguistics, Vol. 1. Association for Computational Linguistics, 53--58.
- Nezip F. Ayan, Bonnie J. Dorr, and Christof Monz. 2005. Alignment link projection using transformation-based learning. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 185--192.
- Lalit R. Bahl, Peter F. Brown, Peter V. de Souza, and Robert L. Mercer. 1989. A tree-based statistical language model for natural language speech recognition. Acoustics, Speech and Signal Processing, IEEE Transactions 37, 7 (1989), 1001--1008.
- Markus Becker. 1998. Unsupervised part of speech tagging with extended templates. In Proceedings of ESSLLI 1998, Student Session.
- Gosse Bouma. 2000. A finite state and data oriented method for grapheme to phoneme conversion. In NAACL-2000. 303--310.
- Gosse Bouma. 2003. Finite state methods for hyphenation. Natural Language Engineering 9 (2003), 5--20. DOI:http://dx.doi.org/10.1017/S1351324903003073
- Leo Breiman. 1996. Bagging predictors. Machine Learning 24, 2 (1996), 123--140.
- Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA.
- Eric Brill. 1993a. Automatic grammar induction and parsing free text: A transformation-based approach. In Proceedings of the Workshop on Human Language Technology. Association for Computational Linguistics, 237--242.
- Eric Brill. 1993b. A Corpus-Based Approach to Language Learning. Ph.D. Dissertation. University of Pennsylvania, Philadelphia, PA.
- Eric Brill. 1994. Some advances in transformation-based part of speech tagging. In Proceedings of the 12th National Conference on Artificial Intelligence. Arxiv preprint cmp-lg/9406010 (1994), 722--727.
- Eric Brill. 1995a. Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging. Computational Linguistics 21, 4 (1995), 543--565.
- Eric Brill. 1995b. Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the 3rd Workshop on Very Large Corpora, Vol. 30. 1--13.
- Eric Brill. 1996. Learning to parse with transformations. In Recent Advances in Parsing Technology. Kluwer.
- Eric Brill and Philip Resnik. 1994. A rule-based approach to prepositional phrase attachment disambiguation. In Proceedings of COLING'94. 1198--1204.
- Eric Brill and Jun Wu. 1998. Classifier combination for improved lexical disambiguation. In Proceedings of the 17th International Conference on Computational Linguistics, Vol. 1. Association for Computational Linguistics, 191--195.
- Björn Bringmann, Stefan Kramer, Friedrich Neubarth, Hannes Pirker, and Gerhard Widmer. 2002. Transformation-based regression. In Machine Learning: International Workshop then Conference. Citeseer, 59--66.
- Sandra Carberry, K. Vijay-Shanker, Andrew Wilson, and Ken Samuel. 2001. Randomized rule selection in transformation-based learning: A comparative study. Natural Language Engineering 7, 2 (2001), 99--116.
- John Carroll, Ted Briscoe, and Antonio Sanfilippo. 1998. Parser evaluation: A survey and a new proposal. In Proceedings of the 1st International Conference on Language Resources and Evaluation. 447--454.
- Rich Caruana. 1997. Multitask learning. Machine Learning 28, 1 (1997), 41--75.
- James R. Curran and Raymond K. Wong. 1999. Transformation-based learning for automatic translation from HTML to XML. In Proceedings of the 4th Australasian Document Computing Symposium (ADCS99). Citeseer.
- James R. Curran and Raymond K. Wong. 2000. Formalization of transformation-based learning. In ACSC. IEEE Computer Society, 51--57.
- Walter Daelemans. 1995. Memory-based lexical acquisition and processing. In Machine Translation and the Lexicon, P. Steffens (Ed.). Springer, Berlin, 85--98.
- David Day, John Aberdeen, Lynette Hirschman, Robyn Kozierok, Patricia Robinson, and Marc Vilain. 1997. Mixed-initiative development of language processing systems. In Proceedings of the Fifth Conference on Applied Natural Language Processing. Association for Computational Linguistics, 348--355.
- Luca Dini, Vittorio Di Tomaso, and Frédérique Segond. 1998. Error driven word sense disambiguation. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Vol. 1. Association for Computational Linguistics, 320--324.
- Cícero N. dos Santos. 2009. Entropy Guided Transformation Learning. Ph.D. Dissertation. Pontifícia Universidade Católica do Rio de Janeiro.
- Cícero N. dos Santos and Ruy L. Milidiú. 2007. Probabilistic classifications with TBL. In Computational Linguistics and Intelligent Text Processing, Alexander Gelbukh (Ed.). Lecture Notes in Computer Science, Vol. 4394. Springer, Berlin, 196--207.
- Cícero N. dos Santos and Ruy L. Milidiú. 2009. Entropy guided transformation learning. Foundations of Computational Intelligence 1 (2009), 159--184.
- Cícero N. dos Santos, Ruy L. Milidiú, Carlos E. M. Crestana, and Eraldo R. Fernandes. 2010. ETL Ensembles for Chunking, NER and SRL. In Computational Linguistics and Intelligent Text Processing, Alexander Gelbukh (Ed.). Lecture Notes in Computer Science, Vol. 6008. Springer, Berlin, 100--112.
- Cícero N. dos Santos, Ruy L. Milidiú, and Raúl Rentería. 2008. Portuguese part-of-speech tagging using entropy guided transformation learning. In Computational Processing of the Portuguese Language, António Teixeira, Vera de Lima, Luís de Oliveira, and Paulo Quaresma (Eds.). Lecture Notes in Computer Science, Vol. 5190. Springer, Berlin, 143--152.
- Cícero N. dos Santos and Claudia Oliveira. 2005. Constrained atomic term: Widening the reach of rule templates in transformation based learning. In EPIA (Lecture Notes in Computer Science), Carlos Bento, Amílcar Cardoso, and Gaël Dias (Eds.), Vol. 3808. Springer, 622--633. DOI:http://dx.doi.org/10.1007/11595014_61
- Philip Edmonds. 2002. SENSEVAL: The evaluation of word sense disambiguation systems. ELRA Newsletter 7, 3 (2002), 5--14.
- Eraldo R. Fernandes, Cícero N. dos Santos, and Ruy L. Milidiú. 2010. A machine learning approach to Portuguese clause identification. Computational Processing of the Portuguese Language (2010), 55--64.
- Radu Florian. 2002a. Named entity recognition as a house of cards: Classifier stacking. In Proceedings of the 6th Conference on Natural Language Learning. 1--4.
- Radu Florian. 2002b. Transformation Based Learning and Data-Driven Lexical Disambiguation. Syntactic and Semantic Ambiguity Resolution. Ph.D. Dissertation, Johns Hopkins University.
- Radu Florian, John Henderson, and Grace Ngai. 2000. Coaxing confidences from an old friend: Probabilistic classifications from transformation rule lists. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora. Association for Computational Linguistics, 26--34.
- Radu Florian, Abe Ittycheriah, Hongyan Jing, and Tong Zhang. 2003. Named entity recognition through classifier combination. In Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL 2003, Vol. 4. Association for Computational Linguistics, 171.
- Radu Florian and Grace Ngai. 2001. Multidimensional transformation-based learning. In Proceedings of the 5th Workshop on Computational Language Learning (CoNLL-2001). CoRR cs.CL/0107021 (2001).
- Cameron Fordyce. 1998. Prosody Prediction for Speech Synthesis Using Transformational Rule-Based Learning. Master's Thesis, Boston University.
- Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4 (2003), 933--969.
- William A. Gale, Kenneth W. Church, and David Yarowsky. 1992. A method for disambiguating word senses in a large corpus. Computers and the Humanities 26, 5--6 (1992), 415--439.
- Daniel Hardt. 1998. Improving ellipsis resolution with transformation-based learning. In AAAI Fall Symposium.
- Daniel Hardt. 2001. Transformation-based learning of Danish grammar correction. In Proceedings of RANLP 2001, Tzigov Chark. Citeseer.
- Per Hedelin, Anders Jonsson, and Per Lindblad. 1987. Svenskt uttalslexikon (3rd ed.). Technical report. Chalmers University of Technology.
- Mark Hepple. 2000. Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 278.
- Paul Hudak. 1996. Building domain-specific embedded languages. ACM Computing Surveys (CSUR) 28, 4 (1996).
- Paul Hudak. 1998. Modular domain specific languages and tools. In Proceedings of the 5th International Conference on Software Reuse, P. Devanbu and J. Poulin (Eds.). IEEE Computer Society Press, 134--142.
- Daniel Jurafsky and James H. Martin. 2008. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Prentice-Hall.
- Fred Karlsson, Atro Voutilainen, Juha Heikkilä, and Arto Anttila (Eds.). 1995. Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Mouton de Gruyter.
- Ergina Kavallieratou, Efstathios Stamatatos, Nikos Fakotakis, and George Kokkinakis. 2000. Handwritten character segmentation using transformation-based learning. In Proceedings of the 15th International Conference on Pattern Recognition (ICPR'00). 634--637.
- Joungbum Kim, Sarah E. Schwarm, and Mari Ostendorf. 2004. Detecting structural metadata with decision trees and transformation-based learning. In Proceedings of HLT-NAACL04. 137--144.
- Ludmila I. Kuncheva and Christopher J. Whitaker. 2003. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 51, 2 (2003), 181--207.
- Torbjörn Lager. 1999a. μ-TBL Lite: A small, extensible transformation-based learner. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL'99). Bergen. Poster paper.
- Torbjörn Lager. 1999b. The μ-TBL system: Logic programming tools for transformation-based learning. In Proceedings of CoNLL, Vol. 99.
- Torbjörn Lager. 2001. Transformation-based learning of rules for constraint grammar tagging. In 13th Nordic Conference in Computational Linguistics. Uppsala, Sweden, 21--22.
- Torbjörn Lager and Natalia Zinovjeva. 1999. Training a dialogue act tagger with the μ-TBL System. In Proceedings of the 3rd Swedish Symposium on Multimodal Communication. Linköping University Natural Language Processing Laboratory (NLPLAB).
- Niels Landwehr, Bernd Gutmann, Ingo Thon, Luc De Raedt, and Matthai Philipose. 2008. Relational transformation-based tagging for human activity recognition. Fundamenta Informaticae 89, 1 (2008), 111--129.
- Xin Li, Xuan-Jing Huang, and Li-de Wu. 2006. Question classification by ensemble learning. IJCSNS 6, 3 (2006), 147.
- Nikolaj Lindberg and Martin Eineborg. 1998. Learning constraint grammar-style disambiguation rules using inductive logic programming. In Proceedings of the 17th International Conference on Computational Linguistics. Association for Computational Linguistics, 775--779.
- Lidia Mangu and Eric Brill. 1997. Automatic rule acquisition for spelling correction. In Machine Learning -- International Workshop then Conference. Citeseer, 187--194.
- Christopher D. Manning and Hinrich Schütze. 2001. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
- Andrei Mikheev. 1997. Automatic rule induction for unknown-word guessing. Computational Linguistics 23, 3 (1997), 405--423.
- Ruy Luiz Milidiú, C. E. M. Crestana, and Cícero Nogueira dos Santos. 2010. A token classification approach to dependency parsing. In Proceedings of the 7th Brazilian Symposium on Information and Human Language Technology (STIL'09). IEEE, 80--88.
- Ruy L. Milidiú, Cícero N. dos Santos, and Julio C. Duarte. 2008. Phrase chunking using entropy guided transformation learning. In Proceedings of ACL 2008. Citeseer.
- Ruy L. Milidiú, Julio C. Duarte, and Cícero N. dos Santos. 2007. Evolutionary TBL template generation. Journal of the Brazilian Computer Society 13(4) (2007), 39--50.
- Tom Mitchell. 1997. Machine Learning. McGraw-Hill.
- Un Yong Nahm. 2005. Transformation-based information extraction using learned meta-rules. Computational Linguistics and Intelligent Text Processing (2005), 535--538.
- Lee Naish. 1996. Higher-order logic programming in Prolog. In Proceedings of the Workshop on Multi-Paradigm Logic Programming, JICSLP, Vol. 96.
- Grace Ngai and Radu Florian. 2001a. Transformation-based learning in the fast lane. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies 2001. Association for Computational Linguistics, 8.
- Grace Ngai and Radu Florian. 2001b. Transformation Based Learning in the Fast Lane: A Generative Approach. Technical Report. Center for Speech and Language Processing, Johns Hopkins University.
- Kemal Oflazer and Gökhan Tür. 1996. Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 69--81.
- Jonathan Oliver. 1992. Decision Graphs: An Extension of Decision Trees. Technical Report 92/173. Department of Computer Science, Monash University.
- David D. Palmer. 1997. A trainable rule-based algorithm for word segmentation. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 321--328.
- Seong-Bae Park, Jeong-Ho Chang, and Byoung-Tak Zhang. 2004. Korean compound noun decomposition using syllabic information only. Computational Linguistics and Intelligent Text Processing (2004), 146--157.
- Fernando Pereira and Yves Schabes. 1992. Inside-outside reestimation from partially bracketed corpora. In ACL.
- J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.
- Lance A. Ramshaw and Mitchell P. Marcus. 1994. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In Proceedings of the ACL Workshop on Combining Symbolic and Statistical Approaches to Language. 128--135.
- Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In Proceedings of the ACL 3rd Workshop on Very Large Corpora, David Yarowsky and Kenneth W. Church (Eds.), Vol. cmp-lg/9505040. Association of Computational Linguistics, Somerset, NJ, 82--94.
- Ronald Rivest. 1987. Learning decision lists. Machine Learning 2, 3 (1987), 229--246.
- Emmanuel Roche and Yves Schabes. 1995. Deterministic part-of-speech tagging with finite-state transducers. Computational Linguistics 21, 2 (1995), 227--253.
- Dan Roth. 1998. Learning to resolve natural language ambiguities: A unified approach. In Proceedings of the National Conference on Artificial Intelligence. John Wiley & Sons Ltd., 806--813.
- Tobias Ruland. 2000. A context-sensitive model for probabilistic LR parsing of spoken language with transformation-based postprocessing. In Proceedings of the 18th Conference on Computational Linguistics, Vol. 2. Association for Computational Linguistics, 677--683.
- Ken Samuel. 1998a. Discourse learning: Dialogue act tagging with transformation-based learning. In Proceedings of the National Conference on Artificial Intelligence. John Wiley and Sons, Ltd., 1199--1199.
- Ken Samuel. 1998b. Lazy transformation-based learning. In Proceedings of the 11th International Florida Artificial Intelligence Research Society Conference. AAAI Press, 235--239.
- Ken Samuel, Sandra Carberry, and K. Vijay-Shanker. 1998. An investigation of transformation-based learning in discourse. In Machine Learning: Proceedings of the 15th International Conference.
- Christer Samuelsson, Pasi Tapanainen, and Atro Voutilainen. 1996. Inducing constraint grammars. Grammatical Inference: Learning Syntax from Sentences (1996), 146--155.
- Erik Tjong Kim Sang and Jorn Veenstra. 1999. Representing text chunks. In Proceedings of the 9th Conference on European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 173--179.
- Yoshimasa Tsuruoka, John McNaught, and Sophia Ananiadou. 2008. Normalizing biomedical terms by minimizing ambiguity and variability. BMC Bioinformatics 9, Suppl 3 (2008), S2.
- Leslie G. Valiant. 1984. A theory of the learnable. Communications of the ACM 27, 11 (1984), 1134--1142.
- Arie van Deursen, Paul Klint, and Joost Visser. 2000. Domain-specific languages: An annotated bibliography. ACM SIGPLAN Notices 35, 6 (2000), 26--36.
- Ken Williams, Christopher Dozier, and Andrew McCulloh. 2004. Learning transformation rules for semantic role labeling. In Proceedings of CoNLL-2004.
- Garnett Wilson and Malcolm Heywood. 2005. Use of a genetic algorithm in Brill's transformation-based part-of-speech tagger. In GECCO'05: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation. ACM, New York, NY, 2067--2073. DOI:http://dx.doi.org/10.1145/1068009.1068352
- David Wolpert. 1992. Stacked generalization. Neural Networks 5(2) (1992), 241--260.
- Dekai Wu, Grace Ngai, and Marine Carpuat. 2004. Raising the bar: Stacked conservative error correction beyond boosting. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004). Lisbon.
- George K. Zipf. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley.
- Win Zonneveld, Mieke Trommelen, Michael Jessen, Curtis Rice, Gösta Bruce, and Kristjan Arnason. 1999. Wordstress in West-Germanic and North-Germanic languages. In Word Prosodic Systems in the Languages of Europe, Harry van der Hulst (Ed.). Walter de Gruyter, Chapter 8, 477--604.