Learning Local Transductions Is Hard

Jansche, Martin

doi:10.1007/s10849-004-2115-9

Original Article
Published: March 2004

Volume 13, pages 439–455, (2004)
Journal of Logic, Language and Information

Martin Jansche¹

Abstract

Local deterministic string-to-string transductions arise in natural language processing (NLP) tasks such as letter-to-sound translation or pronunciation modeling. This class of transductions is a simple generalization of morphisms of free monoids; learning local transductions is essentially the same as inference of certain monoid morphisms. However, learning even a highly restricted class of morphisms, the so-called fine morphisms, leads to intractable problems: deciding whether a hypothesized fine morphism is consistent with observations is an NP-complete problem; and maximizing classification accuracy of the even smaller class of alphabetic substitution morphisms is APX-hard. These theoretical results provide some justification for using the kinds of heuristics that are commonly used for this learning task.
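
The consistency problem mentioned above can be illustrated with a small sketch. It assumes the standard definition of a fine morphism (each input letter maps to either the empty string or a single output letter) and reads "consistent with observations" as: the morphism maps every observed input string to its observed output string. The function names `apply_morphism`, `is_consistent`, and `find_fine_morphism` are hypothetical, and the exhaustive search is a naive illustration, not the construction used in the article. Its running time grows as (|Γ| + 1)^|Σ|, which is in line with the NP-completeness result but does not by itself prove anything.

```python
from itertools import product


def apply_morphism(h, word):
    """Extend a letter-to-string map h to whole words by concatenation."""
    return "".join(h[a] for a in word)


def is_consistent(h, observations):
    """Check that h maps every observed input to its observed output."""
    return all(apply_morphism(h, x) == y for x, y in observations)


def find_fine_morphism(observations):
    """Exhaustively search for a fine morphism that fits all (input, output) pairs.

    Returns a dict from input letters to images, or None if no fine
    morphism is consistent with the observations.
    """
    sigma = sorted({a for x, _ in observations for a in x})  # input alphabet
    gamma = sorted({b for _, y in observations for b in y})  # output alphabet
    images = [""] + gamma  # a fine morphism sends each letter to "" or one letter
    for choice in product(images, repeat=len(sigma)):
        h = dict(zip(sigma, choice))
        if is_consistent(h, observations):
            return h
    return None


if __name__ == "__main__":
    print(find_fine_morphism([("ab", "x"), ("ba", "x"), ("b", "")]))
    print(find_fine_morphism([("aa", "x")]))
```

Verifying a single candidate with `is_consistent` takes time linear in the data; the difficulty lies in finding a candidate among exponentially many letter assignments.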

Author information

Authors and Affiliations

Center for Computational Learning Systems, Columbia University, New York, U.S.A.
Martin Jansche

Corresponding author

Correspondence to Martin Jansche.

About this article

Cite this article

Jansche, M. Learning Local Transductions Is Hard. J Logic Lang Inf 13, 439–455 (2004). https://doi.org/10.1007/s10849-004-2115-9

Issue Date: March 2004
DOI: https://doi.org/10.1007/s10849-004-2115-9
