Skip to main content
Log in

Learning Local Transductions Is Hard

  • Original Article
  • Published:
Journal of Logic, Language and Information Aims and scope Submit manuscript

Abstract

Local deterministic string-to-string transductions arise in natural language processing (NLP) tasks such as letter-to-sound translation or pronunciation modeling. This class of transductions is a simple generalization of morphisms of free monoids; learning local transductions is essentially the same as inference of certain monoid morphisms. However, learning even a highly restricted class of morphisms, the so-called fine morphisms, leads to intractable problems: deciding whether a hypothesized fine morphism is consistent with observations is an NP-complete problem; and maximizing classification accuracy of the even smaller class of alphabetic substitution morphisms is APX-hard. These theoretical results provide some justification for using the kinds of heuristics that are commonly used for this learning task.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Aho, A.V., Hopcroft, J.E., and Ullman, J.D., 1983, Data Structures and Algorithms, Addison-Wesley Series in Computer Science and Information Processing, Reading, MA: Addison-Wesley.

    Google Scholar 

  • Angluin, D., 1982, “Inference of reversible languages,” Journal of the ACM 29(3), 741–765.

    Article  Google Scholar 

  • Ausiello, G.,Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A. and Protasi, M., 1999, Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties, Berlin, Germany: Springer.

    Google Scholar 

  • Bakiri, G. and Dietterich, T.G., 2001, “Constructing high-accuracy letter-to-phoneme rules with machine learning,” pp. 27–44 in Data-Driven Techniques in Speech Synthesis, R.I. Damper, ed., No. 9in Telecommunications Technology and Applications, Boston, MA: Kluwer.

    Google Scholar 

  • Damper, R.I., Marchand, Y., Adamson, M.J., and Gustafson, K., 1999, “Evaluating the pronunciation component of text-to-speech systems for English: A performance comparison of different approaches,” Computer Speech and Language 13(2), 155–176.

    Article  Google Scholar 

  • Eilenberg, S., 1974, Automata, Languages, and Machines, Vol. A., New York, NY: Academic Press.

    Google Scholar 

  • Fisher, W.M., 1999, “A statistical text-to-phone function using Ngrams and rules,” pp. 649–652 in International Conference on Acoustics, Speech, and Signal Processing, Phoenix, AZ.

  • García, P. and Vidal, E., 1990, “Inference of k-testable languages in the strict sense and application to syntactic pattern recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence 12(9), 920–925.

    Article  Google Scholar 

  • Garey, M.R. and Johnson, D.S., 1979, Computers and Intractability: A Guide to the Theory of NP-Completeness, San Francisco, CA: W.H. Freeman.

    Google Scholar 

  • Gildea, D. and Jurafsky, D., 1996, “Learning bias and phonological-rule induction,” Computational Linguistics 22(4), 497–530.

    Google Scholar 

  • Gold, E.M., 1967, “Language identification in the limit,” Information and Control 10(5), 447–474.

    Article  Google Scholar 

  • Hyafil, L. and Rivest, R.L., 1976, “Constructing optimal binary decision trees is NP-complete,” Information Processing Letters 5(1), 15–17.

    Article  Google Scholar 

  • International Phonetic Association: 1999, Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet, Cambridge, U.K.: Cambridge University Press.

  • Jansche, M., 2001, “Re-engineering letter-to-sound rules,” pp. 111–117 in Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics. Pittsburgh, PA.

  • Jansche, M., 2003, “Inference of string mappings for language technology,” Ph.D. Thesis, The Ohio State University, Columbus.

  • Kearns, M.J. and Vazirani, U.V., 1994, An Introduction to Computational Learning Theory. Cambridge, MA: MIT Press, Second printing, 1997.

    Google Scholar 

  • Kearns, M.J., Schapire, R.E., and Sellie, L.M., 1992, “Toward efficient agnostic learning,” pp. 341–352 in Proceedings of the 5th Annual Workshop on Computational Learning Theory, Philadelphia.

  • Kruskal, J.B., 1983, “An overview of sequence comparison,” pp. 1–44 in Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, D. Sankoff and J. Kruskal, eds., Reading, MA: Addison-Wesley Reissued by CSLI Publications, Stanford, CA,1999.

    Google Scholar 

  • Lucassen, J.M. and Mercer, R.L., 1984, “An information theoretic approach to the automatic determination of phonemic baseforms,” pp. 42.5.1–42.5.4 in International Conference on Acoustics, Speech, and Signal Processing.

  • McNaughton, R. and Papert, S., 1972, Counter-Free Automata, Cambridge, MA: MIT Press.

    Google Scholar 

  • Minka, T.P., 2000, “Empirical risk minimization is an incomplete inductive principle,” http://www.stat.cmu.edu/~minka/papers/erm.html

  • Mohri, M., 1997, “Finite-state transducers in language and speech processing,” Computational Linguistics 23(2), 269–311.

    Google Scholar 

  • Oncina, J., Garcia, P. and Vidal, E., 1993, “Learning subsequential transducers for pattern recognition interpretation tasks,” IEEE Transactions on Pattern Analysis and Machine Intelligence 15(5), 448–458.

    Article  Google Scholar 

  • Papadimitriou, C.H., 1994, Computational Complexity, Reading, MA: Addison-Wesley.

    Google Scholar 

  • Papadimitriou, C.H. and Steiglitz, K., 1998, Combinatorial Optimization: Algorithms and Complexity, Mineola, NY: Dover Publications. Originally published by Prentice Hall, Englewood Cliffs, NJ, 1982.

    Google Scholar 

  • Pitt, L., 1989, “Inductive inference, DFAs, and computational complexity,” pp. 18–44 in Analogical and Inductive Inference, International Workshop AII’ 89, Reinhardsbrunn Castle, GDR, October 1–6, 1989, Proceedings, Vol. 397 of Lecture Notes in Computer Science, K.P. Jantke, ed., Berlin, Germany: Springer.

    Google Scholar 

  • Pitt, L. and Warmuth, M.K., 1993, “The minimum consistent DFA problem cannot be approximated within any polynomial,” Journal of the ACM 40(1), 95–142.

    Article  Google Scholar 

  • Roche, E. and Schabes, Y., eds., 1997, Finite-State Language Processing, Language, Speech and Communication, Cambridge, MA: MIT Press.

    Google Scholar 

  • Sejnowski, T.J. and Rosenberg, C.R., 1987, “Parallel networks that learn to pronounce English text,” Complex Systems 1(1), 145–168.

    Google Scholar 

  • Sproat, R., Möbius, B., Maeda, K. and Tzoukermann, E., 1998, “Multilingual text analysis,” Chapt. 3, pp. 31–87 in Multilingual Text-to-Speech Synthesis: The Bell Labs Approach, R. Sproat, ed., Dordrecht, The Netherlands: Kluwer Academic Publishers.

    Google Scholar 

  • Valiant, L.G., 1984, “A theory of the learnable,” Communications of the ACM 27(11), 1134–1142.

    Article  Google Scholar 

  • van den Bosch, A.P.J., 1997, “Learning to pronounce written words: A study in inductive language learning,” Ph.D. Thesis, Universiteit Maastricht, Maastricht, The Netherlands.

  • Wagner, R.A. and Fischer, M.J., 1974, “The string-to-string correction problem,” Journal of the ACM 21(1), 168–173.

    Article  Google Scholar 

  • Weide, R.L., 1998, “The Carnegie Mellon pronouncing dictionary version 0.6,” {electronic document}, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. ftp://ftp.cs.cmu.edu/project/fgdata/dict/.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Martin Jansche.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jansche, M. Learning Local Transductions Is Hard. J Logic Lang Inf 13, 439–455 (2004). https://doi.org/10.1007/s10849-004-2115-9

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10849-004-2115-9

Key words

Navigation