Abstract
We present several applications in Natural Language Processing in which the main computation is a similarity search for an input pattern in a large database. We then describe efficient methods and algorithms for solving this computational challenge. Finally, we discuss the view of similarity search as a special kind of computation, one that is remarkably common in applications of Computational Linguistics.
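To make the notion of a similarity search for an input pattern in a database concrete, the following is a minimal sketch, assuming that similarity is measured by unit-cost edit (Levenshtein) distance and that the database is a plain list of words. The function names `levenshtein` and `similarity_search`, the `max_distance` parameter, and the sample words are illustrative assumptions, not taken from the chapter. The sketch scans the whole database, so it is only a naive baseline and not one of the efficient methods the abstract refers to.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_search(pattern: str, database: List[str],
                      max_distance: int) -> List[Tuple[int, str]]:
    """Return (distance, entry) pairs within max_distance, closest first.

    Naive linear scan (an assumption made for illustration): every entry
    of the database is compared against the pattern.
    """
    scored = ((levenshtein(pattern, entry), entry) for entry in database)
    return sorted((d, e) for d, e in scored if d <= max_distance)


if __name__ == "__main__":
    words = ["speech", "search", "peach", "speaker"]  # hypothetical database
    print(similarity_search("speach", words, max_distance=1))
```

The cost of this baseline grows with the size of the database times the cost of one distance computation, which is why large databases call for more efficient techniques.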
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mihov, S., Schulz, K.U. (2011). Computation of Similarity—Similarity Search as Computation. In: Löwe, B., Normann, D., Soskov, I., Soskova, A. (eds) Models of Computation in Context. CiE 2011. Lecture Notes in Computer Science, vol 6735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21875-0_21
DOI: https://doi.org/10.1007/978-3-642-21875-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21874-3
Online ISBN: 978-3-642-21875-0
eBook Packages: Computer Science (R0)