Computation of Similarity—Similarity Search as Computation

  • Conference paper
Models of Computation in Context (CiE 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6735)

Abstract

We present a number of applications in Natural Language Processing where the main computation consists of a similarity search for an input pattern in a large database. Afterwards we describe some efficient methods and algorithms for solving this computational challenge. We discuss the view of the similarity search as a special kind of computation, which is remarkably common in applications of Computational Linguistics.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mihov, S., Schulz, K.U. (2011). Computation of Similarity—Similarity Search as Computation. In: Löwe, B., Normann, D., Soskov, I., Soskova, A. (eds) Models of Computation in Context. CiE 2011. Lecture Notes in Computer Science, vol 6735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21875-0_21

  • DOI: https://doi.org/10.1007/978-3-642-21875-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21874-3

  • Online ISBN: 978-3-642-21875-0

  • eBook Packages: Computer Science, Computer Science (R0)
