
Bounded Occurrence Edit Distance: A New Metric for String Similarity Joins with Edit Distance Constraints

  • Conference paper
SOFSEM 2014: Theory and Practice of Computer Science (SOFSEM 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8327)


Abstract

Given two sets of strings and a similarity function on strings, a similarity join finds all pairs of similar strings, one string from each set. In this paper, we focus on similarity joins with respect to the edit distance and propose a new metric, called the bounded occurrence edit distance, together with a filter based on this metric. The filter reduces the total time required to solve similarity joins because the metric can be computed with bitwise operations faster than the edit distance itself. We demonstrate the effectiveness of the filter through experiments.
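The abstract describes a filter-and-verify strategy: a cheap bound discards candidate pairs, and the exact edit distance is computed only for the survivors. The abstract does not define the bounded occurrence edit distance, so the minimal sketch below does not implement it. As a stand-in, it assumes two well-known lower bounds on the edit distance: the length difference, and half the L1 distance between character-count vectors (one edit changes that vector by at most 2). The function names `edit_distance`, `frequency_lower_bound`, and `similarity_join` are hypothetical, and the sketch uses plain loops rather than the bitwise operations the paper relies on.

```python
from collections import Counter
from itertools import product


def edit_distance(x: str, y: str) -> int:
    """Classic Wagner-Fischer dynamic program, O(len(x) * len(y)) time."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(prev[j] + 1,                # delete cx
                            curr[j - 1] + 1,            # insert cy
                            prev[j - 1] + (cx != cy)))  # substitute or match
        prev = curr
    return prev[-1]


def frequency_lower_bound(x: str, y: str) -> int:
    """Stand-in filter (NOT the paper's metric): a lower bound on edit_distance.

    An insertion or deletion changes the character-count vector by 1 in
    L1 norm and a substitution by at most 2, so ceil(L1 / 2) <= edit distance.
    """
    cx, cy = Counter(x), Counter(y)
    l1 = sum(abs(cx[c] - cy[c]) for c in cx.keys() | cy.keys())
    return (l1 + 1) // 2


def similarity_join(left, right, k):
    """Return all pairs (a, b) in left x right with edit_distance(a, b) <= k."""
    result = []
    for a, b in product(left, right):
        # Filter: skip pairs whose cheap lower bounds already exceed k.
        if abs(len(a) - len(b)) > k or frequency_lower_bound(a, b) > k:
            continue
        # Verify: compute the exact edit distance only for surviving pairs.
        if edit_distance(a, b) <= k:
            result.append((a, b))
    return result
```

For example, `similarity_join(["kitten", "sitting", "flaw"], ["sitting", "lawn", "kitchen"], 2)` returns `[("kitten", "kitchen"), ("sitting", "sitting"), ("flaw", "lawn")]`. Most of the nine candidate pairs are rejected by the length or frequency bound without running the quadratic dynamic program. A faster filter, such as the one the paper proposes, would slot into the same place in the loop.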




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Komatsu, T., Okuta, R., Narisawa, K., Shinohara, A. (2014). Bounded Occurrence Edit Distance: A New Metric for String Similarity Joins with Edit Distance Constraints. In: Geffert, V., Preneel, B., Rovan, B., Štuller, J., Tjoa, A.M. (eds) SOFSEM 2014: Theory and Practice of Computer Science. SOFSEM 2014. Lecture Notes in Computer Science, vol 8327. Springer, Cham. https://doi.org/10.1007/978-3-319-04298-5_32


  • DOI: https://doi.org/10.1007/978-3-319-04298-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04297-8

  • Online ISBN: 978-3-319-04298-5

  • eBook Packages: Computer Science, Computer Science (R0)
