Abstract
The paper presents a real-time method for finding strings similar to a given pattern. The method is based on the Levenshtein metric, computed with the Wagner–Fischer algorithm. An improvement to this well-known technique is proposed: a histogram-based approach that significantly reduces calculation time without a noticeable loss of correctness. In addition, the Wagner–Fischer algorithm has been massively parallelized using CUDA technology. The method is highly flexible, since one can define a vocabulary suited to the task, even one made of abstract elements, which extends its use well beyond alphanumeric objects. The approach appears promising for networking and security applications, as it is suitable for real-time analysis of data streams.
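The Wagner–Fischer recurrence and the idea of a histogram pre-filter can be illustrated with a short sketch. This is not the authors' implementation: it runs sequentially rather than on CUDA, and the histogram step uses one standard character-count lower bound, which may differ from the histogram-based approach the paper proposes. The function names `levenshtein`, `histogram_lower_bound`, and `find_similar` and the threshold `k` are hypothetical. Case folding follows Note 3. Because this particular bound never exceeds the true distance, it rejects candidates without ever discarding a genuine match.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Wagner-Fischer dynamic programming, keeping only two rows of the table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string along the row to save memory
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def histogram_lower_bound(a: str, b: str) -> int:
    """Lower bound on the Levenshtein distance from character histograms.

    Assumption: this is a generic bound, not necessarily the paper's method.
    Every single edit lowers max(surplus, deficit) by at most one, so the
    distance can never be smaller than that maximum.
    """
    ha, hb = Counter(a), Counter(b)
    surplus = sum((ha - hb).values())  # characters a has in excess of b
    deficit = sum((hb - ha).values())  # characters b has in excess of a
    return max(surplus, deficit)


def find_similar(pattern: str, vocabulary: list[str], k: int) -> list[tuple[str, int]]:
    """Return vocabulary entries within distance k of pattern, ignoring case."""
    p = pattern.lower()
    hits = []
    for word in vocabulary:
        w = word.lower()
        if histogram_lower_bound(p, w) > k:
            continue  # cheap rejection: the full table cannot yield <= k
        d = levenshtein(p, w)
        if d <= k:
            hits.append((word, d))
    return hits
```

For example, `find_similar("Kitten", ["sitting", "KITTENS", "mitten", "apple"], 1)` returns `[("KITTENS", 1), ("mitten", 1)]`; "sitting" and "apple" are rejected by the histogram bound before any table is filled. The full comparison costs O(mn) time for strings of lengths m and n, while the histogram check is linear, which is why filtering first can pay off. The per-word comparisons are independent of one another, the kind of work that maps naturally onto many GPU threads; how the paper partitions the work under CUDA is not reproduced here.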
Notes
- 1. Damerau worked at IBM on the detection and correction of spelling errors [5] before the Levenshtein metric was invented.
- 3. We will not distinguish between uppercase and lowercase letters.
- 4. CUDA: Compute Unified Device Architecture.
References
Abdel-Ghaffar, K.A.S., Paluncic, F., Ferreira, H.C., Clarke, W.A.: On Helberg’s generalization of the Levenshtein code for multiple deletion/insertion error correction. IEEE Trans. Inf. Theory 58(3), 1804–1808 (2012)
Andoni, A., Onak, K.: Approximating edit distance in near-linear time. In: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pp. 199–204. ACM (2009)
Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 51–58. ACM (2015)
Chowdhury, S.D., Bhattacharya, U., Parui, S.K.: Online handwriting recognition using Levenshtein distance metric. In: 12th International Conference on Document Analysis and Recognition (ICDAR) (2013)
Damerau, F.J.: A technique for computer detection and correction of spelling errors. Commun. ACM 7(3), 171–176 (1964)
Dong, J., Liu, H.: Semi-real-time algorithm for fast pattern matching. IET Image Proc. 10(12), 979–985 (2016)
Fujita, O.: Metrics based on average distance between sets. Jpn. J. Ind. Appl. Math. 30(1), 1–19 (2013)
Gaikwad, S., Bogiri, N.: Levenshtein distance algorithm for efficient and effective XML duplicate detection. In: International Conference on Computer, Communication and Control (IC4), pp. 1–5 (2015)
Harish Kumar, B.T., Vibha, L., Venugopal, K.R.: Web page access prediction using hierarchical clustering based on modified Levenshtein distance and higher order Markov model. In: IEEE Region 10 Symposium (TENSYMP), pp. 1–6 (2016)
Kim, S.-H., Cho, H.-G.: Position-restricted approximate string matching with metric Hamming distance. In: IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 108–114 (2017)
Konstantinidis, S.: Computing the Levenshtein distance of a regular language. In: IEEE Information Theory Workshop (2005)
Levandowsky, M., Winter, D.: Distance between sets. Nature 234(5323), 34–35 (1971)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10, 707–710 (1966)
Nagata, J.: Modern General Topology, 3rd edn., vol. 33. Elsevier Science Publishers BV, Amsterdam (1985)
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. (CSUR) 33(1), 31–88 (2001)
Nemmour, H., Chibani, Y.: New Jaccard-distance based support vector machine kernel for handwritten digit recognition. In: 3rd International Conference on Information and Communication Technologies: From Theory to Applications, pp. 1–4 (2008)
Nyirarugira, C., Choi, H.-R., Kim, J.Y., Hayes, M., Kim, T.Y.: Modified Levenshtein distance for real-time gesture recognition. In: 6th International Congress on Image and Signal Processing (CISP), pp. 974–979 (2013)
Medhat, D., Hassan, A., Salama, C.: A hybrid cross-language name matching technique using novel modified Levenshtein distance. In: Tenth International Conference on Computer Engineering and Systems (ICCES), pp. 204–209 (2015)
Shao, M.-M., Qian, D.-M.: The Application of Levenshtein algorithm in the examination of the question bank similarity. In: International Conference on Robots and Intelligent System (ICRIS), pp. 422–424 (2016)
Skłodowski, P., Żorski, W.: Movement tracking in terrain conditions accelerated with CUDA. In: Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 709–717 (2014)
Cha, S.-H., Srihari, S.N.: On measuring the distance between histograms. Pattern Recogn. 35(6), 1355–1370 (2002)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. Assoc. Comput. Mach. 21, 168–173 (1974)
Putra, M.E.W., Supriana, I.: Structural offline handwriting character recognition using Levenshtein distance. In: International Conference on Electrical Engineering and Informatics (ICEEI) (2015)
Yujian, L., Bo, L.: A normalized Levenshtein distance metric. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1091–1095 (2007)
Zhu, H., Cao, Y., Zhou, Z., Gong, M.: Parallel multi-temporal remote sensing image change detection on GPU. In: IEEE 26th International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW) (2012)
Żorski, W.: The Hough transform application including its hardware implementation. In: Advanced Concepts for Intelligent Vision Systems: Proceedings of the 7th International Conference, Lecture Notes in Computer Science, vol. 3708, pp. 460–467. Springer (2005)
NVIDIA, CUDA C Programming Guide, March 2018, PG-02829-001_v9.1. https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf
NVIDIA, CUDA C Best Practices Guide, March 2018, DG-05603-001_v9.1. https://docs.nvidia.com/cuda/pdf/CUDA_C_Best_Practices_Guide.pdf
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Żorski, W., Drogosiewicz, B. (2019). Real-Time Comparable Phrases Searching Via the Levenshtein Distance with the Use of CUDA Technology. In: Kosiuczenko, P., Zieliński, Z. (eds) Engineering Software Systems: Research and Praxis. KKIO 2018. Advances in Intelligent Systems and Computing, vol 830. Springer, Cham. https://doi.org/10.1007/978-3-319-99617-2_9
DOI: https://doi.org/10.1007/978-3-319-99617-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99616-5
Online ISBN: 978-3-319-99617-2
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)