Abstract
A novel approach to real-time string filtering of large databases is presented. The proposed approach is based on a combination of artificial neural networks and operates in two stages. The first stage employs a self-organizing map for performing approximate string matching and retrieving those strings of the database which are similar to (i.e. assigned to the same SOM node as) the query string. The second stage employs a harmony theory network for comparing the previously retrieved strings in parallel with the query string and determining whether an exact match exists. The experimental results demonstrate accurate, fast and database-size independent string filtering which is robust to database modifications. The proposed approach is put forward for general-purpose (directory, catalogue and glossary search) and Internet (e-mail blocking, intrusion detection systems, URL and username classification) applications.
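The two-stage flow described above can be illustrated with a minimal sketch. This is not the paper's implementation: the letter-frequency encoding, the tiny one-dimensional map, the node count, the learning-rate schedule and all function names are assumptions made for illustration. The second stage here is a plain sequential equality check standing in for the harmony theory network, which the abstract describes as comparing the candidates in parallel.

```python
import random
import string
from collections import defaultdict

ALPHABET = string.ascii_lowercase


def encode(s):
    """Assumed encoding: normalised letter-frequency vector of the string."""
    counts = [0.0] * len(ALPHABET)
    for ch in s.lower():
        idx = ALPHABET.find(ch)
        if idx >= 0:
            counts[idx] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def best_node(weights, vec):
    """Index of the node whose weights are closest (squared Euclidean) to vec."""
    def dist(w):
        return sum((a - b) ** 2 for a, b in zip(w, vec))
    return min(range(len(weights)), key=lambda i: dist(weights[i]))


def train_som(vectors, n_nodes=4, epochs=20, seed=0):
    """Train a very small 1-D self-organizing map (illustrative schedule)."""
    rng = random.Random(seed)
    weights = [[rng.random() for _ in ALPHABET] for _ in range(n_nodes)]
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)               # decaying learning rate
        radius = 1 if epoch < epochs // 2 else 0      # shrinking neighbourhood
        for vec in vectors:
            win = best_node(weights, vec)
            lo, hi = max(0, win - radius), min(n_nodes, win + radius + 1)
            for i in range(lo, hi):
                weights[i] = [w + lr * (v - w) for w, v in zip(weights[i], vec)]
    return weights


def build_index(database, weights):
    """Stage 1, offline: assign every database string to its SOM node."""
    buckets = defaultdict(list)
    for s in database:
        buckets[best_node(weights, encode(s))].append(s)
    return buckets


def contains(query, weights, buckets):
    """Stage 1 retrieves the query's node; stage 2 checks for an exact match."""
    candidates = buckets.get(best_node(weights, encode(query)), [])
    # Stand-in for the harmony theory network: sequential equality test.
    return any(query == candidate for candidate in candidates)


database = ["alice", "bob", "carol", "dave", "eve"]
weights = train_som([encode(s) for s in database])
buckets = build_index(database, weights)
print(contains("carol", weights, buckets))
print(contains("mallory", weights, buckets))
```

Because the query is encoded and mapped with the same deterministic function as the database strings, an identical string always lands in the same node, so the exact-match check only has to inspect that node's candidates rather than the whole database.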
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Tambouratzis, T. (2007). Real-Time String Filtering of Large Databases Implemented Via a Combination of Artificial Neural Networks. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2007. Lecture Notes in Computer Science, vol 4432. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71629-7_20
DOI: https://doi.org/10.1007/978-3-540-71629-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71590-0
Online ISBN: 978-3-540-71629-7
eBook Packages: Computer Science (R0)