Efficient n-Gram-Based String Matching in Electronic Testing at Programming

Niewiadomski, Adam; Akinwale, Adio

doi:10.1007/978-3-642-40495-5_66


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8083)

Abstract

The purpose of this study is to explore the grammatical properties and features of string matching based on n-gram techniques and to apply them to electronic testing in programming languages. Because the Internet is intensively and extensively available in academic environments, there is a real need for computational procedures, within artificial intelligence methods, that support the assessment of examination questions for uniformity and consistency. Many computer-aided assessment packages exist, but they are designed mainly for single- and multiple-choice tests and are not suitable for electronic testing in programming languages. n-gram-based string matching is being applied successfully to document retrieval and other natural language processing applications. However, generalized n-gram matching during substring processing tends to be time-consuming, since on the order of N² n-grams are extracted from a (sub)string of length N. Choosing an appropriate value of the parameter n is therefore an important task: a large n leads to polynomial growth, whereas a small n may shorten the search time significantly. As a result, new n-gram-based string matching methods are proposed to improve on generalized n-grams. Experiments are conducted with these methods using programming-language code as both the pattern and the text. The results are compared with selected existing methods. We found the results very promising and suggest that the proposed methods can be applied successfully to electronic testing in programming languages as intelligent support for teachers involved in e-learning processes.
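To make the quadratic growth mentioned in the abstract concrete, the following Python sketch enumerates every generalized n-gram (every contiguous substring, for n = 1..N) of a string and compares two strings with two simple measures. This is an illustration under stated assumptions, not the authors' proposed method: the function names, the normalisation of the generalized measure by N(N+1)/2 (the number of substrings of the longer string), and the choice of the Dice coefficient over trigram sets for the fixed-n variant are all assumptions made for the example.

```python
def all_ngrams(s: str) -> list[str]:
    """Return every contiguous substring (generalized n-gram) of s, n = 1..len(s).

    A string of length N yields N(N+1)/2 substrings, i.e. O(N^2)."""
    return [s[j:j + n] for n in range(1, len(s) + 1) for j in range(len(s) - n + 1)]


def generalized_ngram_similarity(s1: str, s2: str) -> float:
    """Illustrative (hypothetical) measure: number of substrings of s1, counted
    by position, that also occur in s2, divided by the number of substrings
    of the longer string. Not symmetric in general."""
    n = max(len(s1), len(s2))
    if n == 0:
        return 1.0  # two empty strings are treated as identical
    # Each `g in s2` is itself a substring search, which is why this gets slow.
    hits = sum(1 for g in all_ngrams(s1) if g in s2)
    return hits / (n * (n + 1) / 2)


def fixed_ngram_similarity(s1: str, s2: str, n: int = 3) -> float:
    """Dice coefficient over the sets of fixed-length n-grams (trigrams by default)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    a = {s1[i:i + n] for i in range(len(s1) - n + 1)}
    b = {s2[i:i + n] for i in range(len(s2) - n + 1)}
    if not a and not b:
        return 1.0  # both strings are shorter than n
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical student answer vs. model answer for a programming question.
    teacher = "for(int i=0;i<n;i++) sum+=a[i];"
    student = "for(i=0;i<n;i++) s+=a[i];"
    print(len(all_ngrams(teacher)))  # grows as N(N+1)/2 with the code length
    print(generalized_ngram_similarity(student, teacher))
    print(fixed_ngram_similarity(student, teacher))
```

The trade-off named in the abstract is visible here: a 200-character snippet already has 200 · 201 / 2 = 20100 generalized n-grams, each requiring its own substring search, whereas a fixed n produces at most N − n + 1 n-grams per string.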



Author information

Authors and Affiliations

Institute of Information Technology, Lodz University of Technology, Lodz, Poland
Adam Niewiadomski & Adio Akinwale


Editor information

Editors and Affiliations

Computer and Information Technology Department, University of Craiova, Bvd. Decebal 107, 200440, Craiova, Romania
Costin Bǎdicǎ
Institute of Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
Ngoc Thanh Nguyen
Computer and Information Technology Department, University of Craiova, Bvd. Decebal 107, 200440, Craiova, Romania
Marius Brezovan


About this paper

Cite this paper

Niewiadomski, A., Akinwale, A. (2013). Efficient n-Gram-Based String Matching in Electronic Testing at Programming. In: Bǎdicǎ, C., Nguyen, N.T., Brezovan, M. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2013. Lecture Notes in Computer Science, vol 8083. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40495-5_66


DOI: https://doi.org/10.1007/978-3-642-40495-5_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40494-8
Online ISBN: 978-3-642-40495-5
eBook Packages: Computer Science, Computer Science (R0)
