Abstract
This study explores the grammatical properties and features of string matching based on n-gram techniques and applies them to electronic testing in programming languages. Because internet access is widely available in the academic environment, there is a real need for computational procedures, grounded in artificial intelligence methods, that support the assessment of examination questions for uniformity and consistency. Many computer-aided assessment packages exist, but they are designed mainly for single- and multiple-choice tests and are not suitable for electronic testing in programming languages. n-gram-based string matching has been applied successfully to document retrieval and other natural language processing applications. Generalized n-gram matching during substring processing tends to be time-consuming, since on the order of $N^2$ n-grams are extracted, where $N$ is the length of a (sub)string. Choosing the parameter n appropriately is therefore an important task: a large n leads to polynomial growth, whereas a small n may shorten search time significantly. As a result, new string matching methods based on n-grams are proposed to improve on generalized n-grams. Experiments are conducted with these methods using programming language code as both pattern and text. The results are compared with those of selected existing methods. We find the results very promising and suggest that the proposed methods can be successfully applied to electronic testing in programming languages as intelligent support for teachers involved in e-learning processes.
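To make the terms in the abstract concrete, the following is a minimal sketch of fixed-length n-gram extraction, generalized (all-length) n-gram extraction, and an n-gram similarity score. It is not the method proposed in the paper: the function names `ngrams`, `all_ngrams`, and `dice_similarity` are hypothetical, and the Dice coefficient is used here only as one common choice of n-gram similarity measure.

```python
from collections import Counter


def ngrams(s: str, n: int) -> list[str]:
    """Return all contiguous substrings of length n, in order of position."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def all_ngrams(s: str) -> list[str]:
    """Return every contiguous substring of s (n = 1 .. len(s)).

    For a string of length N this yields N * (N + 1) / 2 substrings,
    which is the quadratic growth that makes generalized matching costly.
    """
    return [g for n in range(1, len(s) + 1) for g in ngrams(s, n)]


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the multisets of n-grams of a and b (assumed measure)."""
    grams_a, grams_b = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both strings are shorter than n, so there is nothing to compare.
        return 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total


if __name__ == "__main__":
    # Hypothetical example: compare a student's answer with a reference answer.
    reference = "for(i=0;i<n;i++)"
    answer = "for(i=0;i<n;++i)"
    print(len(all_ngrams(reference)))
    print(dice_similarity(reference, answer, n=2))
```

In a testing setting, a score like this could be computed between a submitted code fragment and a reference answer; the choice of n trades sensitivity to small edits against the number of n-grams that must be compared.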
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Niewiadomski, A., Akinwale, A. (2013). Efficient n-Gram-Based String Matching in Electronic Testing at Programming. In: Bǎdicǎ, C., Nguyen, N.T., Brezovan, M. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2013. Lecture Notes in Computer Science, vol 8083. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40495-5_66
DOI: https://doi.org/10.1007/978-3-642-40495-5_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40494-8
Online ISBN: 978-3-642-40495-5
eBook Packages: Computer Science, Computer Science (R0)