Efficient n-Gram-Based String Matching in Electronic Testing at Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8083)

Abstract

The purpose of this study is to explore the grammatical properties and features of string matching based on n-gram techniques and to apply them to electronic testing in programming languages. Because internet access is widely available in the academic environment, there is a real need for computational procedures, based on artificial intelligence methods, that support the assessment of examination questions for uniformity and consistency. Many computer-aided assessment packages exist, but they are designed mainly for single- and multiple-choice tests and are not suitable for electronic testing in programming languages. n-gram-based string matching has been applied successfully to document retrieval and other natural language processing applications. Generalized n-gram matching during substring processing tends to be time-consuming, since $N^2$ n-grams are extracted, where $N$ is the length of a (sub)string. Choosing the parameter n appropriately is an important task, since a large n leads to polynomial growth, while a small n may shorten search time significantly. As a result, new string matching methods based on n-grams are proposed to improve on generalized n-grams. Experiments are conducted with the methods using programming language code as both pattern and text. The results are compared with those of selected existing methods. We found the results very promising and suggest that the proposed methods can be successfully applied to electronic testing in programming languages as intelligent support for teachers involved in e-learning processes.
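
As a rough illustration of the kind of n-gram comparison the abstract refers to, the following minimal Python sketch extracts fixed-length n-grams and scores two strings with the Dice coefficient, a measure commonly used in the n-gram literature. It is not the method proposed in the paper, which the abstract does not specify. The function names `ngrams`, `dice_similarity`, and `all_substring_count` are hypothetical and chosen for illustration only. The last function counts the n-grams obtained when every length from 1 to N is extracted, which is N(N+1)/2 and therefore grows quadratically in N.

```python
from collections import Counter


def ngrams(s: str, n: int) -> list[str]:
    """Return all contiguous substrings of length n, in order of occurrence."""
    if n <= 0:
        raise ValueError("n must be positive")
    # For len(s) < n the range is empty, so the result is an empty list.
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the multisets of n-grams of a and b.

    Assumption: repeated n-grams are counted with multiplicity.
    """
    grams_a = Counter(ngrams(a, n))
    grams_b = Counter(ngrams(b, n))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both strings are shorter than n; fall back to exact comparison.
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2.0 * shared / total


def all_substring_count(s: str) -> int:
    """Number of n-grams when every length 1..N is extracted (N = len(s))."""
    return sum(len(ngrams(s, n)) for n in range(1, len(s) + 1))


if __name__ == "__main__":
    # Compare a reference answer with a student's answer (made-up strings).
    print(dice_similarity("for (i = 0; i < n; i++)", "for (i = 0; i <= n; i++)"))
    print(all_substring_count("while"))
```

A small n keeps the number of extracted n-grams linear in the string length, whereas extracting every length, as in the last function, is what makes the generalized variant expensive on long strings.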

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Niewiadomski, A., Akinwale, A. (2013). Efficient n-Gram-Based String Matching in Electronic Testing at Programming. In: Bǎdicǎ, C., Nguyen, N.T., Brezovan, M. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2013. Lecture Notes in Computer Science, vol 8083. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40495-5_66

  • DOI: https://doi.org/10.1007/978-3-642-40495-5_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40494-8

  • Online ISBN: 978-3-642-40495-5

  • eBook Packages: Computer Science, Computer Science (R0)