Abstract
This paper addresses the task of learning sentence similarity between pairs of relevant sentences retrieved by a Quranic Retrieval Application (QRA). Existing keyword and semantic-concept extraction retrieves a long list of relevant verses (sentences) that match the query. However, because Quranic concepts are conveyed repeatedly across scattered sentences, it is important to identify which of the retrieved sentences are similar not only in word function but also in context, as given by subsequent words. Contextual information about similar sentences is captured by evaluating both word similarity and word relatedness. This paper proposes a multi-word Term Similarity and Retrieval (mTSR) model that uses an n-gram score function to measure the relatedness of subsequent words. Bigram similarity scores are computed between every pair of relevant Quranic sentences, which enhances the conventional keyword-matching QRA. The similarity score is used to refine the list of relevant sentences, helping the user understand content that is scattered across the documents. The results are presented to the user as a refined list of similar sentences, obtained by ranking the initial keyword-search results by bigram score. When the score was tested on the Malay Quranic Retrieval Application (myQRA) prototype, the refined results accurately matched the similar sentences (iS) identified manually by three volunteers.
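As a rough illustration of pairwise bigram scoring and ranking, the following Python sketch scores every pair of retrieved sentences by the overlap of their word bigrams and sorts the pairs by score. The abstract does not specify the mTSR score function, so the Dice coefficient used here is an assumed stand-in, and the function names (`bigrams`, `bigram_score`, `rank_pairs`) and the whitespace tokenization are hypothetical choices made only for this example.

```python
from itertools import combinations


def bigrams(sentence):
    """Return the set of adjacent word pairs in a whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    return set(zip(tokens, tokens[1:]))


def bigram_score(a, b):
    """Dice coefficient over word bigrams.

    Assumption: this is a stand-in score, not the paper's mTSR function.
    """
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def rank_pairs(sentences):
    """Score every pair of sentences and sort by descending score.

    Returns a list of (score, i, j) tuples, where i < j index into `sentences`.
    """
    scored = [
        (bigram_score(sentences[i], sentences[j]), i, j)
        for i, j in combinations(range(len(sentences)), 2)
    ]
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


if __name__ == "__main__":
    # Placeholder sentences invented for the example (not corpus data).
    retrieved = [
        "the river flows to the sea",
        "the river flows past the hills",
        "birds fly over the sea",
    ]
    for score, i, j in rank_pairs(retrieved):
        print(f"{score:.3f}  [{i}] {retrieved[i]}  |  [{j}] {retrieved[j]}")
```

In a pipeline of the kind the abstract describes, the input list would be the keyword-search results, typically after stop-word removal and stemming; those preprocessing steps are omitted from this sketch.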
Notes
1. Accessed in March 2019 from http://www.islam.gov.my/e-jakim/e-quran/terjemahan-al-quran.
2. Accessed in March 2019 from https://blog.kerul.net/2014/01/list-of-malay-stop-words.html.
3. Al-Quran (Tafsir & by Word), produced by Greentech Apps Foundation. Accessed on 30 July 2018.
Acknowledgment
Special thanks to the members of the MuDIR Research Interest Group for their kind suggestions for improving this paper. The project is supported by the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hanum, H.M., Rasip, N.F., Abu Bakar, Z. (2019). Multi-word Similarity and Retrieval Model for a Refined Retrieval of Quranic Sentences. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2019. Lecture Notes in Computer Science, vol 11870. Springer, Cham. https://doi.org/10.1007/978-3-030-34032-2_34
DOI: https://doi.org/10.1007/978-3-030-34032-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34031-5
Online ISBN: 978-3-030-34032-2
eBook Packages: Computer Science, Computer Science (R0)