Multi-word Similarity and Retrieval Model for a Refined Retrieval of Quranic Sentences

  • Conference paper
Advances in Visual Informatics (IVIC 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11870)

Abstract

This paper addresses the task of learning sentence similarity on pairs of relevant sentences retrieved from a Quranic Retrieval Application (QRA). With existing keyword and semantic-concept extraction, a long list of verses (sentences) matching the query is retrieved. However, because Quranic concepts are conveyed repeatedly across scattered sentences, it is important to identify which of the retrieved sentences are similar not only in word function but also in context with subsequent words. Contextual information in similar sentences is captured by evaluating both word similarity and word relatedness. This paper proposes a multi-word Term Similarity and Retrieval (mTSR) model that uses an n-gram score function to measure the relatedness of subsequent words. Bigram similarity scores are computed between every pair of relevant Quranic sentences, which boosts the conventional keyword-matching QRA. The resulting similarity score is used to refine the list of relevant sentences, with the aim of helping the user understand the scattered content of the documents. The results are presented to the user as a refined list of similar sentences, obtained by ranking the first-pass results of a keyword search using the bigram score. When tested on the Malay Quranic Retrieval Application (myQRA) prototype, the refined results accurately matched the similar sentences (iS) identified manually by three volunteers.
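The abstract does not spell out the exact n-gram score function, so the following is only a minimal sketch of the two-stage idea (keyword retrieval, then bigram-based re-ranking), assuming a Dice coefficient over word bigrams as the similarity measure. The stop-word list, the function names (`keyword_search`, `bigram_dice`, `refine`), the rule of ranking each hit by its best pairwise score, and the toy English sentences are all hypothetical illustrations, not the mTSR model as implemented in myQRA.

```python
from itertools import combinations

# Hypothetical stop-word list; the paper works with a Malay stop-word list (Note 2).
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tokens(sentence: str) -> list[str]:
    """Lower-case, split on whitespace, and drop stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


def bigrams(sentence: str) -> set[tuple[str, str]]:
    """Adjacent word pairs, formed AFTER stop-word removal (so a pair may span removed words)."""
    toks = tokens(sentence)
    return set(zip(toks, toks[1:]))


def bigram_dice(s1: str, s2: str) -> float:
    """Assumed score: Dice coefficient over word bigrams, 2|A & B| / (|A| + |B|)."""
    a, b = bigrams(s1), bigrams(s2)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def keyword_search(query: str, corpus: list[str]) -> list[str]:
    """First-pass retrieval: keep sentences sharing at least one query keyword."""
    q = set(tokens(query))
    return [s for s in corpus if q & set(tokens(s))]


def refine(query: str, corpus: list[str]) -> list[tuple[str, float]]:
    """Score every pair of first-pass hits, then rank each hit by its best pairwise score."""
    hits = keyword_search(query, corpus)
    best = {s: 0.0 for s in hits}
    for s1, s2 in combinations(hits, 2):
        score = bigram_dice(s1, s2)
        best[s1] = max(best[s1], score)
        best[s2] = max(best[s2], score)
    # sorted() is stable, so ties keep their first-pass order.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toy_corpus = [  # hypothetical toy sentences, not Quranic text
        "give charity to the poor and the needy",
        "charity to the poor purifies wealth",
        "pray and give charity",
        "the night and the day are signs",
    ]
    for sentence, score in refine("charity", toy_corpus):
        print(f"{score:.3f}  {sentence}")
```

On the toy corpus, the keyword pass drops the fourth sentence, and the first and third sentences rise to the top because they share the bigram ("give", "charity"). Ranking by the best pairwise score is just one simple way to turn the pairwise bigram matrix into a single refined list; the paper's own ranking rule may differ.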

Notes

  1. Last accessed in March 2019 at http://www.islam.gov.my/e-jakim/e-quran/terjemahan-al-quran.

  2. Last accessed in March 2019 at https://blog.kerul.net/2014/01/list-of-malay-stop-words.html.

  3. Al-Quran (Tafsir & by Word), produced by Greentech Apps Foundation; accessed on 30 July 2018.

Acknowledgment

Special thanks to the members of the MuDIR Research Interest Group for their kind suggestions for improving this paper. The project is supported by the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia.

Author information

Corresponding author

Correspondence to Haslizatul Mohamed Hanum.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hanum, H.M., Rasip, N.F., Abu Bakar, Z. (2019). Multi-word Similarity and Retrieval Model for a Refined Retrieval of Quranic Sentences. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2019. Lecture Notes in Computer Science, vol 11870. Springer, Cham. https://doi.org/10.1007/978-3-030-34032-2_34

  • DOI: https://doi.org/10.1007/978-3-030-34032-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34031-5

  • Online ISBN: 978-3-030-34032-2

  • eBook Packages: Computer Science, Computer Science (R0)