A Group-Based Feature Selection Approach to Improve Classification of Holy Quran Verses

  • Conference paper
Recent Advances on Soft Computing and Data Mining (SCDM 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 700)

Abstract

Most existing feature selection approaches are limited to selecting features from a single source of data. In this paper, a group-based feature selection (GBFS) approach is proposed that considers multiple sources of textual data. The proposed GBFS approach is applied to label Quranic verses based on two major references: the English translation and the tafsir (commentary). The verses were selected from two chapters, Surah Al-Baqarah and Surah Al-Anaam, and are classified into three categories: Faith, Worship, and Etiquette. The textual data from the translation and commentary were preprocessed using the StringToWordVector filter with TF-IDF weighting. Five feature selection algorithms (information gain, chi-square, Pearson correlation coefficient, Relief, and correlation-based feature selection) were evaluated with four classifiers: naïve Bayes, libSVM, k-NN, and decision trees (J48). The proposed group-based feature selection approach shows promising results in terms of accuracy and area under the receiver operating characteristic (ROC) curve (AUC), achieving an accuracy of 94.5% and an AUC of 0.944.
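
The abstract does not spell out how features from the two sources are grouped, so the following is only a minimal sketch of one plausible reading: rank terms separately within each source (translation and commentary) and then merge the top-ranked terms into one feature set. It uses plain Python rather than the tools named above, scores terms with chi-square only, and runs on invented toy strings, not real verses or commentary. The function names, the `source:term` naming scheme, and the value of `k` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [
        {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def chi_square(vectors, labels):
    """Score each term by its largest one-vs-rest chi-square on term presence."""
    n = len(vectors)
    scores = {}
    for term in {t for v in vectors for t in v}:
        best = 0.0
        for cls in set(labels):
            a = sum(1 for v, y in zip(vectors, labels) if term in v and y == cls)
            b = sum(1 for v, y in zip(vectors, labels) if term in v and y != cls)
            c = sum(1 for v, y in zip(vectors, labels) if term not in v and y == cls)
            d = n - a - b - c
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_group_features(groups, labels, k):
    """Pick the top-k terms inside each source, then merge them (assumed scheme)."""
    selected = []
    for name, docs in groups.items():
        scores = chi_square(tfidf(docs), labels)
        # Sort by descending score; break ties alphabetically for determinism.
        top = sorted(scores, key=lambda t: (-scores[t], t))[:k]
        selected.extend(f"{name}:{t}" for t in top)
    return selected


def build_vectors(groups, selected):
    """Build one TF-IDF row per document over the merged feature set."""
    weights = {name: tfidf(docs) for name, docs in groups.items()}
    n_docs = len(next(iter(groups.values())))
    rows = []
    for i in range(n_docs):
        row = []
        for feature in selected:
            name, term = feature.split(":", 1)
            row.append(weights[name][i].get(term, 0.0))
        rows.append(row)
    return rows


# Hypothetical toy data: six "verses", each with text from two sources.
groups = {
    "translation": [
        "belief faith unseen", "prayer fasting charity", "kindness parents speech",
        "faith belief messengers", "prayer pilgrimage charity", "kindness speech neighbours",
    ],
    "commentary": [
        "explains creed belief", "explains ritual prayer", "explains manners kindness",
        "discusses creed angels", "discusses ritual fasting", "discusses manners honesty",
    ],
}
labels = ["Faith", "Worship", "Etiquette", "Faith", "Worship", "Etiquette"]

selected = select_group_features(groups, labels, k=2)
rows = build_vectors(groups, selected)
print(selected)
```

The resulting rows could be passed to any of the classifiers listed in the abstract. Prefixing each term with its source keeps a word that occurs in both the translation and the commentary as two distinct features, which is one way to preserve the grouping.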

Acknowledgements

This study was supported in part by a grant from the Ministry of Education of Malaysia, Research Acculturation Grant Scheme (RAGS) Vot R045, a grant from Universiti Tun Hussein Onn Malaysia Vot U611, and in part by a grant from Research Gates IT Solution Sdn. Bhd.

Author information

Corresponding author

Correspondence to Abdullahi O. Adeleke.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Adeleke, A.O., Samsudin, N.A., Mustapha, A., Nawi, N.M. (2018). A Group-Based Feature Selection Approach to Improve Classification of Holy Quran Verses. In: Ghazali, R., Deris, M., Nawi, N., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2018. Advances in Intelligent Systems and Computing, vol 700. Springer, Cham. https://doi.org/10.1007/978-3-319-72550-5_28

  • DOI: https://doi.org/10.1007/978-3-319-72550-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72549-9

  • Online ISBN: 978-3-319-72550-5

  • eBook Packages: Engineering, Engineering (R0)