Abstract
Most existing feature selection approaches are limited to selecting features from a single source of data. This paper proposes a group-based feature selection (GBFS) approach that considers multiple sources of textual data. The approach is applied to labelling Quranic verses using two major references: the English translation and the tafsir (commentary). The verses were selected from two chapters, Surah Al-Baqarah and Surah Al-Anaam, and are classified into three categories: Faith, Worship, and Etiquette. The textual data from the translation and commentary were preprocessed using the StringToWordVector filter with TF-IDF weighting. Five feature selection algorithms (information gain, chi-square, Pearson correlation coefficient, Relief, and correlation-based selection) were evaluated with four classifiers: naïve Bayes, LibSVM, k-NN, and decision trees (J48). The proposed GBFS approach showed promising results in terms of accuracy and area under the receiver operating characteristic (ROC) curve (AUC), achieving an accuracy of 94.5% and an AUC of 0.944.
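The abstract does not spell out how features from the two sources are grouped, so the following is only a minimal sketch of one plausible reading: each source (translation, commentary) is treated as its own feature group, terms are weighted with a textbook TF-IDF, ranked by information gain within the group, and the top k terms per group are merged. The function names, the toy strings (made-up placeholders, not verse text), and the per-group top-k rule are all assumptions for illustration, and the TF-IDF formula is not necessarily the one the StringToWordVector filter applies.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def entropy(items):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(items)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def information_gain(vectors, labels, term):
    """Information gain of the binary 'term present / absent' split."""
    present = [y for v, y in zip(vectors, labels) if term in v]
    absent = [y for v, y in zip(vectors, labels) if term not in v]
    n = len(labels)
    return (entropy(labels)
            - (len(present) / n) * entropy(present)
            - (len(absent) / n) * entropy(absent))


def select_per_group(groups, labels, k):
    """Rank terms inside each source group and keep the top k from each.

    groups maps a source name to its list of documents (one per verse).
    Selected features are prefixed with the source name so that the same
    word from two sources stays two distinct features.
    """
    selected = []
    for name, docs in groups.items():
        vectors = tfidf(docs)
        vocab = sorted({t for v in vectors for t in v})
        # Highest information gain first; ties broken alphabetically.
        ranked = sorted(vocab, key=lambda t: (-information_gain(vectors, labels, t), t))
        selected.extend(f"{name}:{t}" for t in ranked[:k])
    return selected


# Hypothetical toy data: placeholder strings standing in for verse texts.
groups = {
    "translation": ["believe unseen", "establish prayer", "speak kindly", "believe truth"],
    "commentary": ["faith heart", "ritual prayer", "manners speech", "faith certainty"],
}
labels = ["Faith", "Worship", "Etiquette", "Faith"]

selected = select_per_group(groups, labels, k=1)
print(selected)  # ['translation:believe', 'commentary:faith']
```

In a full pipeline the merged feature list would then be used to build the input vectors for a classifier such as naïve Bayes or k-NN; ranking within each source, rather than over a pooled vocabulary, keeps a longer source from crowding out the features of a shorter one.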
Acknowledgements
This study was supported in part by a grant from the Ministry of Education of Malaysia, Research Acculturation Grant Scheme (RAGS) Vot R045, a grant from Universiti Tun Hussein Onn Malaysia Vot U611, and in part by a grant from Research Gates IT Solution Sdn. Bhd.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Adeleke, A.O., Samsudin, N.A., Mustapha, A., Nawi, N.M. (2018). A Group-Based Feature Selection Approach to Improve Classification of Holy Quran Verses. In: Ghazali, R., Deris, M., Nawi, N., Abawajy, J. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2018. Advances in Intelligent Systems and Computing, vol 700. Springer, Cham. https://doi.org/10.1007/978-3-319-72550-5_28
DOI: https://doi.org/10.1007/978-3-319-72550-5_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72549-9
Online ISBN: 978-3-319-72550-5
eBook Packages: Engineering, Engineering (R0)