Abstract
This paper develops an automatic text categorization system that classifies Bangla text documents as medical or non-medical using two primary features: word length and the presence of English equivalent words. It first shows that, given the word length and the number of English equivalent words in a text, Bangla medical documents can be distinguished from text documents of any other domain. A Stochastic Gradient Descent (SGD) classifier is used and achieves an accuracy of 97.75%. The system is also compared against other commonly used classifiers, and SGD is observed to outperform them.
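The two features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `extract_features`, the tokenization, and the rule that a token counts as an "English equivalent word" when it is written entirely in Latin letters are all assumptions made for illustration, and the paper may define these features differently.

```python
import re

# Assumption: a token is treated as an "English equivalent word" if it consists
# only of Latin letters. The paper's actual definition may differ.
LATIN_WORD = re.compile(r"^[A-Za-z]+$")

# Punctuation stripped from token edges, including the Bangla danda.
PUNCTUATION = ".,;:!?()\"'।"


def extract_features(text):
    """Return (average word length, count of Latin-script tokens) for a text.

    Word length is measured in Unicode code points, so Bangla vowel signs and
    the hasanta each count as one unit (an illustrative simplification).
    """
    tokens = [t.strip(PUNCTUATION) for t in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were only punctuation
    if not tokens:
        return (0.0, 0)
    avg_len = sum(len(t) for t in tokens) / len(tokens)
    english_count = sum(1 for t in tokens if LATIN_WORD.match(t))
    return (avg_len, english_count)


if __name__ == "__main__":
    # Hypothetical sample sentence mixing Bangla and an English abbreviation.
    print(extract_features("রোগীর ECG রিপোর্ট"))
```

Feature pairs produced this way could then be passed to any linear classifier trained with stochastic gradient descent; the choice of loss function and learning rate is not stated in the abstract and is left open here.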
Acknowledgement
One of the authors would like to thank the Department of Science and Technology (DST) for support in the form of an INSPIRE fellowship.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dhar, A., Dash, N.S., Roy, K. (2019). Categorization of Bangla Medical Text Documents Based on Hybrid Internal Feature. In: Mandal, J., Mukhopadhyay, S., Dutta, P., Dasgupta, K. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2018. Communications in Computer and Information Science, vol 1031. Springer, Singapore. https://doi.org/10.1007/978-981-13-8581-0_15
DOI: https://doi.org/10.1007/978-981-13-8581-0_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8580-3
Online ISBN: 978-981-13-8581-0
eBook Packages: Computer Science, Computer Science (R0)