Abstract
Text categorization is the process of assigning tags to text according to its content. Its applications include document organization, spam email filtering, and news grouping. This paper introduces a stochastic gradient-CAViaR-based deep belief network (SG-CAV-based DBN) for text categorization. The proposed approach involves four steps: pre-processing, feature extraction, feature selection, and text categorization. First, the input data are pre-processed by stemming and stop-word removal, and features are then extracted using a vector space model. Next, features are selected based on entropy, and the selected features are passed to the text categorization step, which is performed by the proposed SG-CAV-based DBN. The DBN is trained with the proposed SG-CAV algorithm, which combines conditional autoregressive value at risk (CAViaR) with stochastic gradient descent. The performance of the proposed SG-CAV-based DBN is evaluated using recall, precision, F-measure, and accuracy, and is compared with that of existing methods: Naive Bayes, K-nearest neighbours, support vector machine, and deep belief network (DBN). The analysis shows that the proposed method achieves a maximal precision of 0.78, a maximal recall of 0.78, a maximal F-measure of 0.78, and a maximal accuracy of 0.95. Among the existing methods, DBN achieves the highest precision, recall, F-measure, and accuracy on both the 20 Newsgroup and Reuter databases. The precision, recall, F-measure, and accuracy of the proposed system are 10.98%, 11.54%, 11.538%, and 18.33% higher than those of the DBN for the 20 Newsgroup database, and 2.38%, 2.38%, 2.37%, and 0.21% higher for the Reuter database.
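The first three steps of the pipeline (pre-processing, vector space model, entropy-based feature selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping `crude_stem` helper, the use of raw term frequencies, and the choice to keep the lowest-entropy terms are all assumptions made for illustration, since the abstract does not specify them. The SG-CAV training of the DBN is not sketched here because its update rule is not given in the abstract.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "for", "on"}


def crude_stem(token):
    """Strip a few common suffixes; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def vector_space_model(docs):
    """Return (vocabulary, term-frequency vectors), one vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def term_entropy(column):
    """Shannon entropy (in bits) of one term's frequency distribution over documents."""
    total = sum(column)
    if total == 0:
        return 0.0
    probs = [c / total for c in column if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def select_features(vocab, vectors, k):
    """Keep the k lowest-entropy terms (assumed selection direction)."""
    scores = {
        term: term_entropy([vec[j] for vec in vectors])
        for j, term in enumerate(vocab)
    }
    # Rank by entropy, breaking ties alphabetically so the result is deterministic.
    keep = set(sorted(vocab, key=lambda t: (scores[t], t))[:k])
    keep_idx = [j for j, term in enumerate(vocab) if term in keep]
    reduced = [[vec[j] for j in keep_idx] for vec in vectors]
    return [vocab[j] for j in keep_idx], reduced


docs = [
    "The cats are running",
    "Dogs jumped on the cats",
    "Stocks are falling",
]
vocab, vectors = vector_space_model(docs)
selected_vocab, selected_vectors = select_features(vocab, vectors, k=5)
```

In this toy corpus the stem `cat` is spread evenly over two documents, so it has the highest entropy and is the one term dropped when `k=5`. The reduced vectors in `selected_vectors` are what would be handed to the classifier in the final step.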







Cite this article
Srilakshmi, V., Anuradha, K. & Shoba Bindu, C. Stochastic gradient-CAViaR-based deep belief network for text categorization. Evol. Intel. 14, 1727–1741 (2021). https://doi.org/10.1007/s12065-020-00449-x
DOI: https://doi.org/10.1007/s12065-020-00449-x