Stochastic gradient-CAViaR-based deep belief network for text categorization

  • Research Paper
  • Evolutionary Intelligence

Abstract

Text categorization is the process of assigning tags to text according to its content. Its applications include document organization, spam email filtering, and news grouping. This paper introduces a stochastic gradient-CAViaR-based deep belief network (SG-CAV-based DBN) for text categorization. The proposed approach involves four steps: pre-processing, feature extraction, feature selection, and text categorization. First, the input data are pre-processed by stemming and stop-word removal, and features are then extracted using a vector space model. Next, features are selected based on entropy, and the selected features are passed to the text categorization step, which is performed by the SG-CAV-based DBN. The DBN is trained with the proposed SG-CAV algorithm, which is designed by combining conditional autoregressive value at risk (CAViaR) with stochastic gradient descent. Performance is evaluated using recall, precision, F-measure, and accuracy, and is compared with existing methods: Naive Bayes, K-nearest neighbours, support vector machine, and deep belief network (DBN). The analysis shows that the proposed SG-CAV-based DBN achieves a maximal precision of 0.78, a maximal recall of 0.78, a maximal F-measure of 0.78, and a maximal accuracy of 0.95. Among the existing methods, DBN achieves the highest precision, recall, F-measure, and accuracy on both the 20 Newsgroup database and the Reuter database. Relative to the DBN, the proposed method is 10.98%, 11.54%, 11.538%, and 18.33% higher in precision, recall, F-measure, and accuracy, respectively, on the 20 Newsgroup database, and 2.38%, 2.38%, 2.37%, and 0.21% higher on the Reuter database.
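
The first three steps of the pipeline described in the abstract (pre-processing, vector space model, entropy-based feature selection) can be illustrated with a minimal sketch. The abstract does not specify the stemmer, the term weighting, or the exact entropy criterion, so everything below is an assumption made for illustration: a crude suffix-stripping stemmer, raw term counts as the vector space model, and a hypothetical selection rule that keeps the terms whose occurrences are most concentrated in a single class (lowest class entropy). The function names are invented here, and the SG-CAV-based DBN classifier itself is not reproduced.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def vector_space(docs):
    """Return (vocabulary, term-count vectors), one vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors


def class_entropy(column, labels):
    """Shannon entropy (bits) of how one term's occurrences spread over classes."""
    per_class = Counter()
    for count, label in zip(column, labels):
        per_class[label] += count
    total = sum(per_class.values())
    if total == 0:
        return math.inf  # term never occurs; rank it last
    probs = [c / total for c in per_class.values() if c > 0]
    return sum(-p * math.log2(p) for p in probs)


def select_features(vocab, vectors, labels, k):
    """Keep the k terms with the lowest class entropy (assumed criterion)."""
    scores = []
    for j, term in enumerate(vocab):
        column = [vec[j] for vec in vectors]
        scores.append((class_entropy(column, labels), term, j))
    keep = sorted(j for _, _, j in sorted(scores)[:k])
    return [vocab[j] for j in keep], [[vec[j] for j in keep] for vec in vectors]


# Toy corpus (invented for illustration, not from the paper's datasets).
docs = [
    "The team wins the football match",
    "Football players scored goals in the match",
    "Stock markets fell on trading news",
    "Investors are trading stocks after the match",
]
labels = ["sport", "sport", "finance", "finance"]

vocab, vectors = vector_space(docs)
selected, reduced = select_features(vocab, vectors, labels, k=len(vocab) - 1)
print(selected)
```

In this toy corpus the stem "match" occurs in both classes, so it has the highest class entropy and is the one term dropped. The reduced vectors would then be the input to whatever classifier is used in the final step.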

Author information

Corresponding author

Correspondence to V. Srilakshmi.

About this article

Cite this article

Srilakshmi, V., Anuradha, K. & Shoba Bindu, C. Stochastic gradient-CAViaR-based deep belief network for text categorization. Evol. Intel. 14, 1727–1741 (2021). https://doi.org/10.1007/s12065-020-00449-x

  • DOI: https://doi.org/10.1007/s12065-020-00449-x
