
Combination of loss functions for deep text classification

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Ensemble methods have been shown to improve the results of statistical classifiers by combining multiple single learners into a strong one. In this paper, we explore the use of ensemble methods at the level of the objective function of a deep neural network. We propose a novel objective function that is a linear combination of single losses and integrate it into a deep neural network, so that the weights of the linear combination are learned by backpropagation during training. We study the impact of this ensemble loss function on state-of-the-art convolutional neural networks for text classification and show the effectiveness of our approach through comprehensive experiments. The results demonstrate a significant improvement over conventional state-of-the-art methods in the literature.
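As a rough illustration of the idea, the sketch below shows a loss module whose mixing weights are ordinary trainable parameters and are therefore updated by backpropagation together with the network weights. PyTorch, the choice of cross-entropy and multi-class hinge as the component losses, and the softmax parameterisation of the weights are all assumptions made for this example; they are not necessarily the exact combination studied in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleLoss(nn.Module):
    """Illustrative sketch: a learnable linear combination of single losses."""

    def __init__(self, num_losses: int = 2):
        super().__init__()
        # Unconstrained parameters; a softmax keeps the mixing weights
        # positive and summing to one.
        self.raw_weights = nn.Parameter(torch.zeros(num_losses))

    def forward(self, logits, targets):
        w = F.softmax(self.raw_weights, dim=0)
        ce = F.cross_entropy(logits, targets)          # component loss 1 (assumed)
        hinge = F.multi_margin_loss(logits, targets)   # component loss 2 (assumed)
        return w[0] * ce + w[1] * hinge

# Toy usage: a linear layer stands in for a text CNN producing class logits.
model = nn.Linear(300, 5)
criterion = EnsembleLoss()
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(criterion.parameters()), lr=1e-3
)

x = torch.randn(8, 300)              # batch of 8 sentence embeddings (toy data)
y = torch.randint(0, 5, (8,))
loss = criterion(model(x), y)
loss.backward()                      # gradients flow into the mixing weights too
optimizer.step()
```

The key point of the sketch is that the criterion's parameters are handed to the optimizer alongside the model's, so the combination weights are learned during the same training loop rather than tuned by hand.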



Author information


Corresponding author

Correspondence to Reza Monsefi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hajiabadi, H., Molla-Aliod, D., Monsefi, R. et al. Combination of loss functions for deep text classification. Int. J. Mach. Learn. & Cyber. 11, 751–761 (2020). https://doi.org/10.1007/s13042-019-00982-x

