Research article · DOI: 10.1145/3548608.3559206

Truncated Cross-entropy: A New Loss Function for Multi-category Classification

Published: 14 October 2022

ABSTRACT

Deep learning has received a great deal of research attention in recent years. During the training of a deep learning model, the loss function is the objective that measures the discrepancy between the model's predictions and the distribution of the real data, and it is also an important indicator of the model's performance. The most widely used loss functions in deep learning include the mean squared error (MSE) and the cross-entropy error. The choice of loss function has a non-negligible influence on the optimizer. The most common optimizers are stochastic gradient descent (SGD), mini-batch gradient descent (MBGD), and adaptive moment estimation (Adam); among them, MBGD is widely used because it balances accuracy and speed. However, choosing the batch size is a challenge: if the batch size is too large, the computation and memory costs grow accordingly, while with a small batch size the gradient descent process oscillates more. This paper therefore proposes an improved loss function, named truncated cross-entropy, to stabilize the convergence of the optimizer. Experiments show that the proposed method speeds up the convergence of training and reduces oscillation, achieving performance similar to large-batch-size training while using a relatively small batch size.
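Only the abstract is available here, so the sketch below is for illustration only: it computes standard multi-category cross-entropy on a toy mini-batch and one plausible reading of "truncation", namely capping each per-sample loss so that a few hard or noisy samples cannot dominate the gradient of a small batch. The function names, the `cap` parameter, and the capping rule are assumptions and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Standard multi-category cross-entropy: -log p(true class) per sample.
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def truncated_cross_entropy(logits, labels, cap=2.0):
    # Hypothetical truncation: cap each per-sample loss at `cap`.
    # (The paper's exact truncation rule is not given in the abstract.)
    return np.minimum(cross_entropy(logits, labels), cap)

# Toy mini-batch: 4 samples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])
print("mean CE :", cross_entropy(logits, labels).mean())
print("mean TCE:", truncated_cross_entropy(logits, labels).mean())
```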

Published in

ICCIR '22: Proceedings of the 2022 2nd International Conference on Control and Intelligent Robotics
June 2022, 905 pages
ISBN: 9781450397179
DOI: 10.1145/3548608
Copyright © 2022 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 131 of 239 submissions, 55%
