research-article
DOI: 10.1145/3507548.3507580

Regression Algorithm Based on Self-Distillation and Ensemble Learning


ABSTRACT

Low-dimensional feature regression is a common problem in many disciplines, such as chemistry, kinetics, and medicine. Most common solutions are based on classical machine learning, but as deep learning evolves there is room for performance improvement. A few researchers have proposed deep learning-based solutions such as ResidualNet, GrowNet, and EnsembleNet. The latter two are boosting methods, which are better suited to shallow networks; their performance is largely determined by the first model, and subsequent boosting steps have limited effect. We propose a method based on self-distillation and bagging: it selects a well-performing base model, distills several student models from it with an appropriate regression distillation algorithm, and averages the outputs of these student models as the final result. This ensemble method can be applied to any form of network. On the CASP dataset it achieves good results, improving R² (coefficient of determination) from 0.65 to 0.70 compared with the best base model, ResidualNet.
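The pipeline summarised in the abstract can be made concrete with a short sketch. The listing below (PyTorch) is a minimal illustration under stated assumptions, not the authors' implementation: it trains a base teacher regressor, distills a few students of the same architecture against a mix of the ground truth and the teacher's outputs, and averages the students' predictions. The network shape, loss weighting alpha, number of students, and synthetic data are illustrative placeholders.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic low-dimensional regression data (stand-in for a set such as CASP; 9 features here).
X = torch.randn(2048, 9)
y = X[:, :3].sum(dim=1, keepdim=True) + 0.1 * torch.randn(2048, 1)

def make_mlp(in_dim=9, hidden=64):
    # Small MLP regressor; a placeholder for whichever base model performs best (e.g. ResidualNet).
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 1),
    )

def train(model, X, y, soft_targets=None, alpha=0.5, epochs=200, lr=1e-3):
    # Plain MSE training; when a teacher's outputs are supplied, add an MSE term
    # towards them (a simple form of regression distillation).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        pred = model(X)
        loss = mse(pred, y)
        if soft_targets is not None:
            loss = alpha * loss + (1.0 - alpha) * mse(pred, soft_targets)
        loss.backward()
        opt.step()
    return model

# 1) Train the well-performing base (teacher) model.
teacher = train(make_mlp(), X, y)
with torch.no_grad():
    soft = teacher(X)

# 2) Self-distill several students: same architecture, different random initialisations.
students = [train(make_mlp(), X, y, soft_targets=soft) for _ in range(3)]

# 3) Average the students' outputs as the final bagging-style prediction.
with torch.no_grad():
    ensemble_pred = torch.stack([s(X) for s in students]).mean(dim=0)
    print("ensemble MSE:", nn.MSELoss()(ensemble_pred, y).item())

Averaging independently distilled students is what makes this a bagging-style ensemble rather than a boosting one, which is the contrast the abstract draws with GrowNet and EnsembleNet.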

References

  1. Smola, Alex J. and Schölkopf, Bernhard. 2004. A tutorial on support vector regression. Statistics and Computing, 14, 3 (August 2004), 199-222. https://doi.org/10.1023/b:stco.0000035301.49549.88
  2. Criminisi, Antonio, Shotton, Jamie and Konukoglu, Ender. 2011. Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Microsoft Research Cambridge, Tech. Rep. MSR-TR-2011-114, 5, 6 (2011), 12.
  3. Freund, Yoav and Schapire, Robert E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 1 (August 1997), 119-139. https://doi.org/10.1006/jcss.1997.1504
  4. Friedman, Jerome H. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29, 5 (October 2001), 1189-1232. https://doi.org/10.1214/aos/1013203451
  5. Chen, Dongwei, Hu, Fei, Nian, Guokui and Yang, Tiantian. 2020. Deep residual learning for nonlinear regression. Entropy, 22, 2 (February 2020), 193. https://doi.org/10.3390/e22020193
  6. Badirli, Sarkhan, Liu, Xuanqing, Xing, Zhengming, Bhowmik, Avradeep, Doan, Khoa and Keerthi, Sathiya S. 2020. Gradient boosting neural networks: GrowNet. arXiv preprint arXiv:2002.07971 (2020).
  7. Park, Minyoung, Lee, Seungyeon, Hwang, Sangheum and Kim, Dohyun. 2020. Additive Ensemble Neural Networks. IEEE Access, 8 (2020), 113192-113199. https://doi.org/10.1109/access.2020.3003748
  8. Martínez-Muñoz, Gonzalo. 2019. Sequential training of neural networks with gradient boosting. arXiv preprint arXiv:1909.12098 (2019).
  9. Furlanello, Tommaso, Lipton, Zachary, Tschannen, Michael, Itti, Laurent and Anandkumar, Anima. 2018. Born again neural networks. In Proceedings of the International Conference on Machine Learning, 1607-1616.
  10. Yuen, Kevin Kam Fung. 2017. Towards multiple regression analyses for relationships of air quality and weather. Journal of Advances in Information Technology, 8, 2 (May 2017), 135-140. https://doi.org/10.12720/jait.8.2.135-140
  11. Lo, Wai Lun, Zhu, Meimei and Fu, Hong. 2020. Meteorology visibility estimation by using multi-support vector regression method. Journal of Advances in Information Technology, 11, 2 (May 2020), 40-47. https://doi.org/10.12720/jait.11.2.40-47
  12. Daghistani, Tahani and Alshammari, Riyad. 2020. Comparison of statistical logistic regression and random forest machine learning techniques in predicting diabetes. Journal of Advances in Information Technology, 11, 2 (May 2020), 78-83. https://doi.org/10.12720/jait.11.2.78-83
  13. He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing and Sun, Jian. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778.
  14. Sahoo, Doyen, Pham, Quang, Lu, Jing and Hoi, Steven C. H. 2017. Online deep learning: Learning deep neural networks on the fly. arXiv preprint arXiv:1711.03705 (July 2017). https://doi.org/10.24963/ijcai.2018/369
  15. Zhang, Si-si, Liu, Jian-wei, Zuo, Xin, Lu, Run-kun and Lian, Si-ming. 2021. Online deep learning based on auto-encoder. Applied Intelligence (2021), 1-20.
  16. Hansen, Lars Kai and Salamon, Peter. 1990. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 10 (1990), 993-1001.
  17. Ganaie, M. A. and Hu, Minghui. 2021. Ensemble deep learning: A review. arXiv preprint arXiv:2104.02395 (2021).
  18. Brownlee, Jason. 2018. Ensemble Learning Methods for Deep Learning Neural Networks. December 19, 2018. https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
  19. Hinton, Geoffrey, Vinyals, Oriol and Dean, Jeff. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  20. Xie, Qizhe, Luong, Minh-Thang, Hovy, Eduard and Le, Quoc V. 2020. Self-training with noisy student improves ImageNet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10687-10698. https://doi.org/10.1109/cvpr42600.2020.01070
  21. Cho, Jang Hyun and Hariharan, Bharath. 2019. On the efficacy of knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4794-4802. https://doi.org/10.1109/iccv.2019.00489
  22. Yang, Chenglin, Xie, Lingxi, Qiao, Siyuan and Yuille, Alan L. 2019. Training deep neural networks in generations: A more tolerant teacher educates better students. In Proceedings of the AAAI Conference on Artificial Intelligence, 5628-5635. https://doi.org/10.1609/aaai.v33i01.33015628
  23. Chen, Guobin, Choi, Wongun, Yu, Xiang, Han, Tony and Chandraker, Manmohan. 2017. Learning efficient object detection models with knowledge distillation. Advances in Neural Information Processing Systems, 30 (2017).
  24. Saputra, Muhamad Risqi U., De Gusmao, Pedro P. B., Almalioglu, Yasin, Markham, Andrew and Trigoni, Niki. 2019. Distilling knowledge from a deep pose regressor network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 263-272. https://doi.org/10.1109/iccv.2019.00035
  25. Rana, Prashant Singh. 2013. Physicochemical Properties of Protein Tertiary Structure Data Set. March 31, 2013. https://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure
  26. harlfoxem. 2016. House Sales in King County, USA. https://www.kaggle.com/harlfoxem/housesalesprediction
  27. Arzamasov, Vadim. 2018. Electrical Grid Stability Simulated Data Data Set. November 16, 2018. https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+
  28. Kamath, R. S. and Kamat, R. K. 2018. Modelling Physicochemical Properties for Protein Tertiary Structure Prediction: Performance Analysis of Regression Models (December 2018).

Published in

CSAI '21: Proceedings of the 2021 5th International Conference on Computer Science and Artificial Intelligence
December 2021, 437 pages
ISBN: 9781450384155
DOI: 10.1145/3507548
Copyright © 2021 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 9 March 2022
