ABSTRACT
Low-dimensional feature regression is a common problem in disciplines such as chemistry, kinetics, and medicine. Most existing solutions are based on classical machine learning, but advances in deep learning leave room for performance improvements. A few researchers have proposed deep learning-based solutions such as ResidualNet, GrowNet, and EnsembleNet. The latter two are boosting methods, which are better suited to shallow networks; their performance is largely determined by the first model, and subsequent boosting steps have limited effect. We propose a method based on self-distillation and bagging: a well-performing base model is selected as the teacher, several student models are distilled from it with an appropriate regression distillation algorithm, and the outputs of the students are averaged to produce the final prediction. This ensembling scheme can be applied to networks of any form. The method achieves good results on the CASP dataset, improving R² (coefficient of determination) from 0.65 to 0.70 compared with the best base model, ResidualNet.
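Since the abstract only sketches the pipeline, the following is a minimal illustrative reconstruction rather than the paper's implementation: the `Regressor` architecture, the blending weight `alpha`, the training schedule, and `n_students` are all assumptions introduced here. It shows the three stages the abstract names: take a trained teacher, distill several students on bootstrap resamples of the training data (the bagging step), and average the students' outputs.

```python
import torch
import torch.nn as nn

class Regressor(nn.Module):
    """Small fully connected regressor; a stand-in for the paper's base model."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)

def distill_student(teacher, x, y, alpha=0.5, epochs=200, lr=1e-3):
    """Train one student against a blend of ground-truth and teacher targets."""
    student = Regressor(x.shape[1])
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    mse = nn.MSELoss()
    with torch.no_grad():
        soft = teacher(x)  # teacher predictions serve as "soft" regression targets
    for _ in range(epochs):
        opt.zero_grad()
        pred = student(x)
        # Plain weighted-MSE distillation; the paper's actual regression
        # distillation loss may differ from this simple blend.
        loss = alpha * mse(pred, y) + (1 - alpha) * mse(pred, soft)
        loss.backward()
        opt.step()
    return student

def self_distill_ensemble(teacher, x, y, n_students=5):
    """Bagging step: each student is distilled on a bootstrap resample."""
    n = x.shape[0]
    students = []
    for _ in range(n_students):
        idx = torch.randint(0, n, (n,))  # sample n indices with replacement
        students.append(distill_student(teacher, x[idx], y[idx]))
    return students

def predict(students, x):
    """Average the student outputs, as the abstract describes."""
    with torch.no_grad():
        return torch.stack([s(x) for s in students]).mean(dim=0)
```

The quality metric reported above is the coefficient of determination, R² = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)², so the reported gain from 0.65 to 0.70 means the ensemble explains five percentage points more of the target variance than the base model.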
REFERENCES
- Smola, Alex J and Schölkopf, Bernhard. 2004. A tutorial on support vector regression. Statistics and Computing, 14, 3 (August 2004), 199-222. https://doi.org/10.1023/b:stco.0000035301.49549.88
- Criminisi, Antonio, Shotton, Jamie and Konukoglu, Ender. 2011. Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Microsoft Research Cambridge, Tech. Rep. MSR-TR-2011-114, 5, 6 (2011), 12.
- Freund, Yoav and Schapire, Robert E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 1 (August 1997), 119-139. https://doi.org/10.1006/jcss.1997.1504
- Friedman, Jerome H. 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29, 5 (October 2001), 1189-1232. https://doi.org/10.1214/aos/1013203451
- Chen, Dongwei, Hu, Fei, Nian, Guokui and Yang, Tiantian. 2020. Deep residual learning for nonlinear regression. Entropy, 22, 2 (February 2020), 193. https://doi.org/10.3390/e22020193
- Badirli, Sarkhan, Liu, Xuanqing, Xing, Zhengming, Bhowmik, Avradeep, Doan, Khoa and Keerthi, Sathiya S. 2020. Gradient boosting neural networks: GrowNet. arXiv preprint arXiv:2002.07971 (2020).
- Park, Minyoung, Lee, Seungyeon, Hwang, Sangheum and Kim, Dohyun. 2020. Additive Ensemble Neural Networks. IEEE Access, 8 (2020), 113192-113199. https://doi.org/10.1109/access.2020.3003748
- Martínez-Muñoz, Gonzalo. 2019. Sequential training of neural networks with gradient boosting. arXiv preprint arXiv:1909.12098 (2019).
- Furlanello, Tommaso, Lipton, Zachary, Tschannen, Michael, Itti, Laurent and Anandkumar, Anima. 2018. Born again neural networks. In Proceedings of the International Conference on Machine Learning, 1607-1616.
- Yuen, Kevin Kam Fung. 2017. Towards multiple regression analyses for relationships of air quality and weather. Journal of Advances in Information Technology, 8, 2 (May 2017), 135-140. https://doi.org/10.12720/jait.8.2.135-140
- Lo, Wai Lun, Zhu, Meimei and Fu, Hong. 2020. Meteorology visibility estimation by using multi-support vector regression method. Journal of Advances in Information Technology, 11, 2 (May 2020), 40-47. https://doi.org/10.12720/jait.11.2.40-47
- Daghistani, Tahani and Alshammari, Riyad. 2020. Comparison of statistical logistic regression and random forest machine learning techniques in predicting diabetes. Journal of Advances in Information Technology, 11, 2 (May 2020), 78-83. https://doi.org/10.12720/jait.11.2.78-83
- He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing and Sun, Jian. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778.
- Sahoo, Doyen, Pham, Quang, Lu, Jing and Hoi, Steven CH. 2017. Online deep learning: Learning deep neural networks on the fly. arXiv preprint arXiv:1711.03705 (July 2017). https://doi.org/10.24963/ijcai.2018/369
- Zhang, Si-si, Liu, Jian-wei, Zuo, Xin, Lu, Run-kun and Lian, Si-ming. 2021. Online deep learning based on auto-encoder. Applied Intelligence (2021), 1-20.
- Hansen, Lars Kai and Salamon, Peter. 1990. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 10 (1990), 993-1001.
- Ganaie, MA and Hu, Minghui. 2021. Ensemble deep learning: A review. arXiv preprint arXiv:2104.02395 (2021).
- Brownlee, Jason. 2018. Ensemble Learning Methods for Deep Learning Neural Networks. Retrieved December 19, 2018 from https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
- Hinton, Geoffrey, Vinyals, Oriol and Dean, Jeff. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
- Xie, Qizhe, Luong, Minh-Thang, Hovy, Eduard and Le, Quoc V. 2020. Self-training with noisy student improves ImageNet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10687-10698. https://doi.org/10.1109/cvpr42600.2020.01070
- Cho, Jang Hyun and Hariharan, Bharath. 2019. On the efficacy of knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4794-4802. https://doi.org/10.1109/iccv.2019.00489
- Yang, Chenglin, Xie, Lingxi, Qiao, Siyuan and Yuille, Alan L. 2019. Training deep neural networks in generations: A more tolerant teacher educates better students. In Proceedings of the AAAI Conference on Artificial Intelligence, 5628-5635. https://doi.org/10.1609/aaai.v33i01.33015628
- Chen, Guobin, Choi, Wongun, Yu, Xiang, Han, Tony and Chandraker, Manmohan. 2017. Learning efficient object detection models with knowledge distillation. Advances in Neural Information Processing Systems, 30 (2017).
- Saputra, Muhamad Risqi U, De Gusmao, Pedro PB, Almalioglu, Yasin, Markham, Andrew and Trigoni, Niki. 2019. Distilling knowledge from a deep pose regressor network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 263-272. https://doi.org/10.1109/iccv.2019.00035
- Rana, Prashant Singh. 2013. Physicochemical Properties of Protein Tertiary Structure Data Set. Retrieved March 31, 2013 from https://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure
- harlfoxem. 2016. House Sales in King County, USA. Retrieved 2016 from https://www.kaggle.com/harlfoxem/housesalesprediction
- Arzamasov, Vadim. 2018. Electrical Grid Stability Simulated Data Data Set. Retrieved November 16, 2018 from https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+
- Kamath, RS and Kamat, RK. 2018. Modelling Physicochemical Properties for Protein Tertiary Structure Prediction: Performance Analysis of Regression Models (December 2018).