Abstract
Deep neural network (DNN) models typically expose many hyperparameters that can be configured to achieve optimal performance on a particular dataset. Practitioners usually tune the hyperparameters of their DNN models by training a number of trial models with different hyperparameter configurations, searching for the configuration that maximizes the training accuracy or minimizes the training loss. As such hyperparameter tuning focuses on the model accuracy or the loss function, it remains under-explored how the process impacts other performance properties of DNN models, such as inference latency and model size. On the other hand, standard DNN models are often large and compute-intensive, which prevents them from being deployed directly in resource-constrained environments such as mobile devices and Internet of Things (IoT) devices. To tackle this problem, various model optimization techniques (e.g., pruning and quantization) have been proposed to make DNN models smaller and less compute-intensive so that they are better suited for resource-constrained environments. However, it is neither clear how model optimization techniques impact other performance properties of DNN models, such as inference latency and battery consumption, nor how model optimization interacts with the effect of hyperparameter tuning (i.e., the confounding effect). Therefore, in this paper, we perform a comprehensive study on four representative and widely adopted DNN models, i.e., CNN image classification, ResNet-50, CNN text classification, and LSTM sentiment classification, to investigate how different hyperparameters affect the standard DNN models, as well as how hyperparameter tuning combined with model optimization affects the optimized DNN models, in terms of various performance properties (e.g., inference latency and battery consumption). Our empirical results indicate that tuning specific hyperparameters has a heterogeneous impact on the performance of DNN models across different models and different performance properties. In particular, although the top tuned DNN models usually have very similar accuracy, they may differ significantly in other aspects (e.g., inference latency). We also observe that model optimization has a confounding effect on the impact of hyperparameters on DNN model performance: for example, two sets of hyperparameters may result in standard models with similar performance, yet their performance may differ significantly after the models are optimized and deployed on a mobile device. Our findings highlight that practitioners can benefit from paying attention to a variety of performance properties, and to the confounding effect of model optimization, when tuning and optimizing their DNN models.
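To make the tuning-then-optimization pipeline concrete, the following is a minimal sketch, not the paper's actual experimental setup: it trains two hyperparameter configurations of a small CNN, applies post-training quantization (one of the model optimization techniques the study considers) via TFLite, and compares accuracy, optimized model size, and inference latency. It assumes TensorFlow 2.x and MNIST; the helpers `build_model`, `quantize`, and `measure_latency` are illustrative names, and the hand-picked configurations stand in for a real tuner such as Keras Tuner.

```python
# Minimal sketch (not the paper's pipeline): compare two hyperparameter
# configurations before and after post-training quantization with TFLite.
# Assumes TensorFlow 2.x; helper names are illustrative, not from the study.
import time
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis] / 255.0
x_test = x_test[..., np.newaxis] / 255.0

def build_model(filters, units, learning_rate):
    """Small CNN whose hyperparameters vary between trials."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(filters, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def quantize(model):
    """Post-training (dynamic-range) quantization of a trained Keras model."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()  # serialized TFLite flatbuffer (bytes)

def measure_latency(tflite_bytes, sample, runs=100):
    """Mean single-sample inference latency of the optimized model (seconds)."""
    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], sample.astype(np.float32))
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs

# Two configurations that may reach similar accuracy yet diverge in size
# and latency once optimized -- the confounding effect described above.
for filters, units, lr in [(16, 64, 1e-3), (64, 128, 1e-3)]:
    model = build_model(filters, units, lr)
    model.fit(x_train, y_train, epochs=1, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    tflite_bytes = quantize(model)
    latency = measure_latency(tflite_bytes, x_test[:1])
    print(f"filters={filters} units={units}: acc={acc:.3f}, "
          f"optimized size={len(tflite_bytes)/1024:.0f} KiB, "
          f"latency={latency*1e3:.2f} ms")
```

Note that latency measured through the desktop TFLite interpreter is only indicative; the performance properties the abstract refers to (latency, battery consumption) must ultimately be measured on the target mobile or IoT hardware.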
Index Terms
- An Empirical Study of the Impact of Hyperparameter Tuning and Model Optimization on the Performance Properties of Deep Neural Networks