DOI: 10.1145/3460319.3464843
research-article

Predoo: precision testing of deep learning operators

Published: 11 July 2021

Abstract

Deep learning (DL) techniques attract practitioners from many fields because of their superior performance and steady stream of breakthroughs. To ensure the quality of DL techniques, researchers have developed testing and verification approaches. Recent studies reveal that the underlying DL operators can introduce defects into a DL model. DL operators are the fundamental components of DL libraries, and library developers are still working on practical approaches to ensure the quality of the operators they provide. However, the variety of DL operators and the complexity of their implementations make their quality difficult to evaluate. Operator testing with a limited set of test cases may fail to reveal defects hidden in the implementation. Moreover, the existing model-to-library testing approach requires extra labor and time to identify and localize errors; that is, developers can only react to defects after they have been exposed. To bridge this gap, this paper proposes a fuzzing-based, operator-level precision testing approach that estimates the precision errors of individual DL operators. Unlike conventional fuzzing techniques, the approach generates valid inputs with variable shapes and evaluates precision errors at a fine-grained level. Testing a DL operator is treated as a search problem whose objective is to maximize the output precision error. We implement the approach in a tool named Predoo and evaluate it on seven DL operators from TensorFlow. The results show that Predoo can trigger precision errors larger than the error thresholds declared in the testing scripts of the TensorFlow repository.
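
The abstract frames operator testing as a search for inputs that maximize an operator's precision error. The sketch below illustrates that idea under several assumptions of ours; it is not Predoo's implementation. The "operator" is a toy float32 summation emulated in pure Python (`reduce_sum_f32`). The reference is the correctly rounded sum computed with `math.fsum`. The error is the absolute difference between the two results; for a tensor-valued operator one would presumably take the maximum over output elements. The search is a simple hill-climbing loop whose mutations change both the input values and the input length (its shape). All function names and parameters here are hypothetical. Testing a real operator would mean replacing the two toy functions with the same TensorFlow operator evaluated on `tf.float32` and `tf.float64` inputs. That substitution is also our assumption.

```python
import math
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum_f32(xs):
    """Toy operator under test: left-to-right sum, rounded to float32 after every step."""
    acc = 0.0
    for x in xs:
        acc = f32(acc + f32(x))
    return acc


def reduce_sum_ref(xs):
    """High-precision reference: correctly rounded sum of the same float32 inputs."""
    return math.fsum(f32(x) for x in xs)


def precision_error(xs):
    """Absolute difference between the low-precision result and the reference."""
    return abs(reduce_sum_f32(xs) - reduce_sum_ref(xs))


def mutate(xs, rng, max_len=256, scale=1e4):
    """Return a mutated copy: grow or shrink the input shape, or perturb one value."""
    ys = list(xs)
    op = rng.randrange(3)
    if op == 0 and len(ys) < max_len:
        ys.append(rng.uniform(-scale, scale))      # shape mutation: add an element
    elif op == 1 and len(ys) > 1:
        ys.pop(rng.randrange(len(ys)))             # shape mutation: drop an element
    else:
        i = rng.randrange(len(ys))                 # value mutation, clamped to [-scale, scale]
        v = ys[i] * rng.uniform(0.5, 2.0) + rng.gauss(0.0, 1.0)
        ys[i] = max(-scale, min(scale, v))
    return ys


def fuzz(seed_input, iterations=500, seed=0):
    """Hill-climbing search that keeps the input with the largest error found so far."""
    if not seed_input:
        raise ValueError("seed_input must be non-empty")
    rng = random.Random(seed)
    best = list(seed_input)
    best_err = precision_error(best)
    for _ in range(iterations):
        cand = mutate(best, rng)
        err = precision_error(cand)
        # Ties are accepted so the search can drift across zero-error plateaus.
        if err >= best_err:
            best, best_err = cand, err
    return best, best_err


if __name__ == "__main__":
    worst, err = fuzz([1.0, 2.0, 3.0])
    print(f"length={len(worst)} abs error={err:.3g}")
```

As a hand-checkable case, `precision_error([16777216.0, 1.0, 1.0])` is `2.0`. Above 2^24 = 16777216, consecutive float32 values are 2 apart, so each `+ 1.0` is a tie that rounds back to 16777216.0, while the reference sum is 16777218.0. Comparing the best error found against a fixed tolerance is loosely analogous to the thresholds passed as `rtol`/`atol` to `assertAllClose` in TensorFlow's test scripts. It is not necessarily the exact check the paper performs.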

Information

Published In

ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2021
685 pages
ISBN:9781450384599
DOI:10.1145/3460319
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning operators
  2. floating-point operation
  3. precision testing

Conference

ISSTA '21

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Article Metrics

  • Downloads (last 12 months): 114
  • Downloads (last 6 weeks): 12
Reflects downloads up to 20 Feb 2025

Cited By

  • (2025) Deep Learning Library Testing: Definition, Methods and Challenges. ACM Computing Surveys. DOI: 10.1145/3716497. Online publication date: 5-Feb-2025
  • (2025) D3: Differential Testing of Distributed Deep Learning With Model Generation. IEEE Transactions on Software Engineering 51, 1, pp. 38–52. DOI: 10.1109/TSE.2024.3461657. Online publication date: 1-Jan-2025
  • (2025) A Comprehensive Review on Deep Learning System Testing. Algorithms and Architectures for Parallel Processing, pp. 181–191. DOI: 10.1007/978-981-96-1548-3_12. Online publication date: 17-Feb-2025
  • (2024) Mutation-Based Deep Learning Framework Testing Method in JavaScript Environment. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pp. 970–981. DOI: 10.1145/3691620.3695478. Online publication date: 27-Oct-2024
  • (2024) A PSO-based Method to Test Deep Learning Library at API Level. Proceedings of the 3rd International Conference on Computer, Artificial Intelligence and Control Engineering, pp. 117–130. DOI: 10.1145/3672758.3672777. Online publication date: 26-Jan-2024
  • (2024) A Miss Is as Good as A Mile: Metamorphic Testing for Deep Learning Operators. Proceedings of the ACM on Software Engineering 1, FSE, pp. 2005–2027. DOI: 10.1145/3660796. Online publication date: 12-Jul-2024
  • (2024) CGFuzz: A Dynamic Test Case Generation Method for DL Framework Based on Function Coverage. 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C), pp. 507–516. DOI: 10.1109/QRS-C63300.2024.00070. Online publication date: 1-Jul-2024
  • (2024) Detecting Numerical Deviations in Deep Learning Models Introduced by the TVM Compiler. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), pp. 73–83. DOI: 10.1109/ISSRE62328.2024.00018. Online publication date: 28-Oct-2024
  • (2024) A Survey of Security Testing Techniques for Deep Learning Frameworks. 2024 9th International Conference on Signal and Image Processing (ICSIP), pp. 404–415. DOI: 10.1109/ICSIP61881.2024.10671492. Online publication date: 12-Jul-2024
  • (2024) Python Coverage Guided Fuzzing for Deep Learning Framework. 2024 International Conference on Electronic Engineering and Information Systems (EEISS), pp. 1–6. DOI: 10.1109/EEISS62553.2024.00007. Online publication date: 13-Jan-2024
