research-article

Predoo: precision testing of deep learning operators

Authors:

Zhenyu Chen

ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 400–412

https://doi.org/10.1145/3460319.3464843

Published: 11 July 2021

Abstract

Deep learning (DL) techniques have attracted attention from many fields by delivering breakthrough performance. To ensure the quality of DL techniques, researchers have developed a range of testing and verification approaches. Recent studies reveal that the underlying DL operators, the fundamental components of DL libraries, can cause defects in a DL model. Library developers are still seeking practical ways to ensure the quality of the operators they provide. However, the variety of DL operators and the complexity of their implementations make their quality hard to evaluate. Operator testing with a limited set of test cases may fail to reveal defects hidden in the implementation. Moreover, the existing model-to-library testing approach requires extra labor and time to identify and locate errors; that is, developers can only react to defects after they are exposed. To bridge this gap, this paper proposes a fuzzing-based, operator-level precision testing approach that estimates the precision errors of individual DL operators. Unlike conventional fuzzing techniques, the approach generates valid inputs with variable shapes and evaluates precision errors at a fine granularity. Testing a DL operator is treated as a search problem: find inputs that maximize the precision error of the operator's output. We implement the approach in a tool named Predoo and evaluate it on seven DL operators from TensorFlow. The results show that Predoo can trigger precision errors larger than the error thresholds declared in the test scripts of the TensorFlow repository.
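The abstract frames operator testing as a search for inputs that maximize output precision error. The Python sketch below illustrates that idea under stated assumptions; it is not Predoo's implementation. It assumes (1) a hypothetical operator under test, a softmax whose intermediate results are each rounded to float32 via `struct`, (2) the same function computed in float64 with `math.fsum` as the precision oracle, (3) maximum absolute element-wise error as the error measure, and (4) simple hill climbing with value perturbation plus shape growth or shrinkage as the mutation strategy. The names `softmax_f32`, `softmax_ref`, `precision_error`, `mutate`, and `fuzz` are illustrative only.

```python
import math
import random
import struct
from typing import List, Tuple


def to_f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def softmax_f32(xs: List[float]) -> List[float]:
    """Hypothetical operator under test: softmax with each intermediate rounded to float32.

    exp is evaluated in float64 and then rounded, which only emulates a float32 kernel.
    """
    m = max(xs)
    exps = [to_f32(math.exp(to_f32(x - m))) for x in xs]
    total = 0.0
    for e in exps:  # sequential float32 accumulation, as a naive kernel might do
        total = to_f32(total + e)
    return [to_f32(e / total) for e in exps]


def softmax_ref(xs: List[float]) -> List[float]:
    """Higher-precision oracle: float64 softmax using math.fsum for an accurately rounded sum."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = math.fsum(exps)
    return [e / total for e in exps]


def precision_error(xs: List[float]) -> float:
    """Maximum absolute element-wise error of the float32 output against the oracle."""
    low, ref = softmax_f32(xs), softmax_ref(xs)
    return max(abs(a - b) for a, b in zip(low, ref))


def mutate(xs: List[float], rng: random.Random, max_len: int) -> List[float]:
    """Perturb every value and sometimes change the shape, keeping the input valid."""
    # Clamp to [-50, 50] so float32 conversion never overflows.
    ys = [to_f32(min(50.0, max(-50.0, x + rng.gauss(0.0, 1.0)))) for x in xs]
    r = rng.random()
    if r < 0.2 and len(ys) < max_len:
        ys.append(to_f32(rng.uniform(-50.0, 50.0)))  # grow the shape by one element
    elif r < 0.4 and len(ys) > 1:
        ys.pop(rng.randrange(len(ys)))  # shrink the shape, never below one element
    return ys


def fuzz(iterations: int = 500, max_len: int = 64, seed: int = 0) -> Tuple[List[float], float]:
    """Hill-climbing search for a 1-D input that maximizes the precision error."""
    rng = random.Random(seed)
    best = [to_f32(rng.uniform(-50.0, 50.0)) for _ in range(rng.randint(1, max_len))]
    best_err = precision_error(best)
    for _ in range(iterations):
        cand = mutate(best, rng, max_len)
        err = precision_error(cand)
        if err > best_err:  # keep only mutants that enlarge the error
            best, best_err = cand, err
    return best, best_err


if __name__ == "__main__":
    worst_input, worst_err = fuzz()
    print(f"length={len(worst_input)} max_abs_error={worst_err:.3e}")
```

Comparing against a higher-precision computation of the same function avoids the need for an exact oracle, and clamping values while never dropping below one element keeps every mutant a valid operator input. A real tool would instead call library operators (for example, TensorFlow kernels at different dtypes) on multi-dimensional tensors.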

Cited By

Zhang X, Jiang W, Shen C, Li Q, Wang Q, Lin C, Guan X (2025). Deep Learning Library Testing: Definition, Methods and Challenges. ACM Computing Surveys. DOI: 10.1145/3716497. Online publication date: 5-Feb-2025.
https://dl.acm.org/doi/10.1145/3716497
Wang J, Pham H, Li Q, Tan L, Guo Y, Aziz A, Meijer E (2025). D³: Differential Testing of Distributed Deep Learning With Model Generation. IEEE Transactions on Software Engineering, 51:1, 38–52. DOI: 10.1109/TSE.2024.3461657. Online publication date: 1-Jan-2025.
https://dl.acm.org/doi/10.1109/TSE.2024.3461657
Li Y, Shan C, Liu Z, Liao S (2025). A Comprehensive Review on Deep Learning System Testing. Algorithms and Architectures for Parallel Processing, 181–191. DOI: 10.1007/978-981-96-1548-3_12. Online publication date: 17-Feb-2025.
https://doi.org/10.1007/978-981-96-1548-3_12

Published In

ACM Conferences

ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2021

685 pages

ISBN:9781450384599

DOI:10.1145/3460319

General Chair: Cristian Cadar, Imperial College London, UK

Program Chair: Xiangyu Zhang, Purdue University, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2021

Qualifiers

Research-article

Conference

ISSTA '21

Sponsor:

SIGSOFT

ISSTA '21: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 11–17, 2021

Virtual, Denmark

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 30

Total Downloads: 641

Downloads (last 12 months): 114

Downloads (last 6 weeks): 12

Reflects downloads up to 20 Feb 2025

Citations

Zou Y, Zhai J, Fang C, Liu J, Zheng T, Chen Z, Filkov V, Ray B, Zhou M (2024). Mutation-Based Deep Learning Framework Testing Method in JavaScript Environment. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 970–981. DOI: 10.1145/3691620.3695478. Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695478
Liao S, Shan C (2024). A PSO-based Method to Test Deep Learning Library at API Level. Proceedings of the 3rd International Conference on Computer, Artificial Intelligence and Control Engineering, 117–130. DOI: 10.1145/3672758.3672777. Online publication date: 26-Jan-2024.
https://dl.acm.org/doi/10.1145/3672758.3672777
Chen J, Jia C, Yan Y, Ge J, Zheng H, Cheng Y (2024). A Miss Is as Good as A Mile: Metamorphic Testing for Deep Learning Operators. Proceedings of the ACM on Software Engineering, 1:FSE, 2005–2027. DOI: 10.1145/3660796. Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660796
Cai Q, Yin B, Shi J (2024). CGFuzz: A Dynamic Test Case Generation Method for DL Framework Based on Function Coverage. 2024 IEEE 24th International Conference on Software Quality, Reliability, and Security Companion (QRS-C), 507–516. DOI: 10.1109/QRS-C63300.2024.00070. Online publication date: 1-Jul-2024.
https://doi.org/10.1109/QRS-C63300.2024.00070
Xia Z, Chen Y, Nie P, Wang Z (2024). Detecting Numerical Deviations in Deep Learning Models Introduced by the TVM Compiler. 2024 IEEE 35th International Symposium on Software Reliability Engineering (ISSRE), 73–83. DOI: 10.1109/ISSRE62328.2024.00018. Online publication date: 28-Oct-2024.
https://doi.org/10.1109/ISSRE62328.2024.00018
Li H, Li X, Nie Y, Tian J (2024). A Survey of Security Testing Techniques for Deep Learning Frameworks. 2024 9th International Conference on Signal and Image Processing (ICSIP), 404–415. DOI: 10.1109/ICSIP61881.2024.10671492. Online publication date: 12-Jul-2024.
https://doi.org/10.1109/ICSIP61881.2024.10671492
Nie Y, Xiao X, Yang B, Li H, Luo L, Yu H, Sun G (2024). Python Coverage Guided Fuzzing for Deep Learning Framework. 2024 International Conference on Electronic Engineering and Information Systems (EEISS), 1–6. DOI: 10.1109/EEISS62553.2024.00007. Online publication date: 13-Jan-2024.
https://doi.org/10.1109/EEISS62553.2024.00007