DOI: 10.1145/3510003.3510229
research-article

VulCNN: an image-inspired scalable vulnerability detection system

Published: 5 July 2022

ABSTRACT

Because deep learning (DL) can learn features from source code automatically, it has been widely used to detect vulnerabilities in source code. To make vulnerability scanning scalable, some prior studies process the source code directly by treating it as text. To make detection accurate, other approaches distill program semantics into graph representations and use those graphs to detect vulnerabilities. In practice, text-based techniques are scalable but inaccurate because they lack program semantics, while graph-based methods are accurate but not scalable because graph analysis is typically time-consuming.

In this paper, we aim to achieve both scalability and accuracy when scanning large-scale source code for vulnerabilities. Inspired by DL-based image classification, which can analyze millions of images accurately, we adapt these techniques to our task. Specifically, we propose a novel approach that efficiently converts the source code of a function into an image while preserving program details. We implement VulCNN and evaluate it on a dataset of 13,687 vulnerable functions and 26,970 non-vulnerable functions. Experimental results show that VulCNN achieves better accuracy than eight state-of-the-art vulnerability detectors (Checkmarx, FlawFinder, RATS, TokenCNN, VulDeePecker, SySeVR, VulDeeLocator, and Devign). In terms of scalability, VulCNN is about four times faster than VulDeePecker and SySeVR, about 15 times faster than VulDeeLocator, and about six times faster than Devign. Furthermore, a case study on more than 25 million lines of code indicates that VulCNN can detect vulnerabilities at scale. From the scanning reports, we discovered 73 vulnerabilities that had not been reported in the NVD.
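The abstract does not spell out how a function becomes an image, so the Python sketch below shows only one plausible scheme, under loudly stated assumptions; it is not VulCNN's published implementation. It assumes that each statement of a function is a node in a dependence graph (the edges are written by hand here, whereas in practice a static-analysis tool would produce them), that a hypothetical `embed_line` maps each statement to a fixed-length vector (a hashed bag of tokens standing in for a learned embedding), and that each image channel scales those vectors by one graph-centrality score (degree, Katz, closeness). The names `function_to_image`, `EMBED_DIM`, and `MAX_LINES` are illustrative assumptions.

```python
import hashlib
import math
from collections import deque

EMBED_DIM = 8   # hypothetical width of each statement vector (image columns)
MAX_LINES = 6   # hypothetical fixed image height (one row per statement)


def embed_line(line: str, dim: int = EMBED_DIM) -> list[float]:
    """Hypothetical stand-in for a learned statement embedding:
    a hashed bag of tokens, L2-normalized."""
    vec = [0.0] * dim
    for token in line.replace("(", " ( ").replace(")", " ) ").split():
        bucket = hashlib.sha256(token.encode()).digest()[0] % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def undirected_neighbors(n: int, edges: list[tuple[int, int]]) -> list[set[int]]:
    """Adjacency sets, ignoring edge direction and self-loops."""
    nbrs = [set() for _ in range(n)]
    for src, dst in edges:
        if src != dst:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    return nbrs


def degree_centrality(nbrs: list[set[int]]) -> list[float]:
    n = len(nbrs)
    return [len(s) / (n - 1) if n > 1 else 0.0 for s in nbrs]


def katz_centrality(nbrs, alpha=0.1, beta=1.0, iters=100) -> list[float]:
    """Fixed-point iteration of x_i = alpha * sum_{j in N(i)} x_j + beta,
    L2-normalized; converges when alpha < 1 / (largest adjacency eigenvalue)."""
    x = [0.0] * len(nbrs)
    for _ in range(iters):
        x = [alpha * sum(x[j] for j in nbrs[i]) + beta for i in range(len(nbrs))]
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def closeness_centrality(nbrs: list[set[int]]) -> list[float]:
    """BFS-based closeness, scaled by the fraction of nodes each node can reach."""
    n = len(nbrs)
    scores = []
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached, total = len(dist) - 1, sum(dist.values())
        scores.append(reached / total * reached / (n - 1) if total > 0 else 0.0)
    return scores


def function_to_image(lines: list[str], edges: list[tuple[int, int]],
                      max_lines: int = MAX_LINES) -> list[list[list[float]]]:
    """Return a 3 x max_lines x EMBED_DIM nested list, one channel per centrality.
    Row i of channel c is statement i's vector scaled by its c-th centrality."""
    nbrs = undirected_neighbors(len(lines), edges)
    centralities = [degree_centrality(nbrs), katz_centrality(nbrs),
                    closeness_centrality(nbrs)]
    vectors = [embed_line(line) for line in lines[:max_lines]]
    image = []
    for scores in centralities:
        channel = [[scores[i] * v for v in vec] for i, vec in enumerate(vectors)]
        # Zero-pad short functions (long ones were truncated) to a fixed height.
        channel += [[0.0] * EMBED_DIM for _ in range(max_lines - len(channel))]
        image.append(channel)
    return image


if __name__ == "__main__":
    func = [
        "void copy(char *src)",
        "char buf[16];",
        "int n = strlen(src);",
        "strcpy(buf, src);",
        "return;",
    ]
    # Hypothetical dependence edges between statement indices.
    deps = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 4), (3, 4)]
    img = function_to_image(func, deps)
    print(len(img), len(img[0]), len(img[0][0]))  # 3 6 8
```

The appeal of a design like this is that, once each function is a fixed-size multi-channel array, a standard image classifier such as a CNN can process functions in batches without running graph analysis at classification time.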


    Published in

      ICSE '22: Proceedings of the 44th International Conference on Software Engineering
      May 2022
      2508 pages
      ISBN: 9781450392211
      DOI: 10.1145/3510003

      Copyright © 2022 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 276 of 1,856 submissions, 15%
