Restructured Cloning Vulnerability Detection Based on Function Semantic Reserving and Reiteration Screening

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12308)

Abstract

Although code cloning may speed up software development, it can be detrimental to software security, because undiscovered vulnerabilities are easily propagated through code clones. Even worse, developers tend not to simply clone the original code fragments but also add variables and debug statements, which makes detecting propagated vulnerable code clones challenging. A few approaches have been proposed to detect this kind of vulnerability, termed restructured cloning vulnerability; however, they usually cannot effectively obtain the vulnerability context and related semantic information. To address this limitation, we propose a novel approach, called RCVD++, for detecting restructured cloning vulnerabilities. It introduces a new feature extraction for vulnerable code based on program slicing and optimizes the code abstraction and detection granularity. Our approach further features reiteration screening to compensate for the lack of retroactive detection in fingerprint matching. Compared with our previous work RCVD, RCVD++ utilizes two granularities, line and function, allowing additional detection of exact and renamed clones. In addition, it retains more semantics by identifying library functions and reduces false positives by screening the detection results. Experimental results on three different datasets indicate that RCVD++ performs better than other detection tools for restructured cloning vulnerability detection.

Acknowledgements

This research was supported by the National Natural Science Foundation of China under Grant No. U1936119, the National Natural Science Foundation of China under Grant No. 61941116, and the National Key R&D Program of China under Grant No. 2019QY(Y)0602.

Author information

Corresponding author

Correspondence to Bin Wu.

Appendix

In this section, we provide the greedy-based matching algorithm.

Greedy-Based Matching

Algorithm 1 gives the pseudocode of the matching algorithm. C is the target code and F is the fingerprint. The output R is True if code C contains the fingerprint F, and False otherwise. If C is shorter than F, C cannot match F. If the nth element of C is the same as the mth element of F, both n and m increase by one; otherwise, only n increases. If every element of F is matched, the fingerprint matching is considered successful and the algorithm returns True. Because each element of C is examined at most once per fingerprint, the time complexity is independent of the fingerprint length and grows linearly with the number of lines of code and the number of fingerprints.

Algorithm 1. Greedy-based matching (figure not reproduced here).
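
The matching procedure described above can be sketched in Python as follows. This is only an illustrative reading of the prose description, not the authors' implementation: the function name `greedy_match` and the representation of C and F as sequences of strings (for example, abstracted or hashed lines) are assumptions made for this sketch.

```python
from typing import Sequence


def greedy_match(code: Sequence[str], fingerprint: Sequence[str]) -> bool:
    """Return True if `fingerprint` (F) occurs in `code` (C) as an ordered subsequence.

    Hypothetical sketch: each element is assumed to be one abstracted line.
    """
    # A target shorter than the fingerprint can never contain it.
    if len(code) < len(fingerprint):
        return False

    n = 0  # index into the target code C
    m = 0  # index into the fingerprint F
    while n < len(code) and m < len(fingerprint):
        if code[n] == fingerprint[m]:
            m += 1  # matched: advance in the fingerprint as well
        n += 1      # always advance in the target code
    # Success only if every fingerprint element was matched in order.
    return m == len(fingerprint)


# Example: extra statements between fingerprint lines do not prevent a match.
target = ["a = read()", "log(a)", "b = a + 1", "free(a)"]
print(greedy_match(target, ["a = read()", "free(a)"]))  # True
print(greedy_match(target, ["free(a)", "a = read()"]))  # False (wrong order)
```

Each call scans the target code once, so the work per fingerprint is bounded by the length of C regardless of how long F is, which is consistent with the complexity statement above.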

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Jiang, W., Wu, B., Yu, X., Xue, R., Yu, Z. (2020). Restructured Cloning Vulnerability Detection Based on Function Semantic Reserving and Reiteration Screening. In: Chen, L., Li, N., Liang, K., Schneider, S. (eds) Computer Security – ESORICS 2020. ESORICS 2020. Lecture Notes in Computer Science, vol 12308. Springer, Cham. https://doi.org/10.1007/978-3-030-58951-6_18

  • DOI: https://doi.org/10.1007/978-3-030-58951-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58950-9

  • Online ISBN: 978-3-030-58951-6

  • eBook Packages: Computer Science, Computer Science (R0)
