Abstract
State-of-the-art methods for automatic vulnerability detection in source code are mainly learning-based, relying on machine learning and deep learning. These models learn vulnerability patterns from data, which makes detection more automatic and intelligent. However, although many learning-based vulnerability detection models report high accuracy, their outputs are not explainable. Verifying the credibility of these models is therefore worthwhile, so that they can be better understood and used in practice. To address this issue, we use an interpretation method called LIME to explain learning-based automatic vulnerability detection models. On the one hand, all preprocessing steps are interpretable, including symbolization and vector representation, for which we choose the bag-of-words model. On the other hand, the vulnerability detection models we select are based on Logistic Regression and Bi-LSTM. The former is inherently interpretable and is used to verify the effectiveness of LIME for source code vulnerability detection. The latter is not interpretable on its own, so we interpret it with LIME to assess its credibility on source code vulnerability detection. The experimental results show that LIME can effectively explain learning-based automatic vulnerability detection models. Moreover, we find that, under local interpretation, the predictions of the Bi-LSTM-based model are credible.
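The two preprocessing steps named in the abstract, symbolization and bag-of-words vector representation, can be illustrated with a minimal Python sketch. The abstract does not specify the details, so everything concrete here is an assumption made for illustration: the tokenizer, the short keyword and library-call lists, the `VAR`/`FUN` placeholder naming, and the function names `symbolize`, `build_vocabulary` and `bag_of_words`.

```python
import re
from collections import Counter

# Assumption: a small illustrative subset of C keywords and library calls
# that are kept verbatim; the paper's actual lists are not given.
C_KEYWORDS = {"char", "int", "void", "if", "else", "for", "while", "return"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "strlen", "printf"}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def symbolize(code):
    """Replace user-defined names with generic VARn / FUNn placeholders."""
    tokens = TOKEN_RE.findall(code)
    var_names, fun_names = {}, {}
    out = []
    for i, tok in enumerate(tokens):
        is_identifier = re.match(r"[A-Za-z_]", tok) is not None
        if not is_identifier or tok in C_KEYWORDS or tok in LIBRARY_CALLS:
            out.append(tok)  # punctuation, literals, keywords, library calls
        elif i + 1 < len(tokens) and tokens[i + 1] == "(":
            # An identifier followed by "(" is treated as a user function.
            out.append(fun_names.setdefault(tok, f"FUN{len(fun_names) + 1}"))
        else:
            out.append(var_names.setdefault(tok, f"VAR{len(var_names) + 1}"))
    return out


def build_vocabulary(samples):
    """Map every distinct token in the corpus to a column index."""
    vocab = sorted({tok for sample in samples for tok in sample})
    return {tok: idx for idx, tok in enumerate(vocab)}


def bag_of_words(tokens, vocab):
    """Count how often each vocabulary token occurs in one sample."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocab]


sample = symbolize("char buf[10]; strcpy(buf, src);")
vocab = build_vocabulary([sample])
vector = bag_of_words(sample, vocab)
```

Because each vector position corresponds to exactly one symbolized token, a per-feature weight, whether a Logistic Regression coefficient or a LIME weight, can be read directly as the contribution of that token.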
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Tang, G. et al. (2021). Interpretation of Learning-Based Automatic Source Code Vulnerability Detection Model Using LIME. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, SY. (eds) Knowledge Science, Engineering and Management. KSEM 2021. Lecture Notes in Computer Science, vol 12817. Springer, Cham. https://doi.org/10.1007/978-3-030-82153-1_23
DOI: https://doi.org/10.1007/978-3-030-82153-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82152-4
Online ISBN: 978-3-030-82153-1
eBook Packages: Computer Science, Computer Science (R0)