SibStCNN and TBCNN + kNN-TED: New Models over Tree Structures for Source Code Classification

Phan, Anh Viet; Nguyen, Minh Le; Bui, Lam Thu

doi:10.1007/978-3-319-68935-7_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10585)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

This paper addresses a software engineering problem, source code classification, by applying several approaches that exploit tree representations of programs. First, we propose a new sibling-subtree convolutional neural network (SibStCNN) and models that combine tree-based neural networks with k-Nearest Neighbors (kNN) for source code classification. Second, we present a tree-pruning technique that reduces data dimensionality and strengthens the classifiers. Experiments show that the proposed models outperform other methods and that tree pruning not only substantially reduces execution time but also increases accuracy.
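The kNN-TED component named in the title suggests a k-Nearest Neighbors classifier that compares program trees directly. The following is a minimal sketch. It assumes that TED denotes a unit-cost ordered tree edit distance, and it uses toy AST-like trees. It is not the authors' implementation: their code is in the repository listed in the Notes. The helper names (`node`, `forest_dist`, `tree_edit_distance`, `knn_predict`) and the node labels are hypothetical.

```python
from collections import Counter
from functools import lru_cache
from typing import List, Tuple

# Hypothetical tree encoding: (label, children), where children is a tuple of trees.
# Tuples keep trees hashable, so distance computations can be memoized.
Tree = Tuple[str, tuple]


def node(label: str, *children: Tree) -> Tree:
    """Build a tree node (hypothetical helper for toy AST-like trees)."""
    return (label, tuple(children))


def _drop_rightmost_root(forest: tuple) -> tuple:
    """Delete the rightmost root and splice its children into the forest."""
    _, children = forest[-1]
    return forest[:-1] + children


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> int:
    """Unit-cost ordered forest edit distance (classic recursive formulation)."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain
        return forest_dist(_drop_rightmost_root(f), g) + 1
    if not f:  # only insertions remain
        return forest_dist(f, _drop_rightmost_root(g)) + 1
    (lv, cv), (lw, cw) = f[-1], g[-1]
    return min(
        forest_dist(_drop_rightmost_root(f), g) + 1,  # delete rightmost root of f
        forest_dist(f, _drop_rightmost_root(g)) + 1,  # insert rightmost root of g
        # map the two rightmost roots onto each other; relabeling costs 1 if labels differ
        forest_dist(cv, cw) + forest_dist(f[:-1], g[:-1]) + int(lv != lw),
    )


def tree_edit_distance(a: Tree, b: Tree) -> int:
    """Edit distance between two single trees (each treated as a one-tree forest)."""
    return forest_dist((a,), (b,))


def knn_predict(query: Tree, train: List[Tuple[Tree, str]], k: int = 3) -> str:
    """Majority vote among the k training trees closest to `query` under TED."""
    ranked = sorted(((tree_edit_distance(query, t), y) for t, y in train),
                    key=lambda pair: pair[0])
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    loop_a = node("FuncDef", node("For", node("Assign")), node("Return"))
    loop_b = node("FuncDef", node("While", node("Assign")), node("Return"))
    calls = node("FuncDef", node("Call"), node("Call"), node("Return"))
    train = [(loop_a, "loop"), (loop_b, "loop"), (calls, "calls")]
    query = node("FuncDef", node("For", node("Assign"), node("Assign")), node("Return"))
    print(tree_edit_distance(loop_a, loop_b))  # 1 (relabel For -> While)
    print(knn_predict(query, train, k=3))      # loop
```

Memoizing on whole forests keeps the sketch short. However, it is much slower than a dedicated algorithm such as Zhang and Shasha's, so treat it as illustrative for small trees only.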

Notes

1.
Our models are publicly available at https://github.com/nguyenlab/TBCNN_kNN_SVM.git.

Acknowledgements

This work was supported in part by JSPS KAKENHI Grant Number 15K16048. The first author thanks the Ministry of Education and Training (MOET), Vietnam, for a scholarship under Project 911.

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, Nomi, Japan
Anh Viet Phan & Minh Le Nguyen
Le Quy Don Technical University, Hanoi, Vietnam
Anh Viet Phan & Lam Thu Bui

Corresponding author

Correspondence to Minh Le Nguyen.

Editor information

Editors and Affiliations

University of Manchester, Manchester, United Kingdom
Hujun Yin
School of Electronic and Electrical Engineering, Nanjing University, Nanjing, China
Yang Gao
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Songcan Chen
Guilin University of Electronic Technology, Guilin, China
Yimin Wen
Guilin University of Electronic Technology, Guilin, China
Guoyong Cai
Guilin University of Electronic Technology, Guilin, China
Tianlong Gu
Beijing University of Posts and Telecommunications, Beijing, China
Junping Du
University of Seville, Seville, Spain
Antonio J. Tallón-Ballesteros
Southeast University, Nanjing, China
Minling Zhang

About this paper

Cite this paper

Phan, A.V., Nguyen, M.L., Bui, L.T. (2017). SibStCNN and TBCNN + kNN-TED: New Models over Tree Structures for Source Code Classification. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2017. Lecture Notes in Computer Science, vol 10585. Springer, Cham. https://doi.org/10.1007/978-3-319-68935-7_14

DOI: https://doi.org/10.1007/978-3-319-68935-7_14
Published: 06 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68934-0
Online ISBN: 978-3-319-68935-7
eBook Packages: Computer Science, Computer Science (R0)