SibStCNN and TBCNN + kNN-TED: New Models over Tree Structures for Source Code Classification

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2017 (IDEAL 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10585)

Abstract

This paper addresses source code classification, a software engineering problem, with several approaches that exploit tree representations of programs. First, we propose a new sibling-subtree convolutional neural network (SibStCNN) and models that combine tree-based neural networks with k-Nearest Neighbors (kNN). Second, we present a tree-pruning technique that reduces data dimensionality and strengthens the classifiers. Experiments show that the proposed models outperform other methods, and that tree pruning both substantially reduces execution time and increases accuracy.
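To make the kNN-over-trees idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes Python's built-in `ast` module as the parser, a unit-cost ordered tree edit distance (TED) computed by plain memoized recursion, and a hypothetical pruning rule (`PRUNED_TYPES`) that drops expression-context leaves. The paper's actual pruning rule, its tree-based neural networks, and how they are combined with kNN are not specified in this abstract and are not modeled here. All function names are illustrative.

```python
import ast
from collections import Counter
from functools import lru_cache

# Hypothetical pruning rule (an assumption, not the paper's): drop
# expression-context leaves, which add nodes but little structure.
PRUNED_TYPES = {"Load", "Store", "Del"}


def to_tree(node, prune=True):
    """Convert an ast node into a hashable (label, children) tuple."""
    children = tuple(
        to_tree(child, prune)
        for child in ast.iter_child_nodes(node)
        if not (prune and type(child).__name__ in PRUNED_TYPES)
    )
    return (type(node).__name__, children)


def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        return sum(size(t) for t in f2)  # insert every node of f2
    if not f2:
        return sum(size(t) for t in f1)  # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        forest_distance(f1[:-1] + kids1, f2) + 1,  # delete rightmost root of f1
        forest_distance(f1, f2[:-1] + kids2) + 1,  # insert rightmost root of f2
        forest_distance(f1[:-1], f2[:-1])          # match the rightmost trees
        + forest_distance(kids1, kids2)
        + (label1 != label2),                      # relabel cost: 0 or 1
    )


def ted(t1, t2):
    """Tree edit distance between two (label, children) trees."""
    return forest_distance((t1,), (t2,))


def knn_classify(query, labelled_trees, k=3):
    """Majority vote among the k training trees closest to the query."""
    nearest = sorted(labelled_trees, key=lambda item: ted(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data: node labels are AST node types, so identifier names are ignored.
train_sources = [
    ("def f(a, b):\n    return a + b", "add"),
    ("def g(x, y):\n    return x + y", "add"),
    ("def h(xs):\n    for x in xs:\n        print(x)", "loop"),
    ("def k(items):\n    for i in items:\n        print(i)", "loop"),
]
train = [(to_tree(ast.parse(src)), label) for src, label in train_sources]
query = to_tree(ast.parse("def s(p, q):\n    return p - q"))
print(knn_classify(query, train, k=3))
```

The recursion is exponential without memoization and is only practical for small trees. A real system would likely use a polynomial-time TED algorithm, and pruning would shrink the trees before any distances are computed, which is one plausible way it could reduce execution time.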

Notes

  1. Our models are publicly available at https://github.com/nguyenlab/TBCNN_kNN_SVM.git.

Acknowledgements

This work was supported in part by JSPS KAKENHI Grant Number 15K16048. The first author thanks the Ministry of Education and Training (MOET), Vietnam, for the scholarship under Project 911.

Author information

Corresponding author

Correspondence to Minh Le Nguyen.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Phan, A.V., Nguyen, M.L., Bui, L.T. (2017). SibStCNN and TBCNN + kNN-TED: New Models over Tree Structures for Source Code Classification. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2017. IDEAL 2017. Lecture Notes in Computer Science, vol 10585. Springer, Cham. https://doi.org/10.1007/978-3-319-68935-7_14

  • DOI: https://doi.org/10.1007/978-3-319-68935-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68934-0

  • Online ISBN: 978-3-319-68935-7

  • eBook Packages: Computer Science, Computer Science (R0)