Towards One Reusable Model for Various Software Defect Mining Tasks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11441)

Abstract

Software defect mining plays an important role in software quality assurance. Many deep neural network based models have been proposed for software defect mining tasks and have advanced the state-of-the-art mining performance. These deep models usually require a huge amount of task-specific source code for training in order to capture the code functionality needed to mine defects, but this requirement is often hard to satisfy in practice. On the other hand, a large amount of free source code and corresponding textual explanations is publicly available in open source software repositories, and it is potentially useful for modeling code functionality. However, no previous study has leveraged these resources to help defect mining tasks. In this paper, we propose a novel framework that learns one reusable deep model for code functional representation from the huge amount of publicly available task-free source code and its textual explanations, and then reuses this model for various software defect mining tasks. Experimental results on three major defect mining tasks with real-world datasets indicate that reusing this model in specific tasks outperforms the counterpart that learns deep models from scratch, especially when the training data is insufficient.

Notes

  1. https://sourceforge.net/

  2. https://stackoverflow.com/

  3. https://archive.org/details/stackexchange

Acknowledgment

This research was supported by the National Key Research and Development Program (2017YFB1001903) and the NSFC (61751306).

Author information

Corresponding author

Correspondence to Ming Li.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Li, HY., Li, M., Zhou, ZH. (2019). Towards One Reusable Model for Various Software Defect Mining Tasks. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11441. Springer, Cham. https://doi.org/10.1007/978-3-030-16142-2_17

  • DOI: https://doi.org/10.1007/978-3-030-16142-2_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16141-5

  • Online ISBN: 978-3-030-16142-2

  • eBook Packages: Computer Science, Computer Science (R0)
