Cross-Project Issue Classification Based on Ensemble Modeling in a Social Coding World

Zeng, Yarong; Yu, Yue; Fan, Qiang; Zhang, Xunhui; Wang, Tao; Yin, Gang; Wang, Huaimin

doi:10.1007/978-3-030-04212-7_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11304)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

The simplified, less formal contribution mechanisms of social coding are attracting more and more contributors to collaborative software development. To reduce the burden on project core teams, various automated and intelligent approaches based on machine learning and data mining have been proposed, but they are restricted by the lack of training data. In this paper, we conduct an extensive empirical study of transferring and aggregating reusable models across projects in the context of issue classification, based on a large-scale dataset of 799 open source projects and more than 795,000 issues. We propose a novel cross-project approach that integrates multiple models learned from various source projects to classify the issues of a target project. We evaluate our approach through comparative experiments against within-project classification and a typical cross-project method called Bellwether. The results show that our cross-project approach based on ensemble modeling achieves performance comparable to within-project classification and better than Bellwether.
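The idea of integrating models learned from several source projects can be illustrated with a minimal sketch. The abstract does not state which classifiers are used or how their outputs are aggregated, so everything below is an assumption made for illustration: a tiny naive Bayes text classifier per source project, combined by simple majority voting. The names `NaiveBayes`, `ensemble_predict`, and the toy issue data are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def tokenize(text):
    # Naive whitespace tokenizer; real issue text would need more preprocessing.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def ensemble_predict(models, text):
    # Assumed aggregation rule: majority vote over the source-project models.
    # Ties go to the alphabetically first label so the result is deterministic.
    votes = Counter(model.predict(text) for model in models)
    return max(sorted(votes), key=lambda label: votes[label])


# Hypothetical labeled issues from three source projects.
source_projects = {
    "project_a": (["app crashes on startup", "add dark mode option"],
                  ["bug", "feature"]),
    "project_b": (["null pointer error when saving", "support export to csv"],
                  ["bug", "feature"]),
    "project_c": (["crash after login error", "add option to support plugins"],
                  ["bug", "feature"]),
}

# One model per source project; none of them sees target-project data.
models = [NaiveBayes().fit(texts, labels)
          for texts, labels in source_projects.values()]

# Classify issues from a target project that has no training data of its own.
for issue in ["crash error on startup", "add support for dark mode"]:
    print(issue, "->", ensemble_predict(models, issue))
```

In this sketch the target project contributes no labeled issues at all, which mirrors the lack-of-training-data setting the abstract describes. Weighted voting or selecting a subset of source models would be equally plausible aggregation schemes.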

Author information

Authors and Affiliations

National Laboratory for Parallel and Distributed Processing, National University of Defence Technology, Changsha, China
Yarong Zeng, Yue Yu, Qiang Fan, Xunhui Zhang, Tao Wang, Gang Yin & Huaimin Wang

Corresponding author

Correspondence to Yarong Zeng.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi Sing Leung
Kobe University, Kobe, Japan
Seiichi Ozawa

About this paper

Cite this paper

Zeng, Y. et al. (2018). Cross-Project Issue Classification Based on Ensemble Modeling in a Social Coding World. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11304. Springer, Cham. https://doi.org/10.1007/978-3-030-04212-7_24

DOI: https://doi.org/10.1007/978-3-030-04212-7_24
Published: 17 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04211-0
Online ISBN: 978-3-030-04212-7
eBook Packages: Computer Science, Computer Science (R0)