short-paper

Transferring Well-Trained Models for Cross-Project Issue Classification: A Large-Scale Empirical Study

Authors:
Yue Yu, Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha, Hunan, China
Yarong Zeng, College of Computer, National University of Defense Technology, Changsha, Hunan, China
Qiang Fan, College of Computer, National University of Defense Technology, Changsha, Hunan, China
Huaimin Wang, College of Computer, National University of Defense Technology, Changsha, Hunan, China

Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware, September 2018, Article No.: 18, Pages 1–6, https://doi.org/10.1145/3275219.3275237

Published: 16 September 2018

ABSTRACT

In modern software engineering practice, a variety of automated and intelligent methods have been proposed to improve the efficiency of collaborative development. However, most of these approaches depend heavily on supervised or semi-supervised learning and are therefore limited by a lack of training data. Inspired by the theories and techniques of transfer learning, cross-project approaches have been proposed, but they struggle to achieve consistent and desirable performance. In this paper, we conduct an extensive empirical study to identify the determinants that affect the performance of transferring reusable models across projects in the context of issue classification. Starting from a large-scale dataset containing 799 OSS projects and more than 795,000 issues, we extract 28 attributes grouped into 4 dimensions. The results show that the performance of cross-project issue classification based on model transfer is sensitive and unstable, and that it is influenced by multiple factors spanning transferred-model training, project construction, and the technical and social relations between source and target projects.
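
To make the cross-project setting concrete, the following is a minimal sketch of model transfer for issue classification: a classifier is trained on labeled issues from a source project and then applied, unchanged, to issues from a target project. The abstract does not state which classifier or text features the study uses, so the multinomial naive Bayes model, the whitespace tokenizer, the class and function names, and the toy issue texts below are all assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesIssueClassifier:
    """Multinomial naive Bayes with add-one smoothing over bag-of-words features.

    Hypothetical stand-in for a "well-trained model"; not the paper's method.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior of the class
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                # Words never seen in the source project carry no signal and are skipped.
                if token in self.vocab:
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of issues whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy, made-up issues standing in for a source project (training data).
source_texts = [
    "app crashes with null pointer exception on startup",
    "error when saving file throws exception",
    "add support for dark theme",
    "feature request allow export to csv",
]
source_labels = ["bug", "bug", "feature", "feature"]

# Toy, made-up issues standing in for a target project (no training data used).
target_texts = [
    "exception thrown and app crashes when saving",
    "request to add csv export support",
]
target_labels = ["bug", "feature"]

# Train on the source project only, then evaluate on the target project.
model = NaiveBayesIssueClassifier().fit(source_texts, source_labels)
cross_project_accuracy = accuracy(model, target_texts, target_labels)
print(cross_project_accuracy)
```

In this toy case the source and target vocabularies overlap heavily, so the transferred model does well. When the target project uses different terminology, most tokens fall outside the source vocabulary and the prediction falls back toward the class prior, which is one intuition for why transfer performance can be unstable across project pairs.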

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Software maintenance tools

Published in

Internetware '18: Proceedings of the 10th Asia-Pacific Symposium on Internetware
September 2018
167 pages
ISBN:9781450365901
DOI:10.1145/3275219

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Cross-Project
Issue Classification
Transfer Learning
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Internetware '18 Paper Acceptance Rate: 20 of 26 submissions, 77%
Overall Acceptance Rate: 55 of 111 submissions, 50%