Mining authorship characteristics in bug repositories

Jiang, He; Zhang, Jingxuan; Ma, Hongjing; Nazar, Najam; Ren, Zhilei

doi:10.1007/s11432-014-0372-y

Chinese title (translated): Mining writing styles in bug repositories

Research Paper
Published: 23 November 2016

Volume 60, article number 012107 (2017)

Science China Information Sciences

He Jiang^1,2,3,
Jingxuan Zhang^1,2,
Hongjing Ma^1,2,
Najam Nazar^1,2 &
Zhilei Ren^1,2

Abstract

Bug reports are widely used to support software maintenance tasks. Since bug reports are contributed by people, the authorship characteristics of contributors may heavily affect how well those tasks are resolved; for example, poorly written bug reports may delay developers when fixing bugs. However, no in-depth investigation of these authorship characteristics has been conducted. In this study, we first leverage byte-level N-grams to model authorship characteristics and employ the Normalized Simplified Profile Intersection (NSPI) to measure the similarity between them. Then, we investigate a series of properties of contributors' authorship characteristics, including their evolution over time and their variation across distinct products in open source projects. Moreover, we show how to leverage authorship characteristics to facilitate a well-known software maintenance task, namely Bug Report Summarization (BRS). Experiments on open source projects validate that incorporating authorship characteristics can effectively improve a state-of-the-art BRS method. Our findings suggest that contributors should retain stable authorship characteristics and that authorship characteristics can assist in resolving software tasks.
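The two building blocks named in the abstract, byte-level N-gram profiles and a normalized profile intersection, can be illustrated with a minimal sketch. The abstract does not give the exact definition of NSPI, so everything specific here is an assumption: the function names `byte_ngram_profile` and `nspi`, the defaults `n=3` and `profile_size=500`, and the choice to normalize the intersection size by the size of the smaller profile.

```python
from __future__ import annotations

from collections import Counter


def byte_ngram_profile(text: str, n: int = 3, profile_size: int = 500) -> set[bytes]:
    """Return the set of the `profile_size` most frequent byte-level n-grams.

    The defaults for `n` and `profile_size` are illustrative assumptions,
    not values taken from the paper.
    """
    data = text.encode("utf-8")  # operate on raw bytes, not characters
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(profile_size)}


def nspi(profile_a: set[bytes], profile_b: set[bytes]) -> float:
    """Size of the profile intersection, scaled to the range [0, 1].

    Assumption: the intersection size is divided by the size of the
    smaller profile; the paper's normalization may differ.
    """
    if not profile_a or not profile_b:
        return 0.0
    return len(profile_a & profile_b) / min(len(profile_a), len(profile_b))


if __name__ == "__main__":
    # Two made-up bug report snippets, used only to exercise the functions.
    report_a = "Steps to reproduce: open the dialog, click OK, the app crashes."
    report_b = "Steps to reproduce: open the menu, click Save, the app hangs."
    score = nspi(byte_ngram_profile(report_a), byte_ngram_profile(report_b))
    print(f"NSPI = {score:.2f}")
```

Working on bytes rather than words keeps the profile independent of tokenization and language, which is the usual motivation for byte-level N-grams in authorship attribution. A score near 1 indicates that the two texts share most of their frequent N-grams.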

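Highlights

This paper uses byte-level N-grams in a novel way to model the writing styles of contributors in bug repositories, and introduces NSPI to measure the similarity between two writing styles. It studies several properties of contributors' writing styles, including how they change over time and how they vary across different products. It then leverages contributors' writing styles to help solve a typical software maintenance task, namely bug report summarization. The experimental data of this paper have been made publicly available. Experimental results show that leveraging developers' writing styles can effectively improve bug report summarization.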

Author information

Authors and Affiliations

School of Software, Dalian University of Technology, Dalian, 116621, China
He Jiang, Jingxuan Zhang, Hongjing Ma, Najam Nazar & Zhilei Ren
Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, 116621, China
He Jiang, Jingxuan Zhang, Hongjing Ma, Najam Nazar & Zhilei Ren
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, 430072, China
He Jiang

Corresponding author

Correspondence to He Jiang.

About this article

Cite this article

Jiang, H., Zhang, J., Ma, H. et al. Mining authorship characteristics in bug repositories. Sci. China Inf. Sci. 60, 012107 (2017). https://doi.org/10.1007/s11432-014-0372-y

Received: 08 July 2015
Accepted: 27 August 2015
Published: 23 November 2016
DOI: https://doi.org/10.1007/s11432-014-0372-y
