Source code fragment summarization with small-scale crowdsourcing based features

Nazar, Najam; Jiang, He; Gao, Guojun; Zhang, Tao; Li, Xiaochen; Ren, Zhilei

doi:10.1007/s11704-015-4409-2

Research Article
Published: 22 February 2016

Volume 10, pages 504–517 (2016)
Frontiers of Computer Science

Najam Nazar¹, He Jiang¹,², Guojun Gao¹, Tao Zhang³, Xiaochen Li¹ & Zhilei Ren¹

Abstract

Recent studies have applied different approaches for summarizing software artifacts, and yet very few efforts have been made in summarizing the source code fragments available on the web. This paper investigates the feasibility of generating code fragment summaries by using supervised learning algorithms. We hire a crowd of ten individuals from the same workplace to extract source code features on a corpus of 127 code fragments retrieved from the Eclipse and NetBeans official frequently asked questions (FAQs). Human annotators suggest summary lines. Our machine learning algorithms produce better results, with a precision of 82%, and perform statistically better than existing code fragment classifiers. Evaluation of the algorithms on several statistical measures endorses our result. This result is promising when employing mechanisms such as data-driven crowd enlistment to improve the efficacy of existing code fragment classifiers.
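
The reported 82% is a precision score. As a minimal sketch of how such a score can be computed for a line-level summary classifier, the Python snippet below compares predicted summary lines against annotator-chosen lines. The function name `precision` and the label lists are hypothetical illustrations, not the authors' data or implementation.

```python
def precision(predicted, annotated):
    """Share of lines predicted as summary lines that annotators also selected."""
    # True positives: lines marked as summary lines by both classifier and annotators.
    true_pos = sum(1 for p, a in zip(predicted, annotated) if p and a)
    # All lines the classifier marked as summary lines.
    pred_pos = sum(1 for p in predicted if p)
    # Return 0.0 when the classifier selects no lines, to avoid division by zero.
    return true_pos / pred_pos if pred_pos else 0.0


# Hypothetical labels for a six-line code fragment: 1 = summary line, 0 = not.
annotated = [1, 0, 1, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 0]

print(precision(predicted, annotated))
```

Precision only penalizes lines that are wrongly selected; a line the annotators chose but the classifier missed does not lower it, which is why it is usually read alongside other measures.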

Author information

Authors and Affiliations

Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian, 116621, China
Najam Nazar, He Jiang, Guojun Gao, Xiaochen Li & Zhilei Ren
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, 430072, China
He Jiang
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Tao Zhang

Corresponding author

Correspondence to He Jiang.

Additional information

Najam Nazar received his BS degree in Computer Science from the University of the Punjab, Lahore, Pakistan in 2005 and his MS degree in Software Engineering from Chalmers University of Technology, Sweden in 2010. He is currently working towards his PhD degree in Software Engineering at Dalian University of Technology, China. His current research interests include mining software repositories, data mining, natural language processing, machine learning, software product lines, and agile methodologies.

He Jiang received the PhD degree in computer science from the University of Science and Technology of China, China. He is currently a Professor at Dalian University of Technology, China. His current research interests include computational intelligence and its applications in software engineering and data mining. He is also a member of the ACM and the CCF.

Guojun Gao received his Bachelor’s degree in Software Engineering from the School of Software, Dalian University of Technology, China in 2014. He is currently pursuing an MS degree in Software Engineering at the same university. His research interests include defect prediction and detection in software engineering.

Tao Zhang received the BE and ME degrees in Automation and Software Engineering from Northeastern University, China, in 2005 and 2008, respectively. He received the PhD degree in Computer Science from the University of Seoul, South Korea in 2013. He was a research professor at the University of Seoul, South Korea from 2013 to 2014. Currently, he is a postdoctoral fellow at the Hong Kong Polytechnic University, China. His research interests include mining software maintenance, security and privacy for mobile apps, and recommendation systems.

Xiaochen Li received the BS degree in software engineering from the Dalian University of Technology, China in 2015. He is currently a PhD candidate at Dalian University of Technology. His current research interest is mining software repositories in software engineering.

Zhilei Ren received the BS degree in Software Engineering and the PhD degree in computational mathematics from the Dalian University of Technology, China in 2007 and 2013, respectively. He is currently a lecturer at Dalian University of Technology. His current research interests include evolutionary computation and its applications in software engineering. He is a member of the ACM and the CCF.

Electronic supplementary material

Supplementary material, approximately 921 KB.

About this article

Cite this article

Nazar, N., Jiang, H., Gao, G. et al. Source code fragment summarization with small-scale crowdsourcing based features. Front. Comput. Sci. 10, 504–517 (2016). https://doi.org/10.1007/s11704-015-4409-2

Received: 12 September 2014
Accepted: 15 October 2015
Published: 22 February 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11704-015-4409-2
