Source code fragment summarization with small-scale crowdsourcing based features

  • Research Article
Frontiers of Computer Science

Abstract

Recent studies have applied different approaches to summarizing software artifacts, yet very few efforts have been made to summarize the source code fragments available on the web. This paper investigates the feasibility of generating code fragment summaries by using supervised learning algorithms. We hire a crowd of ten individuals from the same workplace to extract source code features on a corpus of 127 code fragments retrieved from the Eclipse and NetBeans official frequently asked questions (FAQs). Human annotators suggest summary lines. Our machine learning algorithms achieve a precision of 82% and perform statistically better than existing code fragment classifiers. Evaluation of the algorithms on several statistical measures endorses our result. This result is promising and suggests that employing mechanisms such as data-driven crowd enlistment can improve the efficacy of existing code fragment classifiers.



Author information

Corresponding author

Correspondence to He Jiang.

Additional information

Najam Nazar received his BS degree in Computer Science from the University of the Punjab, Lahore, Pakistan in 2005 and his MS degree in Software Engineering from Chalmers University of Technology, Sweden in 2010. He is currently working towards his PhD degree in Software Engineering at Dalian University of Technology, China. His current research interests include mining software repositories, data mining, natural language processing, machine learning, software product lines, and agile methodologies.

He Jiang received the PhD degree in computer science from the University of Science and Technology of China, China. He is currently a Professor at Dalian University of Technology, China. His current research interests include computational intelligence and its applications in software engineering and data mining. He is also a member of the ACM and the CCF.

Guojun Gao received his Bachelor’s degree in Software Engineering from the School of Software, Dalian University of Technology, China in 2014. He is currently pursuing an MS degree in Software Engineering at the same university. His research interests include defect prediction and detection in software engineering.

Tao Zhang received the BE and ME degrees in Automation and Software Engineering from Northeastern University, China, in 2005 and 2008, respectively. He received the PhD degree in Computer Science from the University of Seoul, South Korea in 2013. He was a research professor at the University of Seoul, South Korea from 2013 to 2014. Currently, he is a postdoctoral fellow at the Hong Kong Polytechnic University, China. His research interests include mining software maintenance, security and privacy for mobile apps, and recommendation systems.

Xiaochen Li received the BS degree in software engineering from the Dalian University of Technology, China in 2015. He is currently a PhD candidate at Dalian University of Technology. His current research interest is mining software repositories in software engineering.

Zhilei Ren received the BS degree in Software Engineering and the PhD degree in computational mathematics from the Dalian University of Technology, China in 2007 and 2013, respectively. He is currently a lecturer at Dalian University of Technology. His current research interests include evolutionary computation and its applications in software engineering. He is a member of the ACM and the CCF.

About this article

Cite this article

Nazar, N., Jiang, H., Gao, G. et al. Source code fragment summarization with small-scale crowdsourcing based features. Front. Comput. Sci. 10, 504–517 (2016). https://doi.org/10.1007/s11704-015-4409-2

  • DOI: https://doi.org/10.1007/s11704-015-4409-2
