
Automatic test report augmentation to assist crowdsourced testing

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

In crowdsourced mobile application testing, workers are often inexperienced in software testing and write test reports in descriptive natural language on mobile devices. As a result, these test reports generally lack important details, making it difficult for developers to understand the bugs. To improve the quality of inspected test reports, we formulate a new problem, test report augmentation, which leverages the additional useful information contained in duplicate test reports. In this paper, we propose a new framework named the test report augmentation framework (TRAF) to address this problem. First, natural language processing (NLP) techniques are adopted to preprocess the crowdsourced test reports. Then, three strategies are proposed to augment the environments, inputs, and descriptions of the inspected test reports, respectively. Finally, we visualize the augmented test reports to help developers distinguish the added information. To evaluate TRAF, we conduct experiments on five industrial datasets with 757 crowdsourced test reports. Experimental results show that TRAF can recommend relevant inputs to augment the inspected test reports with an average NDCG of 98.49% and an average precision of 88.65%, and can identify valuable sentences from the descriptions of duplicates with an average precision of 83.58%, recall of 77.76%, and F-measure of 78.72%. An empirical evaluation further demonstrates that the augmented test reports help developers understand and fix bugs better.
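
The paper's implementation is not reproduced on this page, but the retrieval step underlying the augmentation strategies (matching an inspected report against candidate duplicates by textual similarity) and the NDCG metric quoted above can be sketched with standard techniques. The Python sketch below assumes TF-IDF weighting with cosine similarity, a common choice for duplicate-report retrieval; the function names and the whitespace tokenizer are illustrative assumptions, not TRAF's actual code.

```python
# Illustrative sketch only: a minimal TF-IDF / cosine-similarity pipeline in the
# spirit of TRAF's retrieval step, plus the NDCG ranking metric reported in the
# abstract. All names here are hypothetical.
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Tokenize documents (whitespace split stands in for the paper's NLP
    preprocessing) and return one sparse TF-IDF weight dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_duplicates(inspected, candidates):
    """Rank candidate reports by similarity to the inspected report; the
    top-ranked duplicates would supply extra environments, inputs, and
    description sentences for augmentation."""
    vecs = tf_idf_vectors([inspected] + candidates)
    scores = [(cosine(vecs[0], v), c) for v, c in zip(vecs[1:], candidates)]
    return sorted(scores, reverse=True)

def ndcg_at_k(relevances, k):
    """Normalized discounted cumulative gain with graded relevance and a
    log2 position discount."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg else 0.0
```

As a usage illustration, rank_duplicates(inspected, candidates) orders candidate reports by similarity so that their contents can be considered for augmentation, and ndcg_at_k scores the resulting ranking against graded relevance judgments, the evaluation setup suggested by the NDCG figures in the abstract.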



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61370144, 61722202, 61403057, and 61772107) and the Jiangsu Prospective Project of Industry-University-Research (BY2015069-03). The authors also thank the three graduate students who devoted their efforts to the data annotation.

Author information


Corresponding author

Correspondence to He Jiang.

Additional information

Xin Chen received the PhD degree in software engineering in 2018 from the School of Software, Dalian University of Technology, China. He is currently a lecturer at Hangzhou Dianzi University, China. His research interests include mining software repositories and evolutionary computation. He is a member of the CCF and the ACM.

He Jiang received the PhD degree from the University of Science and Technology of China. He is currently a professor and PhD supervisor at the School of Software, Dalian University of Technology, China. He has published prolifically in refereed journals and conference proceedings, e.g., TKDE, TSE, TSC, TOIT, TSMCB, TCYB, ICSE, and SANER. Prof. Jiang is a member of the CCF and the ACM. His current research interests include search-based software engineering, mining software repositories, and evolutionary computation.

Zhenyu Chen is a professor at the Software Institute, Nanjing University, China. He received his BS and PhD degrees in mathematics from Nanjing University, China in 2001 and 2006, respectively. His research interests include intelligent software engineering and mining software repositories. He is a member of the CCF and the ACM.

Tieke He is currently a research assistant at the Software Institute, Nanjing University, China. He received his BS, MS, and PhD degrees in software engineering from Nanjing University, China in 2010, 2012, and 2017, respectively. His research interests include recommender systems and knowledge graphs.

Liming Nie received the PhD degree in computer application technology from Dalian University of Technology, China in 2017. He is currently a lecturer at Zhejiang Sci-Tech University, China. His current research interests include intelligent software development and its applications. Dr. Nie is a member of the ACM and the CCF.



About this article


Cite this article

Chen, X., Jiang, H., Chen, Z. et al. Automatic test report augmentation to assist crowdsourced testing. Front. Comput. Sci. 13, 943–959 (2019). https://doi.org/10.1007/s11704-018-7308-5

