Abstract
In crowdsourced mobile application testing, crowd workers perform test tasks for developers and submit test reports describing the abnormal behaviors they observe. These test reports usually provide important information for improving software quality. However, due to workers' limited expertise and the inconvenience of editing on mobile devices, test reports often lack the information necessary for understanding and reproducing the revealed bugs. Developers sometimes have to spend a significant share of their available resources handling low-quality test reports, which severely reduces inspection efficiency. In this paper, to help developers decide whether a test report should be selected for inspection within limited resources, we introduce a new problem: test report quality assessment. To model the quality of test reports, we propose a new framework named TERQAF. First, we systematically summarize a set of desirable properties that characterize expected test reports and define measurable indicators to quantify these properties. Then, we determine the numerical values of the indicators from the contents of each test report. Finally, we train a classifier using logistic regression to predict test report quality. To validate the effectiveness of TERQAF, we conduct extensive experiments on five crowdsourced test report datasets. The experimental results show that TERQAF achieves, on average, 85.18% Macro-average Precision (MacroP), 75.87% Macro-average Recall (MacroR), and 80.01% Macro-average F-measure (MacroF) in test report quality assessment. The empirical results also demonstrate that test report quality assessment can help developers handle test reports more efficiently.
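The pipeline sketched in the abstract (quantify indicators per report, train a logistic-regression classifier, evaluate with macro-averaged metrics) can be illustrated as follows. This is a minimal sketch, not the authors' implementation: the indicator names, feature values, and labels below are hypothetical, standing in for TERQAF's actual measurable indicators.

```python
# Illustrative sketch of test report quality assessment, assuming
# each report is already reduced to a vector of indicator values.
# Indicator semantics here are hypothetical, e.g.
# [readability, has_reproduction_steps, has_screenshot, length_score].
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score

X_train = [
    [0.9, 1, 1, 0.8],   # detailed, reproducible report
    [0.8, 1, 0, 0.7],
    [0.2, 0, 0, 0.1],   # sparse report lacking key information
    [0.3, 0, 1, 0.2],
]
y_train = [1, 1, 0, 0]  # 1 = high quality, 0 = low quality

# Train the quality classifier, as the framework does with
# logistic regression over indicator values.
clf = LogisticRegression().fit(X_train, y_train)

X_test = [[0.85, 1, 1, 0.9], [0.25, 0, 0, 0.15]]
y_test = [1, 0]
y_pred = clf.predict(X_test)

# Macro-averaged metrics, matching the paper's MacroP/MacroR/MacroF.
macro_p = precision_score(y_test, y_pred, average="macro")
macro_r = recall_score(y_test, y_pred, average="macro")
macro_f = f1_score(y_test, y_pred, average="macro")
print(macro_p, macro_r, macro_f)
```

Macro-averaging computes each metric per class and then takes the unweighted mean, so the reported scores are not dominated by whichever quality class happens to be more frequent in a dataset.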
Acknowledgments
We are grateful to the developers who devoted their precious time to evaluating and inspecting the quality of test reports. We also thank José M. Fuentes for his help in conducting this work. This work is partially supported by the National Key Research and Development Program of China under grant no. 2018YFB1003900, and by the National Natural Science Foundation of China under grants no. 61902096, 61972359, 61370144, 61722202, 61403057, and 61772107.
Additional information
Communicated by: Massimiliano Di Penta and David D. Shepherd
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Software Analysis, Evolution and Reengineering (SANER)
Cite this article
Chen, X., Jiang, H., Li, X. et al. A systemic framework for crowdsourced test report quality assessment. Empir Software Eng 25, 1382–1418 (2020). https://doi.org/10.1007/s10664-019-09793-8