A systemic framework for crowdsourced test report quality assessment

Empirical Software Engineering (2020)

Abstract

In crowdsourced mobile application testing, crowd workers perform test tasks for developers and submit test reports describing the abnormal behaviors they observe. These test reports usually provide important information for improving software quality. However, because of the limited expertise of workers and the inconvenience of editing on mobile devices, some test reports lack the information necessary to understand and reproduce the revealed bugs. Developers sometimes have to spend a significant share of their available resources handling such low-quality test reports, which severely reduces inspection efficiency. In this paper, to help developers decide whether a test report should be selected for inspection under limited resources, we formulate a new problem of test report quality assessment. To model the quality of test reports, we propose a new framework named TERQAF. First, we systematically summarize a set of desirable properties that characterize expected test reports and define a set of measurable indicators to quantify these properties. Then, we compute the numerical values of the indicators from the contents of the test reports. Finally, we train a logistic regression classifier to predict the quality of test reports. To validate the effectiveness of TERQAF, we conduct extensive experiments on five crowdsourced test report datasets. The experimental results show that TERQAF achieves, on average, 85.18% Macro-average Precision (MacroP), 75.87% Macro-average Recall (MacroR), and 80.01% Macro-average F-measure (MacroF) in test report quality assessment. The empirical results also demonstrate that test report quality assessment helps developers handle test reports more efficiently.
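To make the classification and evaluation steps in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes each test report has already been reduced to a vector of numeric indicator values with a binary quality label, and it uses scikit-learn's logistic regression and macro-averaged metrics as stand-ins for the paper's pipeline. The indicator matrix, labels, and train/test split below are illustrative assumptions.

# Minimal sketch (assumptions, not the authors' implementation):
# each test report is represented by numeric indicator values and a
# binary quality label; a logistic regression classifier is trained
# and then evaluated with macro-averaged precision, recall, and F-measure.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical indicator matrix: one row per test report, one column per
# measurable indicator (e.g., completeness or readability scores).
X = np.random.rand(200, 6)
# Hypothetical quality labels: 1 = expected report, 0 = low-quality report.
y = np.random.randint(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Macro-averaging computes each metric per class and takes the unweighted
# mean across classes, analogous to the MacroP/MacroR/MacroF reported above.
print("MacroP:", precision_score(y_test, pred, average="macro"))
print("MacroR:", recall_score(y_test, pred, average="macro"))
print("MacroF:", f1_score(y_test, pred, average="macro"))

Macro-averaging is used here (rather than micro-averaging) so that the rarer quality class contributes equally to the reported scores, which matters when low-quality reports are a minority of the dataset.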





Acknowledgments

We greatly thank the developers who devoted their valuable time to evaluating and inspecting the quality of the test reports. We also thank José M. Fuentes for his help in conducting this work. This work is partially supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003900, and the National Natural Science Foundation of China under Grants No. 61902096, 61972359, 61370144, 61722202, 61403057, and 61772107.

Author information

Corresponding author

Correspondence to Xin Chen.

Additional information

Communicated by: Massimiliano Di Penta and David D. Shepherd

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Software Analysis, Evolution and Reengineering (SANER)


About this article


Cite this article

Chen, X., Jiang, H., Li, X. et al. A systemic framework for crowdsourced test report quality assessment. Empir Software Eng 25, 1382–1418 (2020). https://doi.org/10.1007/s10664-019-09793-8

