Abstract
Today, over three billion users rely daily on mobile applications (apps) to access high-speed connectivity and the wide range of services it enables, from social networking to emergency needs. Delivering high-quality apps is therefore vital for developers to stay on the market and acquire new users. For this reason, the research community has been devising automated strategies to better test these applications. Despite the effort spent so far, most developers write their test cases manually, without adopting any tool. Yet we still lack knowledge on the quality of these manually written tests: a better understanding of this aspect may provide evidence-based findings on the current state of testing in the wild and point out future research directions to better support the daily activities of mobile developers. We perform a large-scale empirical study targeting 1,693 open-source Android apps, aiming to assess (1) the extent to which these apps are actually tested, (2) how well-designed the available tests are, (3) how effective they are, and (4) how well manual tests can reduce the risk of having defects in production code. In addition, we conduct a focus group with five Android testing experts to discuss the findings achieved and gather insights into the next research avenues to undertake. The key results of our study show that Android apps are poorly tested and that the available tests have low (i) design quality, (ii) effectiveness, and (iii) ability to find defects in production code. Among their various suggestions, the testing experts report the need for improved mechanisms to locate potential defects and to deal with the complexity of creating tests that effectively exercise the production code.
Notes
With respect to our previous conference paper (Pecorelli et al. 2020b), the number of apps considered decreased from 1,780 to 1,693 because 87 of them were not available anymore at the time of the journal extension.
A well-known security company targeting mobile apps: https://tinyurl.com/rdhrszc
References
Antoine J-Y, Villaneau J, Lefeuvre A (2014) Weighted Krippendorff's alpha is a more reliable metrics for multi-coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation
Business of Apps. There are 12 million mobile developers worldwide, and nearly half develop for Android first. https://goo.gl/RNCSHC
Balogh G, Gergely T, Beszédes Á, Gyimóthy T (2016) Are my unit tests in the right package?. In: 2016 IEEE 16th international working conference on source code analysis and manipulation (SCAM). IEEE, pp 137–146
Basili V R, Briand L C, Melo W L (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761
Bavota G, Qusef A, Oliveto R, De Lucia A, Binkley D (2012) An empirical analysis of the distribution of unit test smells and their impact on software maintenance. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 56–65
Bavota G, Linares-Vasquez M, Bernal-Cardenas C E, Di Penta M, Oliveto R, Poshyvanyk D (2014) The impact of api change-and fault-proneness on the user ratings of android apps. IEEE Trans Softw Eng 41(4):384–407
Bavota G, Qusef A, Oliveto R, De Lucia A, Binkley D (2015) Are test smells really harmful? An empirical study. Empir Softw Eng 20(4):1052–1094
Beller M, Gousios G, Panichella A, Zaidman A (2015) When, how, and why developers (do not) test in their ides. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering, ser. ESEC/FSE 2015. [Online]. Available: http://doi.acm.org/10.1145/2786805.2786843. ACM, New York, pp 179–190
Beller M, Gousios G, Panichella A, Proksch S, Amann S, Zaidman A (2017) Developer testing in the ide: patterns, beliefs, and behavior. IEEE Trans Softw Eng 45(3):261–284
Buse R P, Weimer W R (2010) Learning a metric for code readability. IEEE Trans Softw Eng 36(4):546–558
Catolino G (2018) Does source code quality reflect the ratings of apps?. In: Proceedings of the 5th international conference on mobile software engineering and systems. ACM, pp 43–44
Catolino G, Di Nucci D, Ferrucci F (2019a) Cross-project just-in-time bug prediction for mobile apps: an empirical assessment. In: 2019 IEEE/ACM 6th international conference on mobile software engineering and systems (MOBILESoft). IEEE, pp 99–110
Catolino G, Palomba F, Zaidman A, Ferrucci F (2019b) How the experience of development teams relates to assertion density of test classes. In: 2019 IEEE 35th international conference on software maintenance and evolution (ICSME). IEEE, to appear
Catolino G, Palomba F, Zaidman A, Ferrucci F (2019c) Not all bugs are the same: understanding, characterizing, and classifying bug types. J Syst Softw 152:165–181
Chen M -H, Lyu M R, Wong W E (2001) Effect of code coverage on software reliability measurement. IEEE Trans Reliab 50(2):165–170
Chidamber S R, Kemerer C F (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
Choudhary S R, Gorla A, Orso A (2015) Automated test input generation for android: are we there yet?. In: 2015 30th IEEE/ACM international conference on automated software engineering ASE. IEEE, pp 429–440
Cleary M, Horsfall J, Hayter M (2014) Data collection and sampling in qualitative research: does size matter? J Adv Nurs 70(3):473–475
Counsell S, Swift S, Crampton J (2006) The interpretation and utility of three cohesion metrics for object-oriented design. ACM Trans Softw Eng Methodol (TOSEM) 15(2):123–149
Creswell J W (1999) Mixed-method research: introduction and application. In: Handbook of educational policy. Elsevier, pp 455–472
Cruz L, Abreu R, Lo D (2019) To the attention of mobile software developers: guess what, test your app! Empir Softw Eng 1–31
D’Ambros M, Bacchelli A, Lanza M (2010) On the impact of design flaws on software defects. In: 2010 10th international conference on quality software. IEEE, pp 23–31
Das T, Di Penta M, Malavolta I (2016) A quantitative and qualitative investigation of performance-related commits in android apps. In: 2016 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 443–447
Di Nucci D, Palomba F, Prota A, Panichella A, Zaidman A, De Lucia A (2017) Software-based energy profiling of android apps: simple, efficient and reliable?. In: 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 103–114
Di Nucci D, Palomba F, De Rosa G, Bavota G, Oliveto R, De Lucia A (2018) A developer centered bug prediction model. IEEE Trans Softw Eng 44(1):5–24
Draper N R, Smith H (2014) Applied regression analysis, vol 326. Wiley, New York
Eck M, Palomba F, Castelluccio M, Bacchelli A (2019) Understanding flaky tests: the developer's perspective. To appear
Etzkorn L H, Gholston S E, Fortune J L, Stein C E, Utley D, Farrington P A, Cox G W (2004) A comparison of cohesion metrics for object-oriented systems. Inf Softw Technol 46(10):677–687
Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: ICSM 2003. Proceedings. International conference on software maintenance, 2003. IEEE, pp 23–32
Fowler M, Beck K (1999) Refactoring: improving the design of existing code. Addison-Wesley Professional
Fraser G, Arcuri A (2011) Evosuite: automatic test suite generation for object-oriented software. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, ser. ESEC/FSE ’11. [Online]. Available: http://doi.acm.org/10.1145/2025113.2025179. ACM, New York, pp 416–419
Fregnan E, Baum T, Palomba F, Bacchelli A (2018) A survey on software coupling relations and tools. Inf Softw Technol 107:159–178
Gao J, Tsai W-T, Paul R, Bai X, Uehara T (2014) Mobile testing-as-a-service (mtaas)–infrastructures, issues, solutions and needs. In: 2014 IEEE 15th international symposium on high-assurance systems engineering. IEEE, pp 158–167
Garousi V, Küçük B (2018) Smells in software test code: a survey of knowledge in industry and academia. J Syst Softw 138:52–81
Geiger F-X, Malavolta I (2018) Datasets of android applications: a literature review. arXiv:1809.10069
Geiger F -X, Malavolta I, Pascarella L, Palomba F, Di Nucci D, Bacchelli A (2018) A graph-based dataset of commit history of real-world android apps. In: Proceedings of the 15th international conference on mining software repositories. ACM, pp 30–33
Gilbert P, Chun B-G, Cox LP, Jung J (2011) Vision: automated security validation of mobile apps at app markets. In: Proceedings of the second international workshop on mobile cloud computing and services. ACM, pp 21–26
Gopinath R, Jensen C, Groce A (2014) Code coverage for suite evaluation by developers. In: Proceedings of the 36th international conference on software engineering. ACM, pp 72–82
Gopinath R, Ahmed I, Alipour M A, Jensen C, Groce A (2017) Mutation reduction strategies considered harmful. IEEE Trans Reliab 66(3):854–874
Grano G, Ciurumelea A, Panichella S, Palomba F, Gall H C (2018a) Exploring the integration of user feedback in automated testing of android applications. In: 2018 IEEE 25th international conference on software analysis, evolution and reengineering. IEEE, pp 72–83
Grano G, Scalabrino S, Oliveto R, Gall H (2018b) An empirical investigation on the readability of manual and generated test cases. In: Proceedings of the 26th international conference on program comprehension
Grano G, Palomba F, Di Nucci D, De Lucia A, Gall H C (2019) Scented since the beginning: on the diffuseness of test smells in automatically generated test code. J Syst Softw 156:312–327
Grano G, De Iaco C, Palomba F, Gall H C (2020) Pizza versus pinsa: on the perception and measurability of unit test code quality. In: 2020 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 336–347
Graves T L, Karr A F, Marron J S, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661
Greiler M, Van Deursen A., Storey M -A (2013) Automated detection of test fixture strategies and smells. In: Software testing, verification and validation (ICST), pp 322–331
Halekoh U, Højsgaard S, Yan J et al (2006) The r package geepack for generalized estimating equations. J Stat Softw 15(2):1–11
Hall T, Zhang M, Bowes D, Sun Y (2014) Some code smells have a significant but small effect on faults. ACM Trans Softw Eng Methodol (TOSEM) 23(4):33
Hassan A E (2009) Predicting faults using the complexity of code changes. In: IEEE 31st international conference on software engineering, 2009. ICSE 2009. IEEE, pp 78–88
Henderson-Sellers B, Constantine L L, Graham I M (1996) Coupling and cohesion (towards a valid metrics suite for object-oriented analysis and design). Object Oriented Syst 3(3):143–158
Hindle A, Wilson A, Rasmussen K, Barlow E J, Campbell J C, Romansky S (2014) Greenminer: a hardware based mining software repositories software energy consumption framework. In: Proceedings of the 11th working conference on mining software repositories, pp 12–21
Iannone E, Pecorelli F, Di Nucci D, Palomba F, De Lucia A (2020) Refactoring android-specific energy smells: a plugin for android studio. In: Proceedings of the 28th international conference on program comprehension, pp 451–455
Jia Y, Harman M (2011) An analysis and survey of the development of mutation testing. IEEE Trans Softw Eng 37(5):649–678
Joorabchi M E, Mesbah A, Kruchten P (2013) Real challenges in mobile app development. In: 2013 ACM/IEEE international symposium on empirical software engineering and measurement. IEEE, pp 15–24
Kamei Y, Shihab E, Adams B, Hassan A E, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773
Khomh F, Di Penta M, Guéhéneuc Y -G, Antoniol G (2012) An exploratory study of the impact of antipatterns on class change-and fault-proneness. Empir Softw Eng 17(3):243–275
Kim S, Zimmermann T, Whitehead E J Jr, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th international conference on software engineering. IEEE Computer Society, pp 489–498
Kim S, Whitehead E J, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Trans Softw Eng 34(2):181–196
Kim H, Choi B, Wong WE (2009) Performance testing of mobile applications at the unit test level. In: 2009 Third IEEE international conference on secure software integration and reliability improvement. IEEE, pp 171–180
Khalid H, Shihab E, Nagappan M, Hassan A E (2014) What do mobile app users complain about? IEEE Softw 32(3):70–77
Kochhar P S, Thung F, Nagappan N, Zimmermann T, Lo D (2015) Understanding the test automation culture of app developers. In: 2015 IEEE 8th international conference on software testing, verification and validation ICST. IEEE, pp 1–10
Koru A G, Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304
Krippendorff K (2018) Content analysis: an introduction to its methodology. Sage Publications
Krutz DE, Mirakhorli M, Malachowsky SA, Ruiz A, Peterson J, Filipski A, Smith J (2015) A dataset of open-source android applications. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 522–525
Kudrjavets G, Nagappan N, Ball T (2006) Assessing the relationship between software assertions and faults: an empirical investigation. In: 2006 17th International symposium on software reliability engineering. IEEE, pp 204–212
Laaber C, Leitner P (2018) An evaluation of open-source software microbenchmark suites for continuous performance assessment. In: Proceedings of the 15th international conference on mining software repositories. ACM, pp 119–130
Leicht N, Blohm I, Leimeister J M (2017) Leveraging the power of the crowd for software testing. IEEE Softw 34(2):62–69
Lin J -W, Salehnamadi N, Malek S (2020) Test automation in open-source android apps: a large-scale empirical study. In: 2020 35th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 1078–1089
Linares-Vásquez M, Klock S, McMillan C, Sabané A, Poshyvanyk D, Guéhéneuc Y -G (2014) Domain matters: bringing further evidence of the relationships among anti-patterns, application domains, and quality-related metrics in java mobile apps. In: Proceedings of the 22nd international conference on program comprehension, pp 232–243
Linares-Vásquez M, Moran K, Poshyvanyk D (2017a) Continuous, evolutionary and large-scale: a new perspective for automated mobile app testing. In: 2017 IEEE International conference on software maintenance and evolution ICSME. IEEE, pp 399–410
Linares-Vásquez M, Bernal-Cárdenas C, Moran K, Poshyvanyk D (2017b) How do developers test android applications?. In: 2017 IEEE international conference on software maintenance and evolution ICSME. IEEE, pp 613–622
Luo Q, Hariri F, Eloussi L, Marinov D (2014) An empirical analysis of flaky tests. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. ACM, pp 643–653
Machiry A, Tahiliani R, Naik M (2013) Dynodroid: an input generation system for android apps. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 224–234
Marick B, et al. (1999) How to misuse code coverage. In: Proceedings of the 16th international conference on testing computer software, pp 16–18
Mateus B G, Martinez M (2019) An empirical study on quality of android applications written in kotlin language. Empir Softw Eng 24(6):3356–3393
Mao K, Harman M, Jia Y (2016) Sapienz: multi-objective automated testing for android applications. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 94–105
Martin W, Sarro F, Jia Y, Zhang Y, Harman M (2016) A survey of app store analysis for software engineering. IEEE Trans Softw Eng 43(9):817–847
McIlroy S, Ali N, Hassan A E (2016) Fresh apps: an empirical study of frequently-updated mobile apps in the google play store. Empir Softw Eng 21(3):1346–1370
Mesbah A, Prasad M R (2011) Automated cross-browser compatibility testing. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 561–570
Meszaros G (2007) xUnit test patterns: refactoring test code. Pearson Education
Minelli R, Lanza M (2013) Software analytics for mobile applications–insights & lessons learned. In: 2013 17Th European conference on software maintenance and reengineering. IEEE, pp 144–153
Moha N, Gueheneuc Y -G, Duchien L, Le Meur A -F (2010) Decor: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng 36(1):20–36
Mojica I J, Adams B, Nagappan M, Dienst S, Berger T, Hassan A E (2013) A large-scale empirical study on software reuse in mobile apps. IEEE Softw 31(2):78–86
Moonen L (2001) Generating robust parsers using island grammars. In: Proceedings of the eighth working conference on reverse engineering, WCRE’01, Stuttgart, Germany, October 2–5, 2001, p 13
Morales R, Saborido R, Khomh F, Chicano F, Antoniol G (2016) Anti-patterns and the energy efficiency of android applications. arXiv:1610.05711
Muccini H, Di Francesco A, Esposito P (2012) Software testing of mobile applications: challenges and future research directions. In: Proceedings of the 7th international workshop on automation of software test. IEEE Press, pp 29–35
Myers G J, Sandler C, Badgett T (2011) The art of software testing. Wiley, New York
Nagappan M, Shihab E (2016) Future trends in software engineering research for mobile apps. In: 2016 IEEE 23rd International conference on software analysis, evolution, and reengineering (SANER), vol 5. IEEE, pp 21–32
Nagappan N, Williams L, Vouk M, Osborne J (2005) Early estimation of software quality using in-process testing metrics: a controlled case study. ACM SIGSOFT Softw Eng Notes 30(4):1–7
Nagappan N, Maximilien E M, Bhat T, Williams L (2008) Realizing quality improvement through test driven development: results and experiences of four industrial teams. Empir Softw Eng 13(3):289–302
Nagappan N, Zeller A, Zimmermann T, Herzig K, Murphy B (2010) Change bursts as defect predictors. In: 2010 IEEE 21St international symposium on software reliability engineering. IEEE, pp 309–318
Nayebi M, Adams B, Ruhe G (2016) Release practices for mobile apps–what do users and developers think?. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), vol 1. IEEE, pp 552–562
Nelder J A, Wedderburn R W (1972) Generalized linear models. J Ro Stat Soc: Ser A (General) 135(3):370–384
New York Times (2020) How COVID-19 has changed social interactions. https://www.nytimes.com/interactive/2020/04/07/technology/coronavirus-internet-use.html
O’Brien R M (2007) A caution regarding rules of thumb for variance inflation factors. Qual Quant 41(5):673–690
Palomba F, Bavota G, Di Penta M, Oliveto R, Poshyvanyk D, De Lucia A (2015) Mining version histories for detecting code smells. IEEE Trans Softw Eng 41(5):462–489
Palomba F, Panichella A, Zaidman A, Oliveto R, De Lucia A (2016a) Automatic test case generation: what if test code quality matters?. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 130–141
Palomba F, Di Nucci D, Panichella A, Oliveto R, De Lucia A (2016b) On the diffusion of test smells in automatically generated test code: an empirical study. In: Proceedings of the 9th international workshop on search-based software testing. ACM, pp 5–14
Palomba F, Salza P, Ciurumelea A, Panichella S, Gall H, Ferrucci F, De Lucia A (2017a) Recommending and localizing change requests for mobile apps based on user reviews. In: Proceedings of the 39th international conference on software engineering. IEEE Press, pp 106–117
Palomba F, Zaidman A, Oliveto R, De Lucia A (2017b) An exploratory study on the relationship between changes and refactoring. In: 2017 IEEE/ACM 25th International conference on program comprehension (ICPC). IEEE, pp 176–185
Palomba F, Zanoni M, Fontana F A, De Lucia A, Oliveto R (2017c) Toward a smell-aware bug prediction model. IEEE Trans Softw Eng 45(2):194–218
Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2017d) On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empir Softw Eng 23(3):1188–1221
Palomba F, Linares-Vásquez M, Bavota G, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2018a) Crowdsourcing user reviews to support the evolution of mobile apps. J Syst Softw 137:143–162
Palomba F, Zaidman A, De Lucia A (2018b) Automatic test smell detection using information retrieval techniques. In: 2018 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 311–322
Palomba F, Panichella A, Zaidman A, Oliveto R, De Lucia A (2018c) The scent of a smell: an extensive comparison between textual and structural smells. IEEE Trans Softw Eng 44(10):977–1000
Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018d) A large-scale empirical study on the lifecycle of code smell co-occurrences. Inf Softw Technol 99:1–10
Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018e) On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empir Softw Eng 23(3):1188–1221
Palomba F, Di Nucci D, Panichella A, Zaidman A, De Lucia A (2019) On the impact of code smells on the energy consumption of mobile applications. Inf Softw Technol 105:43–55
Panichella S, Panichella A, Beller M, Zaidman A, Gall H C (2016) The impact of test case summaries on bug fixing performance: an empirical investigation. In: Proceedings of the 38th international conference on software engineering, pp 547–558
Pecorelli F, Palomba F, Di Nucci D, De Lucia A (2019) Comparing heuristic and machine learning approaches for metric-based code smell detection
Pecorelli F, Palomba F, De Lucia A (2020a) The relation of test-related factors to software quality: a case study on apache systems. Empir Softw Eng, to appear
Pecorelli F, Catolino G, Ferrucci F, De Lucia A, Palomba F (2020b) Testing of mobile applications in the wild: a large-scale empirical study on android apps. In: Proceedings of the 28th international conference on program comprehension, pp 296–307
Pecorelli F, Catolino G, Ferrucci F, De Lucia A, Palomba F (2021) Software testing and android applications: a large-scale empirical study—online appendix. https://github.com/sesa-lab/onlineappendices/tree/main/EMSE21-mobileapps
Peruma A, Almalki K, Newman C D, Mkaouer M W, Ouni A, Palomba F (2019) On the distribution of test smells in open source android applications: an exploratory study. In: CASCON, pp 193–202
Peruma A, Newman C D, Mkaouer M W, Ouni A, Palomba F (2020) An exploratory study on the refactoring of unit test files in android applications. In: Conference on software engineering workshops (ICSEW’20)
Pham R, Kiesling S, Liskin O, Singer L, Schneider K (2014) Enablers, inhibitors, and perceptions of testing in novice software teams. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. ACM, pp 30–40
Pidgeon N, Henwood K (2004) Grounded theory
Rossi P H, Wright J D, Anderson AB (2013) Handbook of survey research. Academic Press, New York
Salza P, Palomba F, Di Nucci D, De Lucia A, Ferrucci F (2019) Third-party libraries in mobile apps. Empir Softw Eng 25(3):2341–2377
Shapiro S S, Wilk M B (1965) An analysis of variance test for normality (complete samples). Biometrika 52(3/4):591–611
Silva D B, Endo A T, Eler M M, Durelli V H (2016) An analysis of automated tests for mobile android applications. In: 2016 XLII Latin American computing conference CLEI. IEEE, pp 1–9
Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes?. In: ACM SIGSOFT software engineering notes, vol 30, no 4. ACM, pp 1–5
Spadini D, Palomba F, Zaidman A, Bruntink M, Bacchelli A (2018) On the relation of test smells to software code quality. In: 2018 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 1–12
Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101
Spinellis D (2005) Tool writing: a forgotten art? (software tools). IEEE Softw 22(4):9–11
Statista (2020) Number of smartphone users worldwide. https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/
Tamburri D A, Palomba F, Kazman R (2020) Success and failure in software engineering: a followup systematic literature review. IEEE Trans Eng Manag 68(2):599–611
Tufano M, Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2016) An empirical investigation into the nature of test smells. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering, pp 4–15
Tufano M, Palomba F, Bavota G, Oliveto R, Di Penta M, De Lucia A, Poshyvanyk D (2017) When and why your code starts to smell bad (and whether the smells go away). IEEE Trans Softw Eng 43(11):1063–1088
Ujhazi B, Ferenc R, Poshyvanyk D, Gyimothy T (2010) New conceptual coupling and cohesion metrics for object-oriented systems. In: 2010 10th IEEE working conference on source code analysis and manipulation. IEEE, pp 33–42
Vahabzadeh A, Fard A M, Mesbah A (2015) An empirical study of bugs in test code. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 101–110
Van Deursen A, Moonen L, van den Bergh A, Kok G (2001) Refactoring test code. In: Proceedings of the 2nd international conference on extreme programming and flexible processes in software engineering (XP2001), pp 92–95
Van Rompaey B, Demeyer S (2009) Establishing traceability links between unit test cases and units under test. In: 2009 13th European conference on software maintenance and reengineering. IEEE, pp 209–218
Van Rompaey B, Du Bois B, Demeyer S, Rieger M (2007) On the detection of test smells: a metrics-based approach for general fixture and eager test. IEEE Trans Softw Eng 33(12):800–817
Wasserman T (2010) Software engineering issues for mobile application development
Wei Y, Meyer B, Oriol M (2012) Is branch coverage a good measure of testing effectiveness?. In: Empirical software engineering and verification. Springer, pp 194–212
Wei L, Liu Y, Cheung S -C (2016) Taming android fragmentation: characterizing and detecting compatibility issues for android apps. In: 2016 31st IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 226–237
Wilkinson S (1998) Focus group methodology: a review. Int J Social Res Methodol 1(3):181–203
Yang J, Zhikhartsev A, Liu Y, Tan L (2017) Better test cases for better automated program repair. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 831–841
Yu CS, Treude C, Aniche M (2019) Comprehending test code: an empirical study. To appear
Zazworka N, Shaw M A, Shull F, Seaman C (2011) Investigating the impact of design debt on software quality. In: Proceedings of the 2nd workshop on managing technical debt. ACM, pp 17–23
Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Third international workshop on predictor models in software engineering (PROMISE’07: ICSE workshops 2007). IEEE, pp 9–9
Zhang J, Sagar S, Shihab E (2013) The evolution of mobile apps: an exploratory study. In: Proceedings of the 2013 international workshop on software development lifecycle for mobile. ACM, pp 1–8
Acknowledgments
Gemma is partially supported by the European Commission grant no. 825040 (RADON). Fabio gratefully acknowledges the support of the Swiss National Science Foundation through the SNF Projects No. PZ00P2_186090 (TED). The authors would like to thank the Associate Editor and the anonymous Reviewers for their insightful comments provided during the peer-review process, which were instrumental to improve the quality of the manuscript.
Additional information
Communicated by: Yann-Gaël Guéhéneuc, Shinpei Hayashi and Michel R. V. Chaudron
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: International Conference on Program Comprehension (ICPC)
Cite this article
Pecorelli, F., Catolino, G., Ferrucci, F. et al. Software testing and Android applications: a large-scale empirical study. Empir Software Eng 27, 31 (2022). https://doi.org/10.1007/s10664-021-10059-5