
Software testing and Android applications: a large-scale empirical study

Published in: Empirical Software Engineering

Abstract

These days, over three billion users rely on mobile applications (a.k.a. apps) on a daily basis to access high-speed connectivity and the services it enables, from social networking to emergency needs. Having high-quality apps is therefore a vital requirement for developers to stay on the market and acquire new users. For this reason, the research community has been devising automated strategies to better test these applications. Despite the effort spent so far, most developers write their test cases manually, without adopting any tool. Nevertheless, we still lack knowledge on the quality of these manually written tests: an enhanced understanding of this aspect may provide evidence-based findings on the current status of testing in the wild and point out future research directions to better support the daily activities of mobile developers. We perform a large-scale empirical study targeting 1,693 open-source Android apps to assess (1) the extent to which these apps are actually tested, (2) how well-designed the available tests are, (3) how effective they are, and (4) to what extent manual tests reduce the risk of having defects in production code. In addition, we conduct a focus group with 5 Android testing experts to discuss the findings and gather insights into the next research avenues to undertake. The key results of our study show that Android apps are poorly tested and that the available tests have low (i) design quality, (ii) effectiveness, and (iii) ability to find defects in production code. Among the various suggestions, the testing experts report the need for improved mechanisms to locate potential defects and to deal with the complexity of creating tests that effectively exercise the production code.


Notes

  1. https://f-droid.org

  2. https://github.com

  3. With respect to our previous conference paper (Pecorelli et al. 2020b), the number of apps considered decreased from 1,780 to 1,693 because 87 of them were not available anymore at the time of the journal extension.

  4. A well-known security company targeting mobile apps: https://tinyurl.com/rdhrszc

  5. https://github.com/arturdm/jacoco-android-gradle-plugin

  6. https://pydriller.readthedocs.io

  7. https://www.zoom.us/en/

  8. https://cucumber.io

  9. https://f-droid.org/en/packages/name.gdr.acastus_photon/

  10. https://pitest.org

References

  • Antoine J-Y, Villaneau J, Lefeuvre A (2014) Weighted Krippendorff's alpha is a more reliable metrics for multi-coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation

  • Business of Apps. There are 12 million mobile developers worldwide, and nearly half develop for Android first. https://goo.gl/RNCSHC

  • Balogh G, Gergely T, Beszédes Á, Gyimóthy T (2016) Are my unit tests in the right package?. In: 2016 IEEE 16th international working conference on source code analysis and manipulation (SCAM). IEEE, pp 137–146

  • Basili V R, Briand L C, Melo W L (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761

  • Bavota G, Qusef A, Oliveto R, De Lucia A, Binkley D (2012) An empirical analysis of the distribution of unit test smells and their impact on software maintenance. In: 2012 28th IEEE international conference on software maintenance (ICSM). IEEE, pp 56–65

  • Bavota G, Linares-Vasquez M, Bernal-Cardenas C E, Di Penta M, Oliveto R, Poshyvanyk D (2014) The impact of api change-and fault-proneness on the user ratings of android apps. IEEE Trans Softw Eng 41(4):384–407

  • Bavota G, Qusef A, Oliveto R, De Lucia A, Binkley D (2015) Are test smells really harmful? An empirical study. Empir Softw Eng 20(4):1052–1094

  • Beller M, Gousios G, Panichella A, Zaidman A (2015) When, how, and why developers (do not) test in their ides. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering, ser. ESEC/FSE 2015. [Online]. Available: http://doi.acm.org/10.1145/2786805.2786843. ACM, New York, pp 179–190

  • Beller M, Gousios G, Panichella A, Proksch S, Amann S, Zaidman A (2017) Developer testing in the ide: patterns, beliefs, and behavior. IEEE Trans Softw Eng 45(3):261–284

  • Buse R P, Weimer W R (2010) Learning a metric for code readability. IEEE Trans Softw Eng 36(4):546–558

  • Catolino G (2018) Does source code quality reflect the ratings of apps?. In: Proceedings of the 5th international conference on mobile software engineering and systems. ACM, pp 43–44

  • Catolino G, Di Nucci D, Ferrucci F (2019a) Cross-project just-in-time bug prediction for mobile apps: an empirical assessment. In: 2019 IEEE/ACM 6th international conference on mobile software engineering and systems (MOBILESoft). IEEE, pp 99–110

  • Catolino G, Palomba F, Zaidman A, Ferrucci F (2019b) How the experience of development teams relates to assertion density of test classes. In: 2019 IEEE 35th international conference on software maintenance and evolution (ICSME). IEEE, to appear

  • Catolino G, Palomba F, Zaidman A, Ferrucci F (2019c) Not all bugs are the same: understanding, characterizing, and classifying bug types. J Syst Softw 152:165–181

  • Chen M -H, Lyu M R, Wong W E (2001) Effect of code coverage on software reliability measurement. IEEE Trans Reliab 50(2):165–170

  • Chidamber S R, Kemerer C F (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  • Choudhary S R, Gorla A, Orso A (2015) Automated test input generation for android: are we there yet?. In: 2015 30th IEEE/ACM international conference on automated software engineering ASE. IEEE, pp 429–440

  • Cleary M, Horsfall J, Hayter M (2014) Data collection and sampling in qualitative research: does size matter? J Adv Nurs 70(3):473–475

  • Counsell S, Swift S, Crampton J (2006) The interpretation and utility of three cohesion metrics for object-oriented design. ACM Trans Softw Eng Methodol (TOSEM) 15(2):123–149

  • Creswell J W (1999) Mixed-method research: introduction and application. In: Handbook of educational policy. Elsevier, pp 455–472

  • Cruz L, Abreu R, Lo D (2019) To the attention of mobile software developers: guess what, test your app! Empir Softw Eng 1–31

  • D’Ambros M, Bacchelli A, Lanza M (2010) On the impact of design flaws on software defects. In: 2010 10th international conference on quality software. IEEE, pp 23–31

  • Das T, Di Penta M, Malavolta I (2016) A quantitative and qualitative investigation of performance-related commits in android apps. In: 2016 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 443–447

  • Di Nucci D, Palomba F, Prota A, Panichella A, Zaidman A, De Lucia A (2017) Software-based energy profiling of android apps: simple, efficient and reliable?. In: 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 103–114

  • Di Nucci D, Palomba F, De Rosa G, Bavota G, Oliveto R, De Lucia A (2018) A developer centered bug prediction model. IEEE Trans Softw Eng 44(1):5–24

  • Draper N R, Smith H (2014) Applied regression analysis, vol 326. Wiley, New York

  • Eck M, Palomba F, Castelluccio M, Bacchelli A (2019) Understanding flaky tests: the developer's perspective, to appear

  • Etzkorn L H, Gholston S E, Fortune J L, Stein C E, Utley D, Farrington P A, Cox G W (2004) A comparison of cohesion metrics for object-oriented systems. Inf Softw Technol 46(10):677–687

  • Fischer M, Pinzger M, Gall H (2003) Populating a release history database from version control and bug tracking systems. In: ICSM 2003. Proceedings. International conference on software maintenance, 2003. IEEE, pp 23–32

  • Fowler M, Beck K (1999) Refactoring: improving the design of existing code. Addison-Wesley Professional

  • Fraser G, Arcuri A (2011) Evosuite: automatic test suite generation for object-oriented software. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on foundations of software engineering, ser. ESEC/FSE ’11. [Online]. Available: http://doi.acm.org/10.1145/2025113.2025179. ACM, New York, pp 416–419

  • Fregnan E, Baum T, Palomba F, Bacchelli A (2018) A survey on software coupling relations and tools. Inf Softw Technol 107:159–178

  • Gao J, Tsai W-T, Paul R, Bai X, Uehara T (2014) Mobile testing-as-a-service (mtaas)–infrastructures, issues, solutions and needs. In: 2014 IEEE 15th international symposium on high-assurance systems engineering. IEEE, pp 158–167

  • Garousi V, Küçük B (2018) Smells in software test code: a survey of knowledge in industry and academia. J Syst Softw 138:52–81

  • Geiger F-X, Malavolta I (2018) Datasets of android applications: a literature review. arXiv:1809.10069

  • Geiger F -X, Malavolta I, Pascarella L, Palomba F, Di Nucci D, Bacchelli A (2018) A graph-based dataset of commit history of real-world android apps. In: Proceedings of the 15th international conference on mining software repositories. ACM, pp 30–33

  • Gilbert P, Chun B-G, Cox LP, Jung J (2011) Vision: automated security validation of mobile apps at app markets. In: Proceedings of the second international workshop on mobile cloud computing and services. ACM, pp 21–26

  • Gopinath R, Jensen C, Groce A (2014) Code coverage for suite evaluation by developers. In: Proceedings of the 36th international conference on software engineering. ACM, pp 72–82

  • Gopinath R, Ahmed I, Alipour M A, Jensen C, Groce A (2017) Mutation reduction strategies considered harmful. IEEE Trans Reliab 66(3):854–874

  • Grano G, Ciurumelea A, Panichella S, Palomba F, Gall H C (2018a) Exploring the integration of user feedback in automated testing of android applications. In: 2018 IEEE 25th international conference on software analysis, evolution and reengineering. IEEE, pp 72–83

  • Grano G, Scalabrino S, Oliveto R, Gall H (2018b) An empirical investigation on the readability of manual and generated test cases. In: Proceedings of the 26th international conference on program comprehension

  • Grano G, Palomba F, Di Nucci D, De Lucia A, Gall H C (2019) Scented since the beginning: on the diffuseness of test smells in automatically generated test code. J Syst Softw 156:312–327

  • Grano G, De Iaco C, Palomba F, Gall H C (2020) Pizza versus pinsa: on the perception and measurability of unit test code quality. In: 2020 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 336–347

  • Graves T L, Karr A F, Marron J S, Siy H (2000) Predicting fault incidence using software change history. IEEE Trans Softw Eng 26(7):653–661

  • Greiler M, Van Deursen A., Storey M -A (2013) Automated detection of test fixture strategies and smells. In: Software testing, verification and validation (ICST), pp 322–331

  • Halekoh U, Højsgaard S, Yan J et al (2006) The r package geepack for generalized estimating equations. J Stat Softw 15(2):1–11

  • Hall T, Zhang M, Bowes D, Sun Y (2014) Some code smells have a significant but small effect on faults. ACM Trans Softw Eng Methodol (TOSEM) 23 (4):33

  • Hassan A E (2009) Predicting faults using the complexity of code changes. In: IEEE 31st international conference on software engineering, 2009. ICSE 2009. IEEE, pp 78–88

  • Henderson-Sellers B, Constantine L L, Graham I M (1996) Coupling and cohesion (towards a valid metrics suite for object-oriented analysis and design). Object Oriented Syst 3(3):143–158

  • Hindle A, Wilson A, Rasmussen K, Barlow E J, Campbell J C, Romansky S (2014) Greenminer: a hardware based mining software repositories software energy consumption framework. In: Proceedings of the 11th working conference on mining software repositories, pp 12–21

  • Iannone E, Pecorelli F, Di Nucci D, Palomba F, De Lucia A (2020) Refactoring android-specific energy smells: a plugin for android studio. In: Proceedings of the 28th international conference on program comprehension, pp 451–455

  • Jia Y, Harman M (2011) An analysis and survey of the development of mutation testing. IEEE Trans Softw Eng 37(5):649–678

  • Joorabchi M E, Mesbah A, Kruchten P (2013) Real challenges in mobile app development. In: 2013 ACM/IEEE international symposium on empirical software engineering and measurement. IEEE, pp 15–24

  • Kamei Y, Shihab E, Adams B, Hassan A E, Mockus A, Sinha A, Ubayashi N (2013) A large-scale empirical study of just-in-time quality assurance. IEEE Trans Softw Eng 39(6):757–773

  • Khomh F, Di Penta M, Guéhéneuc Y -G, Antoniol G (2012) An exploratory study of the impact of antipatterns on class change-and fault-proneness. Empir Softw Eng 17(3):243–275

  • Kim S, Zimmermann T, Whitehead E J Jr, Zeller A (2007) Predicting faults from cached history. In: Proceedings of the 29th international conference on software engineering. IEEE Computer Society, pp 489–498

  • Kim S, Whitehead E J, Zhang Y (2008) Classifying software changes: clean or buggy? IEEE Trans Softw Eng 34(2):181–196

  • Kim H, Choi B, Wong WE (2009) Performance testing of mobile applications at the unit test level. In: 2009 Third IEEE international conference on secure software integration and reliability improvement. IEEE, pp 171–180

  • Khalid H, Shihab E, Nagappan M, Hassan A E (2014) What do mobile app users complain about? IEEE Softw 32(3):70–77

  • Kochhar P S, Thung F, Nagappan N, Zimmermann T, Lo D (2015) Understanding the test automation culture of app developers. In: 2015 IEEE 8th international conference on software testing, verification and validation ICST. IEEE, pp 1–10

  • Koru A G, Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304

  • Krippendorff K (2018) Content analysis: an introduction to its methodology. Sage Publications

  • Krutz DE, Mirakhorli M, Malachowsky SA, Ruiz A, Peterson J, Filipski A, Smith J (2015) A dataset of open-source android applications. In: Proceedings of the 12th working conference on mining software repositories. IEEE Press, pp 522–525

  • Kudrjavets G, Nagappan N, Ball T (2006) Assessing the relationship between software assertions and faults: an empirical investigation. In: 2006 17th International symposium on software reliability engineering. IEEE, pp 204–212

  • Laaber C, Leitner P (2018) An evaluation of open-source software microbenchmark suites for continuous performance assessment. In: Proceedings of the 15th international conference on mining software repositories. ACM, pp 119–130

  • Leicht N, Blohm I, Leimeister J M (2017) Leveraging the power of the crowd for software testing. IEEE Softw 34(2):62–69

  • Lin J -W, Salehnamadi N, Malek S (2020) Test automation in open-source android apps: a large-scale empirical study. In: 2020 35th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 1078–1089

  • Linares-Vásquez M, Klock S, McMillan C, Sabané A, Poshyvanyk D, Guéhéneuc Y -G (2014) Domain matters: bringing further evidence of the relationships among anti-patterns, application domains, and quality-related metrics in java mobile apps. In: Proceedings of the 22nd international conference on program comprehension, pp 232–243

  • Linares-Vásquez M, Moran K, Poshyvanyk D (2017a) Continuous, evolutionary and large-scale: a new perspective for automated mobile app testing. In: 2017 IEEE International conference on software maintenance and evolution ICSME. IEEE, pp 399–410

  • Linares-Vásquez M, Bernal-Cárdenas C, Moran K, Poshyvanyk D (2017b) How do developers test android applications?. In: 2017 IEEE international conference on software maintenance and evolution ICSME. IEEE, pp 613–622

  • Luo Q, Hariri F, Eloussi L, Marinov D (2014) An empirical analysis of flaky tests. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. ACM, pp 643–653

  • Machiry A, Tahiliani R, Naik M (2013) Dynodroid: an input generation system for android apps. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering. ACM, pp 224–234

  • Marick B, et al. (1999) How to misuse code coverage. In: Proceedings of the 16th international conference on testing computer software, pp 16–18

  • Mateus B G, Martinez M (2019) An empirical study on quality of android applications written in kotlin language. Empir Softw Eng 24(6):3356–3393

  • Mao K, Harman M, Jia Y (2016) Sapienz: multi-objective automated testing for android applications. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 94–105

  • Martin W, Sarro F, Jia Y, Zhang Y, Harman M (2016) A survey of app store analysis for software engineering. IEEE Trans Softw Eng 43 (9):817–847

  • McIlroy S, Ali N, Hassan A E (2016) Fresh apps: an empirical study of frequently-updated mobile apps in the google play store. Empir Softw Eng 21(3):1346–1370

  • Mesbah A, Prasad M R (2011) Automated cross-browser compatibility testing. In: Proceedings of the 33rd international conference on software engineering. ACM, pp 561–570

  • Meszaros G (2007) xUnit test patterns: refactoring test code. Pearson Education

  • Minelli R, Lanza M (2013) Software analytics for mobile applications–insights & lessons learned. In: 2013 17Th European conference on software maintenance and reengineering. IEEE, pp 144–153

  • Moha N, Gueheneuc Y -G, Duchien L, Le Meur A -F (2010) Decor: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng 36(1):20–36

  • Mojica I J, Adams B, Nagappan M, Dienst S, Berger T, Hassan A E (2013) A large-scale empirical study on software reuse in mobile apps. IEEE Softw 31(2):78–86

  • Moonen L (2001) Generating robust parsers using island grammars. In: Proceedings of the eighth working conference on reverse engineering, WCRE’01, Stuttgart, Germany, October 2–5, 2001, p 13

  • Morales R, Saborido R, Khomh F, Chicano F, Antoniol G (2016) Anti-patterns and the energy efficiency of android applications. arXiv:1610.05711

  • Muccini H, Di Francesco A, Esposito P (2012) Software testing of mobile applications: challenges and future research directions. In: Proceedings of the 7th international workshop on automation of software test. IEEE Press, pp 29–35

  • Myers G J, Sandler C, Badgett T (2011) The art of software testing. Wiley, New York

  • Nagappan M, Shihab E (2016) Future trends in software engineering research for mobile apps. In: 2016 IEEE 23rd International conference on software analysis, evolution, and reengineering (SANER), vol 5. IEEE, pp 21–32

  • Nagappan N, Williams L, Vouk M, Osborne J (2005) Early estimation of software quality using in-process testing metrics: a controlled case study. ACM SIGSOFT Softw Eng Notes 30(4):1–7

  • Nagappan N, Maximilien E M, Bhat T, Williams L (2008) Realizing quality improvement through test driven development: results and experiences of four industrial teams. Empir Softw Eng 13(3):289–302

  • Nagappan N, Zeller A, Zimmermann T, Herzig K, Murphy B (2010) Change bursts as defect predictors. In: 2010 IEEE 21St international symposium on software reliability engineering. IEEE, pp 309–318

  • Nayebi M, Adams B, Ruhe G (2016) Release practices for mobile apps–what do users and developers think?. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (saner), vol 1. IEEE, pp 552–562

  • Nelder J A, Wedderburn R W (1972) Generalized linear models. J Ro Stat Soc: Ser A (General) 135(3):370–384

  • N. Y. Times (2020) How COVID-19 has changed social interactions. https://www.nytimes.com/interactive/2020/04/07/technology/coronavirus-internet-use.html

  • O’Brien R M (2007) A caution regarding rules of thumb for variance inflation factors. Qual Quant 41(5):673–690

  • Palomba F, Bavota G, Di Penta M, Oliveto R, Poshyvanyk D, De Lucia A (2015) Mining version histories for detecting code smells. IEEE Trans Softw Eng 41(5):462–489

  • Palomba F, Panichella A, Zaidman A, Oliveto R, De Lucia A (2016a) Automatic test case generation: what if test code quality matters?. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 130–141

  • Palomba F, Di Nucci D, Panichella A, Oliveto R, De Lucia A (2016b) On the diffusion of test smells in automatically generated test code: an empirical study. In: Proceedings of the 9th international workshop on search-based software testing. ACM, pp 5–14

  • Palomba F, Salza P, Ciurumelea A, Panichella S, Gall H, Ferrucci F, De Lucia A (2017a) Recommending and localizing change requests for mobile apps based on user reviews. In: Proceedings of the 39th international conference on software engineering. IEEE Press, pp 106–117

  • Palomba F, Zaidman A, Oliveto R, De Lucia A (2017b) An exploratory study on the relationship between changes and refactoring. In: 2017 IEEE/ACM 25th International conference on program comprehension (ICPC). IEEE, pp 176–185

  • Palomba F, Zanoni M, Fontana F A, De Lucia A, Oliveto R (2017c) Toward a smell-aware bug prediction model. IEEE Trans Softw Eng 45 (2):194–218

  • Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2017d) On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empir Softw Eng 23(3):1188–1221

  • Palomba F, Linares-Vásquez M, Bavota G, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2018a) Crowdsourcing user reviews to support the evolution of mobile apps. J Syst Softw 137:143–162

  • Palomba F, Zaidman A, De Lucia A (2018b) Automatic test smell detection using information retrieval techniques. In: 2018 IEEE International conference on software maintenance and evolution (ICSME). IEEE, pp 311–322

  • Palomba F, Panichella A, Zaidman A, Oliveto R, De Lucia A (2018c) The scent of a smell: an extensive comparison between textual and structural smells. IEEE Trans Softw Eng 44(10):977–1000

  • Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018d) A large-scale empirical study on the lifecycle of code smell co-occurrences. Inf Softw Technol 99:1–10

  • Palomba F, Bavota G, Di Penta M, Fasano F, Oliveto R, De Lucia A (2018e) On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empir Softw Eng 23(3):1188–1221

  • Palomba F, Di Nucci D, Panichella A, Zaidman A, De Lucia A (2019) On the impact of code smells on the energy consumption of mobile applications. Inf Softw Technol 105:43–55

  • Panichella S, Panichella A, Beller M, Zaidman A, Gall H C (2016) The impact of test case summaries on bug fixing performance: an empirical investigation. In: Proceedings of the 38th international conference on software engineering, pp 547–558

  • Pecorelli F, Palomba F, Di Nucci D, De Lucia A (2019) Comparing heuristic and machine learning approaches for metric-based code smell detection

  • Pecorelli F, Palomba F, De Lucia A (2020a) The relation of test-related factors to software quality: a case study on apache systems. Empir Softw Eng xxx, no. xxx, p xxx

  • Pecorelli F, Catolino G, Ferrucci F, De Lucia A, Palomba F (2020b) Testing of mobile applications in the wild: a large-scale empirical study on android apps. In: Proceedings of the 28th international conference on program comprehension, pp 296–307

  • Pecorelli F, Catolino G, Ferrucci F, De Lucia A, Palomba F (2021) Software testing and android applications: a large-scale empirical study—online appendix. https://github.com/sesa-lab/onlineappendices/tree/main/EMSE21-mobileapps

  • Peruma A, Almalki K, Newman C D, Mkaouer M W, Ouni A, Palomba F (2019) On the distribution of test smells in open source android applications: an exploratory study. In: CASCON, pp 193–202

  • Peruma A, Newman C D, Mkaouer M W, Ouni A, Palomba F (2020) An exploratory study on the refactoring of unit test files in android applications. In: Conference on software engineering workshops (ICSEW’20)

  • Pham R, Kiesling S, Liskin O, Singer L, Schneider K (2014) Enablers, inhibitors, and perceptions of testing in novice software teams. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. ACM, pp 30–40

  • Pidgeon N, Henwood K (2004) Grounded theory. na

  • Rossi P H, Wright J D, Anderson AB (2013) Handbook of survey research. Academic Press, New York

  • Salza P, Palomba F, Di Nucci D, De Lucia A, Ferrucci F (2019) Third-party libraries in mobile apps. Empir Softw Eng 25(3):2341–2377

  • Shapiro S S, Wilk M B (1965) An analysis of variance test for normality (complete samples). Biometrika 52(3/4):591–611

  • Silva D B, Endo A T, Eler M M, Durelli V H (2016) An analysis of automated tests for mobile android applications. In: 2016 XLII Latin American computing conference CLEI. IEEE, pp 1–9

  • Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes?. In: ACM Sigsoft software engineering notes, vol 30, no 4. ACM, pp 1–5

  • Spadini D, Palomba F, Zaidman A, Bruntink M, Bacchelli A (2018) On the relation of test smells to software code quality. In: 2018 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 1–12

  • Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101

  • Spinellis D (2005) Tool writing: a forgotten art? (software tools). IEEE Softw 22(4):9–11

  • Statista (2020) Number of smartphone users worldwide. https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/

  • Tamburri D A, Palomba F, Kazman R (2020) Success and failure in software engineering: a followup systematic literature review. IEEE Trans Eng Manag 68(2):599–611

  • Tufano M, Palomba F, Bavota G, Di Penta M, Oliveto R, De Lucia A, Poshyvanyk D (2016) An empirical investigation into the nature of test smells. In: Proceedings of the 31st IEEE/ACM international conference on automated software engineering, pp 4–15

  • Tufano M, Palomba F, Bavota G, Oliveto R, Di Penta M, De Lucia A, Poshyvanyk D (2017) When and why your code starts to smell bad (and whether the smells go away). IEEE Trans Softw Eng 43(11):1063–1088

  • Ujhazi B, Ferenc R, Poshyvanyk D, Gyimothy T (2010) New conceptual coupling and cohesion metrics for object-oriented systems. In: 2010 10th IEEE working conference on source code analysis and manipulation. IEEE, pp 33–42

  • Vahabzadeh A, Fard A M, Mesbah A (2015) An empirical study of bugs in test code. In: 2015 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 101–110

  • Van Deursen A, Moonen L, van den Bergh A, Kok G (2001) Refactoring test code. In: Proceedings of the 2nd international conference on extreme programming and flexible processes in software engineering (XP2001), pp 92–95

  • Van Rompaey B, Demeyer S (2009) Establishing traceability links between unit test cases and units under test. In: 2009 13th European conference on software maintenance and reengineering. IEEE, pp 209–218

  • Van Rompaey B, Du Bois B, Demeyer S, Rieger M (2007) On the detection of test smells: a metrics-based approach for general fixture and eager test. IEEE Trans Softw Eng 33(12):800–817

  • Wasserman T (2010) Software engineering issues for mobile application development

  • Wei Y, Meyer B, Oriol M (2012) Is branch coverage a good measure of testing effectiveness?. In: Empirical software engineering and verification. Springer, pp 194–212

  • Wei L, Liu Y, Cheung S -C (2016) Taming android fragmentation: characterizing and detecting compatibility issues for android apps. In: 2016 31st IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 226–237

  • Wilkinson S (1998) Focus group methodology: a review. Int J Social Res Methodol 1(3):181–203

  • Yang J, Zhikhartsev A, Liu Y, Tan L (2017) Better test cases for better automated program repair. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 831–841

  • Yu CS, Treude C, Aniche M (2019) Comprehending test code: an empirical study, to appear

  • Zazworka N, Shaw M A, Shull F, Seaman C (2011) Investigating the impact of design debt on software quality. In: Proceedings of the 2nd workshop on managing technical debt. ACM, pp 17–23

  • Zimmermann T, Premraj R, Zeller A (2007) Predicting defects for eclipse. In: Third international workshop on predictor models in software engineering (PROMISE’07: ICSE workshops 2007). IEEE, pp 9–9

  • Zhang J, Sagar S, Shihab E (2013) The evolution of mobile apps: an exploratory study. In: Proceedings of the 2013 international workshop on software development lifecycle for mobile. ACM, pp 1–8

Acknowledgments

Gemma is partially supported by the European Commission grant no. 825040 (RADON). Fabio gratefully acknowledges the support of the Swiss National Science Foundation through SNF Project No. PZ00P2_186090 (TED). The authors would like to thank the Associate Editor and the anonymous Reviewers for the insightful comments provided during the peer-review process, which were instrumental in improving the quality of the manuscript.

Author information

Correspondence to Fabiano Pecorelli.

Additional information

Communicated by: Yann-Gaél Guéhéneuc, Shinpei Hayashi and Michel R. V. Chaudron


This article belongs to the Topical Collection: International Conference on Program Comprehension (ICPC)


Cite this article

Pecorelli, F., Catolino, G., Ferrucci, F. et al. Software testing and Android applications: a large-scale empirical study. Empir Software Eng 27, 31 (2022). https://doi.org/10.1007/s10664-021-10059-5

