Tracking bad updates in mobile apps: a search-based approach

Empirical Software Engineering

Abstract

The rapid growth of the mobile application development industry poses new challenges for developers, who must respond quickly to users’ needs in a continuously changing environment. Mobile apps undergo frequent updates to introduce new features, fix reported issues, or adapt to technological and environmental changes. Introducing changes in this context is risky and can harm an application's rating and competitiveness, so it is crucial that updates are deployed in a controlled way. To better support the evolution of mobile applications and reduce the cost of user dissatisfaction, we propose AppTracker, a novel approach to automatically track bad release updates in Android applications (i.e., releases with a higher percentage of negative reviews relative to prior releases). We formulate the problem as a three-class classification problem that labels app updates as bad, neutral, or good. To solve it, we evolve bad-release detection rules using Multi-Objective Genetic Programming (MOGP), based on an adaptation of the Non-dominated Sorting Genetic Algorithm (NSGA-II). In particular, the search process aims to find the optimal trade-off between two conflicting objectives to deal with the considered classes. We evaluate our approach and investigate its performance in both within-project and cross-project validation scenarios on a benchmark of 50,700 updates from 1,717 free Android apps from the Google Play Store. Statistical tests reveal that our approach achieves a clear advantage over machine learning approaches (e.g., random forest and decision tree), with significant F1-score improvements of 18% and 6% in within-project and cross-project validation, respectively. Furthermore, the feature analysis reveals that (1) the ratings of previous updates and (2) the APK size are the most important features in both within-project and cross-project scenarios.
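To illustrate what a trade-off between two conflicting objectives means in a non-dominated sorting setting, the following minimal Python sketch extracts the non-dominated (Pareto) front from a set of candidate detection rules. It is not the authors' implementation: the abstract does not name the two objectives, so the score pairs, the assumption that both objectives are maximized, and the names `dominates` and `non_dominated_front` are all hypothetical.

```python
from typing import List, Tuple

# Assumption: each candidate rule is scored on two objectives, both maximized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_front(scores: List[Objectives]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical objective scores for four candidate rules (illustrative values only).
candidate_scores = [(0.80, 0.40), (0.60, 0.70), (0.55, 0.65), (0.30, 0.90)]
front = non_dominated_front(candidate_scores)
print(front)
```

In this toy data the third candidate is dominated by the second, because it is worse on both objectives, while the remaining three each represent a different compromise. NSGA-II ranks a population by repeatedly peeling off such fronts; this sketch shows only the first front and omits crowding distance and the genetic operators.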


Notes

  1. https://play.google.com/

  2. https://play.google.com/store/apps/details?id=com.mobilemotion.dubsmash

  3. https://github.com/iBotPeaches/Apktool

  4. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html

  5. http://moeaframework.org/


Acknowledgements

This research has been funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) RGPIN-2018-05960.

Author information

Corresponding author

Correspondence to Islem Saidani.

Additional information

Communicated by: Aldeida Aleti, Annibale Panichella, Shin Yoo

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Advances in Search-Based Software Engineering (SSBSE)


About this article


Cite this article

Saidani, I., Ouni, A., Ahasanuzzaman, M. et al. Tracking bad updates in mobile apps: a search-based approach. Empir Software Eng 27, 81 (2022). https://doi.org/10.1007/s10664-022-10125-6


  • DOI: https://doi.org/10.1007/s10664-022-10125-6
