Abstract
The rapid growth of the mobile application development industry poses new challenges for developers, who must respond quickly to users' needs in a continuously changing environment. Mobile apps undergo frequent updates to introduce new features, fix reported issues, or adapt to technological and environmental changes. However, introducing changes in this context is risky and can harm an application's rating and competitiveness. Ensuring that application updates are deployed in a controlled way is therefore of crucial importance. To better support the evolution of mobile applications and reduce the cost of user dissatisfaction, we propose AppTracker, a novel approach to automatically track bad release updates in Android applications (i.e., releases with a higher percentage of negative reviews relative to the prior releases). We formulate the problem as a three-class classification problem that labels app updates as bad, neutral, or good. To solve this problem, we evolve bad release detection rules using Multi-Objective Genetic Programming (MOGP) based on an adaptation of the Non-dominated Sorting Genetic Algorithm (NSGA-II). In particular, the search process aims to provide the optimal trade-off between two conflicting objectives to deal with the considered classes. We evaluate our approach and investigate its performance in both within-project and cross-project validation scenarios on a benchmark of 50,700 updates from 1,717 free Android apps from the Google Play Store. Statistical tests reveal that our approach achieves a clear advantage over machine learning approaches (e.g., random forest and decision tree), with significant F1-score improvements of 18% and 6% in within-project and cross-project validation, respectively. Furthermore, the feature analysis reveals that (1) the ratings of previous updates and (2) the APK size are the most important features in both the within-project and cross-project scenarios.
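The trade-off between two conflicting objectives mentioned above rests on Pareto dominance, the comparison at the core of NSGA-II. The following is a minimal sketch of that comparison only, not the AppTracker implementation: the abstract does not name the two objectives, so the score pairs are hypothetical values, both assumed to be maximized, and the names `dominates` and `pareto_front` are illustrative.

```python
from typing import List, Tuple

# Hypothetical pair of objective values for one candidate detection rule.
# Assumption: both objectives are to be maximized.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep the candidates that no other candidate dominates (the first front)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative scores of four candidate rules (made-up numbers, not results).
rules = [(0.80, 0.40), (0.60, 0.70), (0.50, 0.50), (0.30, 0.90)]
front = pareto_front(rules)
print(front)
```

Here the rule scored (0.50, 0.50) is dropped because (0.60, 0.70) beats it on both objectives, while the remaining three rules are mutually non-dominated and represent different trade-offs. A full NSGA-II run would additionally rank the later fronts and use crowding distance to preserve diversity among solutions.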












Acknowledgements
This research has been funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) RGPIN-2018-05960.
Additional information
Communicated by: Aldeida Aleti, Annibale Panichella, Shin Yoo
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Advances in Search-Based Software Engineering (SSBSE)
Cite this article
Saidani, I., Ouni, A., Ahasanuzzaman, M. et al. Tracking bad updates in mobile apps: a search-based approach. Empir Software Eng 27, 81 (2022). https://doi.org/10.1007/s10664-022-10125-6
DOI: https://doi.org/10.1007/s10664-022-10125-6