Abstract
Given the competitive mobile app market, developers must understand users’ needs, satisfy users’ requirements, and outperform apps with similar functionality (i.e., competing apps) to stay ahead of the competition. While it is easy to track the overall user ratings of competing apps, such information fails to provide actionable insights for developers to improve their apps over the competing apps (AlSubaihin et al., IEEE Trans Softw Eng, 1–1, 2019). Thus, developers still need to read reviews from all the competing apps of interest and summarize the advantages and disadvantages of each app. Such a manual process can be tedious and even infeasible with thousands of reviews posted daily. To help developers compare users’ opinions among competing apps on high-level features, such as the main functionalities and the main characteristics of an app, we propose a review analysis approach named FeatCompare. FeatCompare can automatically identify high-level features mentioned in user reviews without any manually annotated resource. Then, FeatCompare creates a comparative table that summarizes users’ opinions for each identified feature across competing apps. FeatCompare features a novel neural network-based model named Global-Local sensitive Feature Extractor (GLFE), which extends Attention-based Aspect Extraction (ABAE), a state-of-the-art model for extracting high-level features from reviews. We evaluate the effectiveness of GLFE on 480 manually annotated reviews sampled from five groups of competing apps. Our experiment results show that GLFE achieves a precision of 79%-82% and recall of 74%-77% in identifying the high-level features associated with reviews and outperforms ABAE by 14.7% on average. We also conduct a case study to demonstrate the usage scenarios of FeatCompare. A survey with 107 mobile app developers shows that more than 70% of developers agree that FeatCompare is of great benefit.
Notes
Note that all high-level features are hidden, meaning they are represented using embeddings. The semantic meaning of a high-level feature can be identified by searching the most representative words, the embeddings of which are close to the embedding of the high-level feature.
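The nearest-word lookup described above can be sketched as follows. This is a hypothetical illustration, not FeatCompare's implementation: the vocabulary, embeddings, and function name are invented, and cosine similarity is assumed as the closeness measure between a feature embedding and word embeddings.

```python
# Hypothetical sketch: interpreting a hidden high-level feature by finding
# the vocabulary words whose embeddings are closest to the feature embedding.
# Vocabulary and 2-d embeddings below are toy values, not real model output.
import numpy as np

def top_representative_words(feature_vec, word_vecs, vocab, k=3):
    """Return the k words whose embeddings are most cosine-similar
    to the given high-level feature embedding."""
    f = feature_vec / np.linalg.norm(feature_vec)
    W = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    sims = W @ f                    # cosine similarity of each word to the feature
    order = np.argsort(-sims)[:k]   # indices of the k most similar words
    return [vocab[i] for i in order]

vocab = ["login", "crash", "password", "battery"]
word_vecs = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]])
feature = np.array([1.0, 0.0])      # a hidden feature near "account"-related words
print(top_representative_words(feature, word_vecs, vocab, k=2))
# → ['login', 'password']
```

In this toy setting, the feature's semantic meaning ("account/login issues") emerges from its most representative words, mirroring how a hidden feature embedding is interpreted.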
We normalize document vectors to unit Euclidean length.
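Normalizing to unit Euclidean (L2) length means dividing a vector by its L2 norm, so that all document vectors lie on the unit sphere. A minimal sketch (the helper name and the epsilon guard against zero vectors are our additions):

```python
# Minimal sketch of unit-L2 normalization of a document vector.
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale v to unit Euclidean length; eps guards against zero vectors."""
    return v / max(np.linalg.norm(v), eps)

doc = np.array([3.0, 4.0])       # toy document vector with norm 5
unit = l2_normalize(doc)
print(unit)                      # → [0.6 0.8]
print(np.linalg.norm(unit))      # → 1.0
```

After this step, dot products between document vectors equal their cosine similarities, which is the usual motivation for the normalization.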
References
Akdeniz (2013) Google Play Crawler. https://github.com/akdeniz/google-play-crawler (Last accessed: March 2020)
AlSubaihin A, Sarro F, Black S, Capra L, Harman M (2019) App store effects on software engineering practices. IEEE Trans Softw Eng 47(2):300–319
AppAnnie (2016) App Annie. https://www.appannie.com/ (Last accessed: March 2020)
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
Carreño LVG, Winbladh K (2013) Analysis of user comments: an approach for software requirements evolution. In: Proceedings of the 35th International Conference on Software Engineering, ICSE ’13, pp 582–591
Chen N, Lin J, Hoi SCH, Xiao X, Zhang B (2014) AR-Miner: mining informative reviews for developers from mobile app marketplace. In: Proceedings of the 36th International Conference on Software Engineering, ICSE ’14, pp 767–778
Chen Z, Mukherjee A, Liu B (2014) Aspect extraction with automated prior knowledge learning. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol 1, 06
Dalpiaz F, Parente M (2019) RE-SWOT: From user feedback to requirements via competitor analysis. In: Proceedings of the 25th International Working Conference on Requirements Engineering: Foundation for Software Quality, volume 11412 of REFSQ ’19, pp 55–70
Di Sorbo A, Panichella S, Alexandru CV, Visaggio CA, Canfora G (2017) SURF: Summarizer of user reviews feedback. In: Proceedings of the 39th International Conference on Software Engineering Companion, ICSE-C ’17, IEEE, pp 55–58
El Zarif O, Da Costa DA, Hassan S, Zou Y (2020) On the relationship between user churn and software issues. In: Proceedings of the 17th International Conference on Mining Software Repositories, MSR ’20. Association for Computing Machinery, New York, pp 339–349
eMarketer (2020) Number of apps available in leading app stores as of 4th quarter 2019. https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/ (Last accessed: March 2020)
Fu B, Lin J, Li L, Faloutsos C, Hong J, Sadeh N (2013) Why people hate your app: Making sense of user feedback in a mobile app store. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’13, pp 1276–1284
Gao C, Zeng J, Lyu MR, King I (2018) Online app review analysis for identifying emerging issues. In: Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, pp 48–58
Gorla A, Tavecchia I, Gross F, Zeller A (2014) Checking app behavior against app descriptions. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014. Association for Computing Machinery, New York, pp 1025–1035
Gu X, Kim S (2015) “What parts of your apps are loved by users?” (T). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, ASE ’15, pp 760–770
Guzman E, Maalej W (2014) How do users like this feature? a fine grained sentiment analysis of app reviews. In: Proceedings of the 22nd International Requirements Engineering Conference, RE ’14, pp 153–162
Hassan S, Tantithamthavorn C, Bezemer C, Hassan AE (2020) Studying the dialogue between users and developers of free apps in the google play store. IEEE Trans Softw Eng 46(7):773–793
He R, Lee WS, Ng HT, Dahlmeier D (2017) An unsupervised neural attention model for aspect extraction. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL ’17, pp 388–397
Iacob C, Harrison R (2013) Retrieving and analyzing mobile apps feature requests from online reviews. In: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pp 41–44
Iacob C, Harrison R, Faily S (2013) Online reviews as first class artifacts in mobile app development. In: Proceedings of the 5th International Conference on Mobile Computing, Applications, and Services, MobiCASE ’13, pp 47–53
Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations, ICLR ’15, San Diego
Johann T, Stanik C, Alizadeh B AM, Maalej W (2017) SAFE: a simple approach for feature extraction from app descriptions and app reviews. In: Proceedings of the 25th International Requirements Engineering Conference, RE ’17, pp 21–30
Keertipati S, Savarimuthu BTR, Licorish SA (2016) Approaches for prioritizing feature improvements extracted from app reviews. In: Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering, EASE ’16, pp 1–6
Kim S-M, Pantel P, Chklovski T, Pennacchiotti M (2006) Automatically assessing review helpfulness. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP ’06, pp 423–430
Levy O, Goldberg Y, Dagan I (2015) Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3:211–225
Li X, Jiang H, Liu D, Ren Z, Li G (2018) Unsupervised deep bug report summarization. In: Proceedings of the 26th Conference on Program Comprehension, ICPC ’18. Association for Computing Machinery, New York, pp 144–155
Li Y, Jia B, Guo Y, Chen X (2017) Mining user reviews for mobile app comparisons. Proceedings of the ACM on Interactive Mobile, Wearable and Ubiquitous Technologies 1(3):75,1–75,15
Lim SL, Bentley PJ, Kanakam N, Ishikawa F, Honiden S (2015) Investigating country differences in mobile app user behavior and challenges for software engineering. IEEE Trans Softw Eng 41(1):40–64
Lovric M (ed) (2011) International Encyclopedia of Statistical Science. Springer, Berlin
Lu M, Liang P (2017) Automatic classification of non-functional requirements from augmented app user reviews. In: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, EASE’17, pp 344–353
Ma S, Wang S, Lo D, Deng RH, Sun C (2015) Active semi-supervised approach for checking app behavior against its description. In: 2015 IEEE 39th Annual Computer Software and Applications Conference, vol 2, pp 179–184
Man Y, Gao C, Lyu MR, Jiang J (2016) Experience report: Understanding cross-platform app issues from user reviews. In: Proceedings of the 27th IEEE International Symposium on Software Reliability Engineering, ISSRE’16, pp 138–149
Martin P 77% will not download a retail app rated lower than 3 stars. https://blog.testmunk.com/77-will-not-download-a-retail-app-rated-lower-than-3-stars/ (Last accessed: July 2017)
McHugh M (2012) Interrater reliability: the kappa statistic. Biochemia Medica 22(3):276–282
McIlroy S, Ali N, Khalid H, Hassan AE (2016) Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empir Softw Eng 21(3):1067–1106
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Proceedings of the 1st International Conference on Learning Representations, ICLR’13, pp 1–12
Mukherjee A, Liu B (2012) Aspect extraction through semi-supervised modeling. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 339–348
Nayebi M, Adams B, Ruhe G (2016) Release practices for mobile apps – what do users and developers think?. In: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol 1, pp 552–562
Nayebi M, Farahi H, Ruhe G (2017) Which version should be released to app store?. In: 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp 324–333
Noei E, da Costa DA, Zou Y (2018) Winning the app production rally. In: Proceedings of the 26th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE ’18, pp 283–294
Pagano D, Maalej W (2013) User feedback in the appstore: an empirical study. In: 2013 21st IEEE International Requirements Engineering Conference (RE), pp 125–134
Panichella S, Sorbo AD, Guzman E, Visaggio CA, Canfora G, Gall HC (2015) How can i improve my app? classifying user reviews for software maintenance and evolution. In: Proceedings of the 31st International Conference on Software Maintenance and Evolution, ICSME ’15, pp 281–290
Pennington J, Socher R, Manning C (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, pp 1532–1543
Ramos J (2003) Using TF-IDF to determine word relevance in document queries. In: Proceedings of the 1st instructional Conference on Machine Learning, iCML ’03, pp 1–4
Rezaeinia SM, Rahmani R, Ghodsi A, Veisi H (2019) Sentiment analysis based on improved pre-trained word embeddings. Expert Syst Appl 117:139–147
Saldaña J (2015) The coding manual for qualitative researchers. Sage
Scalabrino S, Bavota G, Russo B, Penta MD, Oliveto R (2019) Listening to the crowd for the release planning of mobile apps. IEEE Trans Softw Eng 45(1):68–86
Seaman CB (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng 25(4):557–572
Shah FA, Sabanin Y, Pfahl D (2016) Feature-based evaluation of competing apps. In: Proceedings of the ACM International Workshop on App Market Analytics, WAMA ’16, pp 15–21
Shah FA, Sirts K, Pfahl D (2018) The impact of annotation guidelines and annotated data on extracting app features from app reviews. arXiv:1810.05187
Shah FA, Sirts K, Pfahl D (2019) Is the SAFE approach too simple for app feature extraction? a replication study. In: Proceedings of the 25th International Working Conference on Requirements Engineering: Foundation for Software Quality, REFSQ 19, pp 21–36
Shah FA, Sirts K, Pfahl D (2019) Using app reviews for competitive analysis: tool support. In: Proceedings of the 3rd ACM SIGSOFT International Workshop on App Market Analytics, WAMA ’19, pp 40–46
Vasa R, Hoon L, Mouzakis K, Noguchi A (2012) A preliminary analysis of mobile app user reviews. In: Proceedings of the 24th Australian Computer-Human Interaction Conference, OzCHI ’12, pp 241–244
Villarroel L, Bavota G, Russo B, Oliveto R, Penta MD (2016) Release planning of mobile apps based on user reviews. In: Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pp 14–24
Vincent P, Larochelle H, Bengio Y, Manzagol P-A (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pp 1096–1103
Vu PM, Nguyen TT, Pham HV, Nguyen TT (2015) Mining user opinions in mobile app reviews: a keyword-based approach (t). In: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, ASE ’15, pp 749–759
Zhao X, Jiang J, Yan H, Li X (2010) Jointly modeling aspects and opinions with a maxent-LDA hybrid. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. ACL
Communicated by: Federica Sarro
Appendix: The Survey Questions
Cite this article
Assi, M., Hassan, S., Tian, Y. et al. FeatCompare: Feature comparison for competing mobile apps leveraging user reviews. Empir Software Eng 26, 94 (2021). https://doi.org/10.1007/s10664-021-09988-y