Improving the review classification of Google apps using combined feature embedding and deep convolutional neural network model

Aslam, Naila; Alzamzami, Ohoud; Xia, Kewen; Sadiq, Saima; Umer, Muhammad; Bisogni, Carmen; Ashraf, Imran

doi:10.1007/s12652-023-04529-5

Original Research
Published: 25 January 2023

Volume 14, pages 4257–4272 (2023)
Journal of Ambient Intelligence and Humanized Computing

Naila Aslam¹, Ohoud Alzamzami², Kewen Xia¹, Saima Sadiq³, Muhammad Umer⁴, Carmen Bisogni⁵ (ORCID: orcid.org/0000-0003-1358-006X) & Imran Ashraf⁶

This article has been updated

Abstract

Online reviews play an integral part in making mobile applications stand out among the large number of applications available on the Google Play Store. Users predominantly rely on posted reviews when selecting an app. Manual categorization of such reviews is inefficient and time-consuming; automatic sentiment analysis of the reviews therefore provides fast suggestions to new users and helps them select an appropriate app. However, class imbalance is a major challenge for predicting the class of such reviews: the minority classes are sparsely represented, which often leads to low accuracy. This work proposes a framework to overcome this limitation. Extensive experiments are performed on the original data and on data balanced with the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN). Additionally, deep learning and machine learning models are evaluated with FastText, FastText Subword, global vectors (GloVe), and their combinations as word representations. Baseline machine learning models, including random forest, extra trees classifier, gradient boosting, Naive Bayes, logistic regression (LR), stochastic gradient descent (SGD), and a voting classifier (VC) that combines LR and SGD, are used for comparison. The results show that the convolutional neural network (CNN) using a combination of word embedding techniques produces the most accurate results.
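The abstract names SMOTE and ADASYN as the two balancing strategies. As a minimal, hedged sketch of how they differ, the pure-Python code below follows the published algorithm descriptions (Chawla et al. 2002; He et al. 2008) for a binary minority-versus-rest setting on small dense feature vectors (for example, averaged review embeddings). It is not the authors' implementation: the function names, the toy data, and the fallback used when no minority sample has majority-class neighbours are assumptions for illustration. Unlike the original SMOTE paper, which cycles through every minority sample, this sketch draws base samples at random.

```python
import math
import random

# Hypothetical helper names; an illustrative sketch of SMOTE/ADASYN, not the paper's code.


def _k_nearest(idx, points, candidates, k):
    """Indices of the k candidates closest to points[idx] (Euclidean), excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def _interpolate(a, b, rng):
    """Random point on the segment a -> b: a + gap * (b - a), with gap ~ U[0, 1)."""
    gap = rng.random()
    return [x + gap * (z - x) for x, z in zip(a, b)]


def smote(X, y, minority_label, n_new, k=5, seed=0):
    """SMOTE: interpolate between a random minority sample and one of its
    k nearest minority-class neighbours."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(minority)
        j = rng.choice(_k_nearest(i, X, minority, k))
        synthetic.append(_interpolate(X[i], X[j], rng))
    return synthetic


def adasyn(X, y, minority_label, beta=1.0, k=5, seed=0):
    """ADASYN: SMOTE-style interpolation, but minority samples with more
    majority-class neighbours (harder to learn) receive more synthetic samples."""
    rng = random.Random(seed)
    everyone = list(range(len(X)))
    minority = [i for i in everyone if y[i] == minority_label]
    n_majority = len(X) - len(minority)  # minority-vs-rest assumption
    total = int((n_majority - len(minority)) * beta)  # G in He et al. (2008)

    # r_i: share of non-minority samples among the k nearest neighbours of x_i.
    ratios = []
    for i in minority:
        neighbours = _k_nearest(i, X, everyone, k)
        ratios.append(sum(y[j] != minority_label for j in neighbours) / len(neighbours))
    norm = sum(ratios)
    if norm == 0:
        # Sketch-specific fallback (assumption): no sample is "hard", weight all equally.
        ratios, norm = [1.0] * len(minority), float(len(minority))

    synthetic = []
    for i, r in zip(minority, ratios):
        g_i = round(r / norm * total)  # number of samples to generate around x_i
        for _ in range(g_i):
            j = rng.choice(_k_nearest(i, X, minority, k))
            synthetic.append(_interpolate(X[i], X[j], rng))
    return synthetic


# Toy 2-D features: three minority (label 1) and five majority (label 0) points.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
     [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [1.0, 0.8]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
print(len(smote(X, y, minority_label=1, n_new=2, k=2)))  # 2
print(adasyn(X, y, minority_label=1, k=2))
```

Both functions only ever create points on segments between existing minority samples; they differ in where those points are concentrated. For real experiments, a maintained library such as imbalanced-learn provides tested `SMOTE` and `ADASYN` classes; the sketch only makes the difference in sampling weights explicit.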

Data availability statement

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Change history

15 February 2023
The original version of this article was updated to correct the email address of author Kewen Xia.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. U1813222, No. 42075129), the Hebei Province Natural Science Foundation (No. E2021202179), and the Key Research and Development Project of Hebei Province (No. 19210404D, No. 20351802D, No. 21351803D).

Author information

Authors and Affiliations

School of Electronics and Information Engineering, Hebei University of Technology, Tianjin, China
Naila Aslam & Kewen Xia
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Ohoud Alzamzami
Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
Saima Sadiq
Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
Muhammad Umer
Department of Computer Science, University of Salerno, Fisciano, Italy
Carmen Bisogni
Information and Communication Engineering, Yeungnam University, Gyeongsan, Republic of Korea
Imran Ashraf

Corresponding authors

Correspondence to Kewen Xia or Carmen Bisogni.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aslam, N., Alzamzami, O., Xia, K. et al. Improving the review classification of Google apps using combined feature embedding and deep convolutional neural network model. J Ambient Intell Human Comput 14, 4257–4272 (2023). https://doi.org/10.1007/s12652-023-04529-5

Received: 14 April 2022
Accepted: 10 January 2023
Published: 25 January 2023
Issue Date: April 2023
DOI: https://doi.org/10.1007/s12652-023-04529-5
