Emerging topic identification from app reviews via adaptive online biterm topic modeling

Zhou, Wan; Wang, Yong; Gao, Cuiyun; Yang, Fei

doi:10.1631/FITEE.2100465

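Chinese title (translated): Emerging topic identification in application reviews based on an adaptive online biterm topic model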

Research Article
Published: 11 April 2022

Volume 23, pages 678–691 (2022)

Frontiers of Information Technology & Electronic Engineering

Abstract

Emerging topics in app reviews are the topics (e.g., software bugs) that users are concerned about during particular periods. Identifying emerging topics accurately and in a timely manner can help developers update apps more effectively. Methods based on topic models or clustering have been proposed in the literature for identifying emerging topics in app reviews. However, because reviews are short and offer limited information, the accuracy of emerging topic identification suffers. To solve this problem, we propose an improved emerging topic identification (IETI) approach. Specifically, we adopt natural language processing techniques to reduce noisy data, and identify emerging topics in app reviews using the adaptive online biterm topic model (AOBTM). We then interpret the meaning of emerging topics through relevant phrases and sentences. Using the official app changelogs as ground truth, we evaluate IETI on six common apps. The experimental results indicate that IETI is more accurate than the baseline in identifying emerging topics, improving the F1 score by 0.126 for phrase labels and 0.061 for sentence labels. The code of IETI is released on GitHub (https://github.com/wanizhou/IETI).
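AOBTM belongs to the biterm topic model (BTM) family (Cheng et al., 2014). Unlike LDA, which infers a topic mixture for each document, BTM learns a single corpus-level topic mixture from biterms, i.e., unordered pairs of words that co-occur in the same short text, which eases the sparsity of short reviews. The sketch below illustrates only this biterm-generation step after a crude cleaning pass. The stopword list, the function names (`preprocess`, `biterms`, `corpus_biterms`), and the optional `window` parameter are illustrative assumptions; this is not IETI's actual preprocessing, which the abstract describes only as "natural language processing techniques".

```python
from __future__ import annotations

import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (illustration only); a real
# pipeline might use a full list plus further cleaning such as lemmatization.
STOPWORDS = {"the", "a", "an", "is", "it", "to", "and", "after", "i", "my", "this", "of", "when"}


def preprocess(review: str) -> list[str]:
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOPWORDS]


def biterms(tokens: list[str], window: int | None = None) -> list[tuple[str, str]]:
    """Return every unordered word pair co-occurring in one short text.

    With window=None every pair of positions forms a biterm, the usual choice
    for very short texts; otherwise only pairs fewer than `window` positions
    apart are kept. Each pair is stored in sorted order, so (w1, w2) and
    (w2, w1) map to the same biterm.
    """
    result = []
    for i, j in combinations(range(len(tokens)), 2):
        if window is not None and j - i >= window:
            continue
        result.append(tuple(sorted((tokens[i], tokens[j]))))
    return result


def corpus_biterms(reviews: list[str], window: int | None = None) -> list[tuple[str, str]]:
    """Pool biterms from all reviews; BTM fits its topics on this pooled set."""
    return [b for review in reviews for b in biterms(preprocess(review), window)]


reviews = ["The app crashes after the update", "Login fails"]
print(corpus_biterms(reviews))
# prints: [('app', 'crashes'), ('app', 'update'), ('crashes', 'update'), ('fails', 'login')]
```

A review with n remaining tokens yields n(n-1)/2 biterms when no window is set, so even a three-word review contributes three co-occurrence observations instead of one sparse document.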

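Chinese abstract (translated)

Emerging topics in app reviews highlight the topics (e.g., software bugs) that users are concerned about during a certain period. Identifying emerging topics accurately and in a timely manner can help developers update apps more effectively. Existing studies identify emerging topics in app reviews based on topic models or clustering methods. However, because review texts are short and provide limited information, the accuracy of emerging topic identification is low. To address this problem, an improved emerging topic identification method (IETI) is proposed. First, natural language processing techniques are used to reduce noisy data in the review texts; then the adaptive online biterm topic model is used to identify emerging topics in the reviews. Finally, the relevant phrases and sentences in the emerging topics are used to interpret their meaning. The official changelogs are used as the evaluation standard for emerging topics, and IETI is evaluated on six common apps. The experimental results show that IETI outperforms the traditional method in identifying emerging topics, with F1 increments of 0.126 for phrase labels and 0.061 for sentence labels. The code of IETI is released on GitHub (https://github.com/wanizhou/IETI).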
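The gains reported in the abstract are differences in F1 score, the harmonic mean of precision P and recall R, F1 = 2PR/(P + R), between IETI and the baseline. The sketch below computes these quantities for one set of predicted topic labels against labels taken from a changelog. It assumes exact matching after lowercasing and whitespace normalization; the abstract does not say how IETI's labels are matched to changelog entries (a similarity threshold is equally plausible), and the function names and label strings are invented for illustration.

```python
from __future__ import annotations


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(label.lower().split())


def precision_recall_f1(predicted: set[str], truth: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 under exact (normalized) matching.

    Assumption: a predicted label counts as correct only if it equals a
    ground-truth label after normalization; similarity-based matching would
    replace the set intersection below.
    """
    pred = {normalize(p) for p in predicted}
    gold = {normalize(t) for t in truth}
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: labels for one app version vs. hypothetical changelog items.
predicted = {"Camera crash", "login fail", "dark mode"}
changelog = {"camera crash", "login  fail", "battery drain", "sync error"}
p, r, f1 = precision_recall_f1(predicted, changelog)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
# prints: P=0.667 R=0.500 F1=0.571
```

Read as an absolute increment, as the Chinese abstract's wording suggests, an improvement of 0.126 means IETI's phrase-label F1 exceeds the baseline's by 0.126 points on this 0 to 1 scale, not by 12.6% of the baseline value.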

References

AlSumait L, Barbará D, Domeniconi C, 2008. On-line LDA: adaptive topic models for mining text streams with applications to topic detection and tracking. Proc 8th IEEE Int Conf on Data Mining, p.3–12. https://doi.org/10.1109/ICDM.2008.140
Aslam N, Ramay WY, Xia KW, et al., 2020. Convolutional neural network based classification of app reviews. IEEE Access, 8:185619–185628. https://doi.org/10.1109/ACCESS.2020.3029634
Blei DM, Ng AY, Jordan MI, 2003. Latent Dirichlet allocation. J Mach Learn Res, 3:993–1022.
Calefato F, Lanubile F, Maiorano F, et al., 2018. Sentiment polarity detection for software development. Empir Softw Eng, 23(3):1352–1382. https://doi.org/10.1007/s10664-017-9546-9
Chen N, Lin JL, Hoi SCH, et al., 2014. AR-miner: mining informative reviews for developers from mobile app marketplace. Proc 36th Int Conf on Software Engineering, p.767–778. https://doi.org/10.1145/2568225.2568263
Cheng XQ, Yan XH, Lan YY, et al., 2014. BTM: topic modeling over short texts. IEEE Trans Knowl Data Eng, 26(12):2928–2941. https://doi.org/10.1109/TKDE.2014.2313872
Choi HJ, Park CH, 2019. Emerging topic detection in Twitter stream based on high utility pattern mining. Expert Syst Appl, 115:27–36. https://doi.org/10.1016/j.eswa.2018.07.051
Darbanibasmanj AA, Persaud A, Ruhi U, 2019. Application of machine learning to mining customer reviews. Proc 25th Americas Conf on Information Systems, Article 21.
Devlin J, Chang MW, Lee K, et al., 2019. BERT: pre-training of deep bidirectional transformers for language understanding. Proc Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, p.4171–4186. https://doi.org/10.18653/v1/N19-1423
Fan YM, Ma JX, 2014. Detection of emerging topics based on LDA and feature analysis of emerging topics. J China Soc Sci Techn Inform, 33(7):698–711 (in Chinese). https://doi.org/10.3772/j.issn.1000-0135.2014.07.003
Gao CY, Xu H, Hu JJ, et al., 2015. AR-Tracker: track the dynamics of mobile apps via user review mining. Proc IEEE Symp on Service-Oriented System Engineering, p.284–290. https://doi.org/10.1109/SOSE.2015.13
Gao CY, Zeng JC, Lyu MR, et al., 2018. Online app review analysis for identifying emerging issues. Proc 40th Int Conf on Software Engineering, p.48–58. https://doi.org/10.1145/3180155.3180218
Gao CY, Zheng W, Deng Y, et al., 2019. Emerging app issue identification from user feedback: experience on WeChat. Proc 41st Int Conf on Software Engineering: Software Engineering in Practice, p.279–288. https://doi.org/10.1109/ICSE-SEIP.2019.00040
Genc-Nayebi N, Abran A, 2017. A systematic literature review: opinion mining studies from mobile app store user reviews. J Syst Softw, 125:207–219. https://doi.org/10.1016/j.jss.2016.11.027
Gu XD, Kim S, 2015. “What parts of your apps are loved by users?”. Proc 30th IEEE/ACM Int Conf on Automated Software Engineering, p.760–770. https://doi.org/10.1109/ASE.2015.57
Guzman E, El-Haliby M, Bruegge B, 2015. Ensemble methods for app review classification: an approach for software evolution. Proc 30th IEEE/ACM Int Conf on Automated Software Engineering, p.771–776. https://doi.org/10.1109/ASE.2015.88
Hadi MA, Fard FH, 2020. AOBTM: adaptive online biterm topic modeling for version sensitive short-texts analysis. Proc IEEE Int Conf on Software Maintenance and Evolution, p.593–604. https://doi.org/10.1109/ICSME46990.2020.00062
Huang JJ, Peng M, Wang H, et al., 2017. A probabilistic method for emerging topic tracking in microblog stream. World Wide Web, 20(2):325–350. https://doi.org/10.1007/s11280-016-0390-4
Jha N, Mahmoud A, 2019. Mining non-functional requirements from app store reviews. Empir Softw Eng, 24(6):3659–3695. https://doi.org/10.1007/s10664-019-09716-7
Jin MM, Luo X, Zhu HL, et al., 2018. Combining deep learning and topic modeling for review understanding in context-aware recommendation. Proc Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, p.1605–1614. https://doi.org/10.18653/v1/N18-1145
Li CL, Duan Y, Wang HR, et al., 2017. Enhancing topic modeling for short texts with auxiliary word embeddings. ACM Trans Inform Syst, 36(2):11. https://doi.org/10.1145/3091108
Li YC, Jia BX, Guo Y, et al., 2017. Mining user reviews for mobile app comparisons. Proc ACM Interact Mob Wear Ubiquit Technol, 1(3):75. https://doi.org/10.1145/3130935
Liu YD, Li YW, Guo YH, et al., 2016. Stratify mobile app reviews: E-LDA model based on hot “entity” discovery. Proc 12th Int Conf on Signal-Image Technology & Internet-Based Systems, p.581–588. https://doi.org/10.1109/SITIS.2016.97
Liu YZ, Liu L, Liu HX, et al., 2019. App store mining for iterative domain analysis: combine app descriptions with user reviews. Softw Pract Exp, 49(6):1013–1040. https://doi.org/10.1002/spe.2693
Maalej W, Nabil H, 2015. Bug report, feature request, or simply praise? On automatically classifying app reviews. Proc 23rd IEEE Int Requirements Engineering Conf, p.116–125. https://doi.org/10.1109/RE.2015.7320414
McIlroy S, Ali N, Hassan AE, 2016. Fresh apps: an empirical study of frequently-updated mobile apps in the Google Play Store. Empir Softw Eng, 21(3):1346–1370. https://doi.org/10.1007/s10664-015-9388-2
Mei QZ, Shen XH, Zhai CX, 2007. Automatic labeling of multinomial topic models. Proc 13th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining, p.490–499. https://doi.org/10.1145/1281192.1281246
Nguyen TS, Lauw HW, Tsaparas P, 2015. Review synthesis for micro-review summarization. Proc 8th ACM Int Conf on Web Search and Data Mining, p.169–178. https://doi.org/10.1145/2684822.2685321
Noei E, Zhang F, Zou Y, 2021. Too many user-reviews! What should app developers look at first? IEEE Trans Softw Eng, 47(2):367–378. https://doi.org/10.1109/TSE.2019.2893171
Park DH, Liu MW, Zhai CX, et al., 2015. Leveraging user reviews to improve accuracy for mobile app retrieval. Proc 38th Int ACM SIGIR Conf on Research and Development in Information Retrieval, p.533–542. https://doi.org/10.1145/2766462.2767759
Rousseeuw PJ, Hubert M, 2011. Robust statistics for outlier detection. WIREs Data Min Knowl Discov, 1(1):73–79. https://doi.org/10.1002/widm.2
Sarro F, Al-Subaihin AA, Harman M, et al., 2015. Feature lifecycles as they spread, migrate, remain, and die in App Stores. Proc 23rd IEEE Int Requirements Engineering Conf, p.76–85. https://doi.org/10.1109/RE.2015.7320410
Su YQ, Wang YC, Yang WH, 2019. Mining and comparing user reviews across similar mobile apps. Proc 15th Int Conf on Mobile Ad-Hoc and Sensor Networks, p.338–342. https://doi.org/10.1109/MSN48538.2019.00070
Verasakulvong E, Vateekul P, Piyatumrong A, et al., 2018. Online emerging topic detection on Twitter using random forest with stock indicator features. Proc 15th Int Joint Conf on Computer Science and Software Engineering, p.1–6. https://doi.org/10.1109/JCSSE.2018.8457349
Vu PM, Pham HV, Nguyen TT, et al., 2016. Phrase-based extraction of user opinions in mobile app reviews. Proc 31st IEEE/ACM Int Conf on Automated Software Engineering, p.726–731. https://doi.org/10.1145/2970276.2970365
Wang Z, Gu SM, Xu XW, 2018. GSLDA: LDA-based group spamming detection in product reviews. Appl Intell, 48(9):3094–3107. https://doi.org/10.1007/s10489-018-1142-1
Zeng JC, Li J, Song Y, et al., 2018. Topic memory networks for short text classification. Proc Conf on Empirical Methods in Natural Language Processing, p.3120–3131. https://doi.org/10.18653/v1/D18-1351

Author information

Authors and Affiliations

School of Information and Computer, Anhui Polytechnic University, Wuhu, 241000, China
Wan Zhou (周芄) & Yong Wang (王勇)
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210000, China
Yong Wang (王勇)
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518000, China
Cuiyun Gao (高翠芸)
Zhejiang Lab, Hangzhou, 310000, China
Fei Yang (杨非)

Contributions

Wan ZHOU and Yong WANG designed the research. Wan ZHOU processed the data and drafted the paper. Yong WANG, Cuiyun GAO, and Fei YANG helped organize the paper. Wan ZHOU and Yong WANG revised and finalized the paper.

Corresponding author

Correspondence to Yong Wang (王勇).

Additional information

Compliance with ethics guidelines

Wan ZHOU, Yong WANG, Cuiyun GAO, and Fei YANG declare that they have no conflict of interest.

Project supported by the Anhui Provincial Natural Science Foundation of China (No. 1908085MF183), the National Natural Science Foundation of China (Nos. 62002084 and 61976005), the Training Program for Young and Middle-Aged Top Talents of Anhui Polytechnic University, China (No. 201812), the Zhejiang Provincial Natural Science Foundation of China (No. LQ21F020004), the State Key Laboratory for Novel Software Technology (Nanjing University) Research Program, China (No. KFKT2019B23), the Open Research Fund of Anhui Key Laboratory of Detection Technology and Energy Saving Devices, Anhui Polytechnic University, China (No. DTESD2020B03), and the Stable Support Plan for Colleges and Universities in Shenzhen, China (No. GXWD20201230155427003-20200730101839009).

About this article

Cite this article

Zhou, W., Wang, Y., Gao, C. et al. Emerging topic identification from app reviews via adaptive online biterm topic modeling. Front Inform Technol Electron Eng 23, 678–691 (2022). https://doi.org/10.1631/FITEE.2100465

Received: 30 September 2021
Accepted: 02 December 2021
Published: 11 April 2022
Issue Date: May 2022
DOI: https://doi.org/10.1631/FITEE.2100465

Key words

CLC number

TP311.5