Using frame semantics for classifying and summarizing application store reviews

Jha, Nishant; Mahmoud, Anas

doi:10.1007/s10664-018-9605-x

Published: 23 March 2018

Volume 23, pages 3734–3767, (2018)

Empirical Software Engineering

Abstract

Text mining techniques have been recently employed to classify and summarize user reviews on mobile application stores. However, due to the inherently diverse and unstructured nature of user-generated online textual data, text-based review mining techniques often produce excessively complicated models that are prone to overfitting. In this paper, we propose a novel approach, based on frame semantics, for app review mining. Semantic frames help to generalize from raw text (individual words) to more abstract scenarios (contexts). This lower-dimensional representation of text is expected to enhance the predictive capabilities of review mining techniques and reduce the chances of overfitting. Specifically, our analysis in this paper is two-fold. First, we investigate the performance of semantic frames in classifying informative user reviews into various categories of actionable software maintenance requests. Second, we propose and evaluate the performance of multiple summarization algorithms in generating concise and representative summaries of informative reviews. Three different datasets of app store reviews, sampled from a broad range of application domains, are used to conduct our experimental analysis. The results show that semantic frames can enable an efficient and accurate review classification process. However, in review summarization tasks, our results show that text-based summarization generates more comprehensive summaries than frame-based summarization. Finally, we introduce MARC 2.0, a review classification and summarization suite that implements the algorithms investigated in our analysis.
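
The generalization from individual words to frames can be illustrated with a minimal sketch. It is not the implementation used in MARC 2.0. The lookup table `TOY_FRAMES`, the frame labels in it, and the sample reviews are hypothetical placeholders; a real pipeline would obtain frames from a frame-semantic parser rather than from a hand-written dictionary.

```python
from collections import Counter

# Hypothetical word-to-frame lookup, for illustration only. The labels are
# placeholders and are not claimed to match any real frame inventory.
TOY_FRAMES = {
    "crashes": "FAILURE",
    "freezes": "FAILURE",
    "broken": "FAILURE",
    "add": "REQUEST",
    "want": "REQUEST",
    "need": "REQUEST",
}


def bag_of_words(review):
    """Count every lowercased, whitespace-separated token in the review."""
    return Counter(review.lower().split())


def bag_of_frames(review):
    """Count only the frames evoked by tokens found in the toy lookup."""
    tokens = review.lower().split()
    return Counter(TOY_FRAMES[t] for t in tokens if t in TOY_FRAMES)


# Made-up sample reviews (assumption, not taken from the paper's datasets).
reviews = [
    "app crashes on startup",
    "it freezes when i log in",
    "please add dark mode",
    "i want offline maps",
]

# Feature vocabularies under each representation.
word_vocab = set().union(*(bag_of_words(r) for r in reviews))
frame_vocab = set().union(*(bag_of_frames(r) for r in reviews))

print(len(word_vocab), "word features vs", len(frame_vocab), "frame features")
```

In this toy setting, the first two reviews share no words but map to the same frame feature, which is the kind of abstraction the lower-dimensional representation relies on.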

Notes

https://www.statista.com/topics/1729/app-stores/
https://framenet.icsi.berkeley.edu/fndrupal/
Our dataset is publicly available at http://seel.cse.lsu.edu/data/emse18.zip
Randomization in our analysis is implemented using the .NET Random class
www.cs.waikato.ac.nz/~ml/weka/
www.cs.cmu.edu/~ark/SEMAFOR/
http://seel.cse.lsu.edu/data/emse18.zip
http://demo.ark.cs.cmu.edu/parse
https://github.com/seelprojects/MARC-2.0
https://framenet.icsi.berkeley.edu/fndrupal/current_status

Acknowledgments

This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number: LEQSF(2015-18)-RD-A-07.

Author information

Authors and Affiliations

Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, 70803, USA
Nishant Jha & Anas Mahmoud

Corresponding author

Correspondence to Anas Mahmoud.

Additional information

Communicated by: Paul Grünbacher and Anna Perini

About this article

Cite this article

Jha, N., Mahmoud, A. Using frame semantics for classifying and summarizing application store reviews. Empir Software Eng 23, 3734–3767 (2018). https://doi.org/10.1007/s10664-018-9605-x

Published: 23 March 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10664-018-9605-x
