research-article

Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval

Authors:

ChengXiang Zhai,

Haohong Wang

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 533 - 542

https://doi.org/10.1145/2766462.2767759

Published: 09 August 2015

Abstract

Smartphones and tablets, together with their apps, have pervaded our everyday lives, creating a new demand for search tools that help users find the right apps for their immediate needs. While a few commercial mobile app search engines are available, the new task of mobile app retrieval has not yet been rigorously studied. Indeed, no test collection yet exists for quantitatively evaluating this retrieval task. In this paper, we first study the effectiveness of state-of-the-art retrieval models on the app retrieval task, using a new app retrieval test data set that we created. We then propose and study a novel approach that generates a new representation for each app. Our key idea is to leverage user reviews to identify the important features of apps and to bridge the vocabulary gap between app developers and users. Specifically, we jointly model app descriptions and user reviews with a topic model in order to generate app representations while excluding noise in reviews. Experimental results indicate that the proposed approach is effective and outperforms state-of-the-art retrieval models for app retrieval.
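To make the vocabulary-gap argument concrete, here is a minimal sketch of a standard language-modeling baseline: query likelihood with Dirichlet-prior smoothing, scored over a hypothetical app representation that simply concatenates description tokens with review tokens. This is an illustrative assumption, not the authors' method; the paper instead uses a topic model that jointly models descriptions and reviews and excludes review noise. The function names `dirichlet_ql_score` and `rank_apps`, the toy apps, and the smoothing parameter `mu=1000.0` are all hypothetical choices.

```python
import math
from collections import Counter


def dirichlet_ql_score(query, doc_tokens, coll_counts, coll_len, mu=1000.0):
    """Log query likelihood under a Dirichlet-smoothed document model:
    p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu)."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for w in query:
        p_coll = coll_counts.get(w, 0) / coll_len if coll_len else 0.0
        if p_coll == 0.0:
            # Term absent from the whole collection: it cannot discriminate
            # between apps, so it is skipped (one common convention).
            continue
        score += math.log((counts[w] + mu * p_coll) / (doc_len + mu))
    return score


def rank_apps(query, apps, mu=1000.0):
    """Rank app ids for a tokenized query.

    `apps` maps an app id to (description_tokens, review_tokens). As a
    hypothetical representation, each app is its description followed by
    its reviews, so review vocabulary can match user queries."""
    docs = {app: desc + reviews for app, (desc, reviews) in apps.items()}
    coll_counts = Counter()
    for tokens in docs.values():
        coll_counts.update(tokens)
    coll_len = sum(coll_counts.values())
    scores = {app: dirichlet_ql_score(query, tokens, coll_counts, coll_len, mu)
              for app, tokens in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    apps = {
        "photo_app": ("edit photos with filters".split(),
                      "nice filters for selfies".split()),
        "notes_app": ("take notes and sync across devices".split(),
                      "great for my grocery list".split()),
    }
    # "grocery" and "list" appear only in a review, never in a description.
    print(rank_apps(["grocery", "list"], apps))  # prints ['notes_app', 'photo_app']
```

With descriptions alone, neither toy app contains "grocery" or "list", so a description-only index would give both apps the same score; the review text supplies the user's vocabulary. Concatenating raw reviews, however, also imports their noise, which is the problem the abstract says the topic-model approach is designed to address.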

Index Terms

Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

A preliminary analysis of mobile app user reviews
OzCHI '12: Proceedings of the 24th Australian Computer-Human Interaction Conference

The advent of online software distribution channels like Apple Inc.'s App Store and Google Inc.'s Google Play has offered developers a single, low cost, and powerful distribution mechanism. These online stores help users discover apps as well as leave a ...
An Explorative Study of the Mobile App Ecosystem from App Developers' Perspective
WWW '17: Proceedings of the 26th International Conference on World Wide Web

With the prevalence of smartphones, app markets such as Apple App Store and Google Play has become the center stage in the mobile app ecosystem, with millions of apps developed by tens of thousands of app developers in each major market. This paper ...
Leveraging app features to improve mobile app retrieval

The continued increase in the use of smartphones and other mobile devices has led to a substantial increase in the demand for mobile applications. With the growing availability of mobile apps, retrieving the right application from a large set has become ...

Information

Published In

ACM Conferences

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 2015

1198 pages

ISBN:9781450336215

DOI:10.1145/2766462

General Chair:
Ricardo Baeza-Yates, Yahoo Labs, USA

Program Chairs:
Mounia Lalmas, Yahoo Labs, UK
Alistair Moffat, University of Melbourne, Australia
Berthier Ribeiro-Neto, Google, Brazil, and UFMG, Brazil

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SIGIR '15

Sponsor:

SIGIR

SIGIR '15: The 38th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 9 - 13, 2015

Santiago, Chile

Acceptance Rates

SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%

Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 41
Total Downloads: 1,001
Downloads (last 12 months): 28
Downloads (last 6 weeks): 3

Reflects downloads up to 02 Mar 2025

Citations

Cited By

Saedi A, Fatemi A, Nematbakhsh M, Rosset S, Vilnat A (2025). Entity search based on consumer preferences leveraging user reviews. Expert Systems with Applications, 126990. Online publication date: Feb-2025.
https://doi.org/10.1016/j.eswa.2025.126990
Wang X, Zhang T, Tan Y, Shang W, Li Y (2024). How to effectively mine app reviews concerning software ecosystem? A survey of review characteristics. Journal of Systems and Software, 213, 112040. Online publication date: Jul-2024.
https://doi.org/10.1016/j.jss.2024.112040
Zhang Z, Stefanidis K (2024). Data-Driven Analysis for Monitoring Software Evolution. New Trends in Database and Information Systems, pages 383-391. Online publication date: 14-Nov-2024.
https://doi.org/10.1007/978-3-031-70421-5_36
Yu L, Wang H, Luo X, Zhang T, Liu K, Chen J, Zhou H, Tang Y, Xiao X (2023). Towards Automatically Localizing Function Errors in Mobile Apps With User Reviews. IEEE Transactions on Software Engineering, 49(4):1464-1486. Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TSE.2022.3178096
Coelho J, Mano D, Paula B, Coutinho C, Oliveira J, Ribeiro R, Batista F (2023). Semantic similarity for mobile application recommendation under scarce user data. Engineering Applications of Artificial Intelligence, 121(C). Online publication date: 1-May-2023.
https://dl.acm.org/doi/10.1016/j.engappai.2023.105974
Khlifi G, Jenhani I, Messaoud M, Mkaouer M (2023). Multi-label Classification of Mobile Application User Reviews Using Neural Language Models. Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 417-426. Online publication date: 19-Nov-2023.
https://doi.org/10.1007/978-3-031-45608-4_31
Zhou W, Wang Y, Gao C, Yang F (2022). Emerging topic identification from app reviews via adaptive online biterm topic modeling. Frontiers of Information Technology & Electronic Engineering, 23(5):678-691. Online publication date: 11-Apr-2022.
https://doi.org/10.1631/FITEE.2100465
Almansour A, Alotaibi R, Alharbi H (2022). Text-rating review discrepancy (TRRD): an integrative review and implications for research. Future Business Journal, 8(1). Online publication date: 22-Feb-2022.
https://doi.org/10.1186/s43093-022-00114-y
Alshangiti M, Shi W, Lima E, Liu X, Yu Q, Roychoudhury A, Cadar C, Kim M (2022). Hierarchical Bayesian multi-kernel learning for integrated classification and summarization of app reviews. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 558-569. Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3540250.3549174
Tushev M, Ebrahimi F, Mahmoud A, Dwyer M, Damian D, Zeller A (2022). Domain-specific analysis of mobile app reviews using keyword-assisted topic models. Proceedings of the 44th International Conference on Software Engineering, pages 762-773. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510201