Users’ interest grouping from online reviews based on topic frequency and order

Si, Jianfeng; Li, Qing; Qian, Tieyun; Deng, Xiaotie

doi:10.1007/s11280-013-0239-z

Published: 27 July 2013

World Wide Web, Volume 17, pages 1321–1342 (2014)

Jianfeng Si¹, Qing Li¹, Tieyun Qian² & Xiaotie Deng³

Abstract

Large volumes of online review data can reveal consumers’ major interests in the products of a domain, which has attracted great research interest from the academic community. Most existing work focuses on review summarization, aspect identification, or opinion mining from an item’s point of view, such as the quality or popularity of products. Because the users who write these reviews pay different amounts of attention to product aspects according to their own interests, in this article we aim to learn K user interest groups as indicated by their review writing. Identifying these K interest groups can facilitate a better understanding of the concerns of major and potential consumers, which is crucial for applications such as customer-oriented product improvement or diversified marketing strategies. Instead of using a traditional text clustering approach, we treat the groupId/clusterId as a hidden variable and use a permutation-based structural topic model called KMM. Through this model, we infer the K interest groups’ distributions by discovering not only the frequency of product aspects (Topic Frequency) but also the occurrence priority of the respective aspects (Topic Order). Together they present an informative summarization of the raw review corpus. Our experiments on several real-world review datasets demonstrate that the approach is a competitive solution.
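To make the two signals named in the abstract concrete, the following is a minimal sketch and not the KMM model itself. It assumes each sentence of a review has already been assigned an integer topic (aspect) label by some other means; the function names `topic_frequency` and `topic_order` and the toy review are hypothetical and chosen only for illustration.

```python
from collections import Counter


def topic_frequency(topic_sequence, num_topics):
    """Share of sentences assigned to each topic (Topic Frequency)."""
    counts = Counter(topic_sequence)
    total = len(topic_sequence)
    if total == 0:
        return [0.0] * num_topics
    return [counts.get(t, 0) / total for t in range(num_topics)]


def topic_order(topic_sequence):
    """Topics listed by first occurrence in the review (Topic Order)."""
    seen = []
    for t in topic_sequence:
        if t not in seen:
            seen.append(t)
    return seen


# Hypothetical review: one topic label per sentence, in writing order.
review = [2, 2, 0, 1, 0, 2]
freq = topic_frequency(review, num_topics=3)
order = topic_order(review)
print(freq)
print(order)
```

Two reviews can share the same frequency vector yet mention the aspects in a different order, which is why the abstract treats frequency and order as separate pieces of evidence for a user's interest group.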

References

1. Abdul-Mageed, M., Diab, M.T., Korayem, M.: Subjectivity and sentiment analysis of Modern Standard Arabic. In: ACL (Short Papers) ’11, pp. 587–591 (2011)
2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499. VLDB ’94, Morgan Kaufmann Publishers Inc., San Francisco (1994). http://dl.acm.org/citation.cfm?id=645920.672836
3. Azzopardi, L., Girolami, M., van Rijsbergen, K.: Investigating the relationship between language model perplexity and IR precision-recall measures. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 369–370. SIGIR ’03, ACM, New York (2003). doi:10.1145/860435.860505
4. Beineke, P., Hastie, T., Manning, C., Vaithyanathan, S.: An exploration of sentiment summarization. In: Proceedings of AAAI, pp. 12–15 (2003)
5. Bernardo, J.M., Smith, A.F.: Bayesian Theory. Wiley Series in Probability and Statistics (2000)
6. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
7. Blei, D.M., Griffiths, T.L., Jordan, M.I.: The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57, 7:1–7:30 (2010)
8. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
9. Chen, H., Branavan, S.R.K., Barzilay, R., Karger, D.R.: Content modeling using latent permutations. J. Artif. Intell. Res. (JAIR) 36, 129–163 (2009)
10. Fligner, M.A., Verducci, J.S.: Distance based ranking models. J. Roy. Stat. Soc. B Met. 48(3), 359–369 (1986)
11. Gamon, M., Aue, A., Corston-Oliver, S., Ringger, E.K.: Pulse: Mining customer opinions from free text. In: IDA ’05, pp. 121–132 (2005)
12. Ganesan, K., Zhai, C.: Opinion-Based Entity Ranking. Information Retrieval (2011)
13. Griffiths, T.L., Steyvers, M.: Finding scientific topics. PNAS 101(suppl. 1), 5228–5235 (2004)
14. Gruber, A., Rosen-Zvi, M., Weiss, Y.: Hidden topic Markov models. In: Artificial Intelligence and Statistics (AISTATS). San Juan, Puerto Rico (2007)
15. Heinrich, G.: Parameter estimation for text analysis. Tech. Rep., University of Leipzig, Germany (2004). http://www.arbylon.net/publications/text-est.pdf
16. Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 219–230. WSDM ’08 (2008)
17. Jindal, N., Liu, B., Lim, E.P.: Finding unusual review patterns using unexpected rules. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1549–1552. CIKM ’10 (2010)
18. Jordan, M. (ed.): Learning in Graphical Models. MIT Press, Cambridge (1999)
19. Kawamae, N.: Author interest topic model. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 887–888. SIGIR ’10, ACM, New York, NY, USA (2010). doi:10.1145/1835449.1835666
20. Leung, C.K., Chan, S.F., Chung, F.L., Ngai, G.: A probabilistic rating inference framework for mining user preferences from reviews. World Wide Web 14(2), 187–215 (2011). doi:10.1007/s11280-011-0117-5
21. Li, W., McCallum, A.: Pachinko allocation: DAG-structured mixture models of topic correlations. In: ICML (2006)
22. Liu, B.: Opinion observer: Analyzing and comparing opinions on the web. In: Proceedings of the 14th International Conference on World Wide Web, pp. 342–351. WWW ’05 (2005)
23. Mei, Q., Liu, C., Su, H., Zhai, C.: A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In: Proceedings of the 15th International Conference on World Wide Web, pp. 533–542. WWW ’06 (2006)
24. Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, pp. 191–200. WWW ’12, ACM, New York, NY, USA (2012). doi:10.1145/2187836.2187863
25. Mukherjee, A., Liu, B., Wang, J., Glance, N., Jindal, N.: Detecting group review spam. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 93–94. WWW ’11 (2011)
26. Phan, X.H., Nguyen, C.T.: GibbsLDA++: A C/C++ implementation of latent Dirichlet allocation (LDA) (2007)
27. Popescu, A.M., Etzioni, O.: Extracting product features and opinions from reviews. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 339–346. HLT ’05 (2005)
28. Purver, M., Griffiths, T.L., Körding, K.P., Tenenbaum, J.B.: Unsupervised topic modelling for multi-party spoken discourse. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 17–24. ACL-44 (2006)
29. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 487–494. UAI ’04, AUAI Press, Arlington, Virginia, United States (2004). http://dl.acm.org/citation.cfm?id=1036843.1036902
30. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975). doi:10.1145/361219.361220
31. Sensoy, M., Yolum, P.: Automating user reviews using ontologies: an agent-based approach. World Wide Web 15(3), 285–323 (2012). doi:10.1007/s11280-011-0134-4
32. Si, J., Li, Q., Qian, T., Deng, X.: Discovering k web user groups with specific aspect interests. In: Proceedings of Machine Learning and Data Mining in Pattern Recognition, pp. 321–335. MLDM 2012 (2012)
33. Titov, I., McDonald, R.: Modeling online reviews with multi-grain topic models. In: Proceedings of the 17th International Conference on World Wide Web, pp. 111–120. WWW ’08 (2008)
34. Wang, H., Zhang, D., Zhai, C.: Structural topic model for latent topical structure analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 1526–1535. ACL-HLT ’11, Association for Computational Linguistics, Stroudsburg, PA, USA (2011). http://dl.acm.org/citation.cfm?id=2002472.2002657
35. Zhao, Y., Karypis, G.: Criterion Functions for Document Clustering: Experiments and analysis. Tech. Rep., University of Minnesota Press, Minneapolis (2002)
36. Zhou, X., Zhang, X., Hu, X.: Semantic smoothing of document models for agglomerative clustering. In: Proceedings of the 20th International Joint Conf. Artificial Intelligence, pp. 2928–2933. IJCAI ’07 (2007)

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Hong Kong, China
Jianfeng Si & Qing Li
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Tieyun Qian
AIMS Lab, Department of Computer Science, Shanghai Jiaotong University, Shanghai, China
Xiaotie Deng

Corresponding author

Correspondence to Jianfeng Si.

Additional information

This paper is an extended version of our previous conference paper [32].

About this article

Cite this article

Si, J., Li, Q., Qian, T. et al. Users’ interest grouping from online reviews based on topic frequency and order. World Wide Web 17, 1321–1342 (2014). https://doi.org/10.1007/s11280-013-0239-z

Received: 08 July 2012
Revised: 04 June 2013
Accepted: 02 July 2013
Published: 27 July 2013
Issue Date: November 2014
DOI: https://doi.org/10.1007/s11280-013-0239-z
