Users’ interest grouping from online reviews based on topic frequency and order

World Wide Web

Abstract

Large volumes of online review data can reveal consumers’ major interests in a product domain, which has attracted considerable research interest from the academic community. Most existing work addresses review summarization, aspect identification, or opinion mining from an item’s point of view, such as the quality or popularity of products. Because the users who write reviews pay different amounts of attention to product aspects according to their own interests, in this article we aim to learn K user interest groups from their review writing. Identifying these K interest groups gives a better understanding of the major concerns of current and potential consumers, which is crucial for applications such as customer-oriented product improvement and diverse marketing strategies. Instead of using a traditional text clustering approach, we treat the group (cluster) ID as a hidden variable and use a permutation-based structural topic model called KMM. With this model, we infer the distribution of the K interest groups by discovering not only how often product aspects are discussed (Topic Frequency) but also the order in which those aspects are mentioned (Topic Order). Together, these two signals provide an informative summary of the raw review corpus. Experiments on several real-world review datasets show that the approach is a competitive solution.
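The abstract does not spell out KMM’s generative process, so the Python sketch below only illustrates, under stated assumptions, how the two signals it combines could be computed and scored. It assumes each review has already been reduced to a sequence of per-sentence aspect (topic) ids; Topic Frequency is taken as the share of sentences per topic and Topic Order as the order of first mention. For the order signal it swaps in the Generalized Mallows Model over inversion vectors, the permutation distribution used in the content model of Chen et al. [9] and derived from the distance-based ranking models of Fligner and Verducci [10]; whether KMM uses exactly this form, here simplified to a single dispersion ρ, is an assumption. All function names and the per-group parameters (topic proportions θ, a canonical order, ρ) are hypothetical, and the hard argmax assignment stands in for the model’s actual posterior inference over the hidden group variable.

```python
import math
from collections import Counter

# Assumption: each review is a non-empty list of per-sentence topic ids in 0..K-1.

def topic_frequency(seq, num_topics):
    """Topic Frequency: share of sentences assigned to each topic."""
    counts = Counter(seq)
    return [counts[t] / len(seq) for t in range(num_topics)]

def topic_order(seq, num_topics):
    """Topic Order: topics ranked by first mention.
    Assumption: unmentioned topics are appended in index order to give a full permutation."""
    order = list(dict.fromkeys(seq))  # de-duplicate, keeping first occurrences
    order += [t for t in range(num_topics) if t not in order]
    return order

def inversion_vector(order, canonical):
    """v[j] = number of topics after canonical[j] in the canonical order that
    appear before it in `order`; sum(v) is the Kendall tau distance."""
    pos = {t: i for i, t in enumerate(order)}
    return [sum(1 for u in canonical[j + 1:] if pos[u] < pos[canonical[j]])
            for j in range(len(canonical) - 1)]

def gmm_log_prob(v, rho):
    """Generalized Mallows Model log-probability of an inversion vector,
    simplified here to one shared dispersion rho > 0 (a swapped-in assumption)."""
    K = len(v) + 1
    total = 0.0
    for j, vj in enumerate(v):
        n_vals = K - j  # v[j] can take the values 0..K-j-1
        log_psi = math.log((1 - math.exp(-n_vals * rho)) / (1 - math.exp(-rho)))
        total += -rho * vj - log_psi
    return total

def group_log_score(seq, group, num_topics):
    """Hypothetical per-group score: multinomial topic frequency + GMM topic order."""
    theta, canonical, rho = group
    freq_part = sum(math.log(theta[t]) for t in seq)
    v = inversion_vector(topic_order(seq, num_topics), canonical)
    return freq_part + gmm_log_prob(v, rho)

def assign_group(seq, groups, num_topics):
    """Hard assignment to the best-scoring group (a stand-in for posterior inference)."""
    scores = [group_log_score(seq, g, num_topics) for g in groups]
    return max(range(len(groups)), key=scores.__getitem__)

# Toy example with 3 topics: 0 = price, 1 = battery, 2 = screen.
groups = [
    ([0.6, 0.2, 0.2], [0, 1, 2], 1.0),  # price-first group
    ([0.2, 0.2, 0.6], [2, 1, 0], 1.0),  # screen-first group
]
print(assign_group([0, 0, 1, 2], groups, 3))  # 0
print(assign_group([2, 2, 1, 0], groups, 3))  # 1
```

With ρ > 0, a review whose aspect order is close to a group’s canonical order (few inversions) gains likelihood under that group, so two reviewers with similar aspect frequencies could still be separated by the order in which they raise those aspects.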

References

  1. Abdul-Mageed, M., Diab, M.T., Korayem, M.: Subjectivity and sentiment analysis of Modern Standard Arabic. In: ACL ’11 (Short Papers), pp. 587–591 (2011)

  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499. VLDB ’94, Morgan Kaufmann Publishers Inc., San Francisco (1994). http://dl.acm.org/citation.cfm?id=645920.672836

  3. Azzopardi, L., Girolami, M., van Rijsbergen, K.: Investigating the relationship between language model perplexity and IR precision-recall measures. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 369–370. SIGIR ’03, ACM, New York (2003). doi:10.1145/860435.860505

  4. Beineke, P., Hastie, T., Manning, C., Vaithyanathan, S.: An exploration of sentiment summarization. In: Proceedings of AAAI, pp. 12–15 (2003)

  5. Bernardo, J.M., Smith, A.F.: Bayesian Theory. Wiley Series in Probability and Statistics (2000)

  6. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

  7. Blei, D.M., Griffiths, T.L., Jordan, M.I.: The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57, 7:1–7:30 (2010)

  8. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  9. Chen, H., Branavan, S.R.K., Barzilay, R., Karger, D.R.: Content modeling using latent permutations. J. Artif. Intell. Res. (JAIR) 36, 129–163 (2009)

  10. Fligner, M.A., Verducci, J.S.: Distance based ranking models. J. Roy. Stat. Soc. B Met. 48(3), 359–369 (1986)

  11. Gamon, M., Aue, A., Corston-Oliver, S., Ringger, E.K.: Pulse: Mining customer opinions from free text. In: IDA ’05, pp. 121–132 (2005)

  12. Ganesan, K., Zhai, C.: Opinion-based entity ranking. Information Retrieval (2011)

  13. Griffiths, T.L., Steyvers, M.: Finding scientific topics. PNAS 101(suppl. 1), 5228–5235 (2004)

  14. Gruber, A., Rosen-Zvi, M., Weiss, Y.: Hidden topic Markov models. In: Artificial Intelligence and Statistics (AISTATS). San Juan, Puerto Rico (2007)

  15. Heinrich, G.: Parameter estimation for text analysis. Tech. Rep., University of Leipzig, Germany (2004). http://www.arbylon.net/publications/text-est.pdf

  16. Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 219–230. WSDM ’08 (2008)

  17. Jindal, N., Liu, B., Lim, E.P.: Finding unusual review patterns using unexpected rules. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1549–1552. CIKM ’10 (2010)

  18. Jordan, M. (ed.): Learning in Graphical Models. MIT Press, Cambridge (1999)

  19. Kawamae, N.: Author interest topic model. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 887–888. SIGIR ’10, ACM, New York, NY, USA (2010). doi:10.1145/1835449.1835666

  20. Leung, C.K., Chan, S.F., Chung, F.L., Ngai, G.: A probabilistic rating inference framework for mining user preferences from reviews. World Wide Web 14(2), 187–215 (2011). doi:10.1007/s11280-011-0117-5

  21. Li, W., McCallum, A.: Pachinko allocation: DAG-structured mixture models of topic correlations. In: ICML (2006)

  22. Liu, B.: Opinion observer: Analyzing and comparing opinions on the web. In: Proceedings of the 14th International Conference on World Wide Web, pp. 342–351. WWW ’05 (2005)

  23. Mei, Q., Liu, C., Su, H., Zhai, C.: A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In: Proceedings of the 15th International Conference on World Wide Web, pp. 533–542. WWW ’06 (2006)

  24. Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, pp. 191–200. WWW ’12, ACM, New York, NY, USA (2012). doi:10.1145/2187836.2187863

  25. Mukherjee, A., Liu, B., Wang, J., Glance, N., Jindal, N.: Detecting group review spam. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 93–94. WWW ’11 (2011)

  26. Phan, X.H., Nguyen, C.T.: GibbsLDA++: A C/C++ implementation of latent Dirichlet allocation (LDA) (2007)

  27. Popescu, A.M., Etzioni, O.: Extracting product features and opinions from reviews. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 339–346. HLT ’05 (2005)

  28. Purver, M., Griffiths, T.L., Körding, K.P., Tenenbaum, J.B.: Unsupervised topic modelling for multi-party spoken discourse. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 17–24. ACL-44 (2006)

  29. Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 487–494. UAI ’04, AUAI Press, Arlington, Virginia, United States (2004). http://dl.acm.org/citation.cfm?id=1036843.1036902

  30. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975). doi:10.1145/361219.361220

  31. Sensoy, M., Yolum, P.: Automating user reviews using ontologies: an agent-based approach. World Wide Web 15(3), 285–323 (2012). doi:10.1007/s11280-011-0134-4

  32. Si, J., Li, Q., Qian, T., Deng, X.: Discovering K Web user groups with specific aspect interests. In: Proceedings of Machine Learning and Data Mining in Pattern Recognition, pp. 321–335. MLDM 2012 (2012)

  33. Titov, I., McDonald, R.: Modeling online reviews with multi-grain topic models. In: Proceedings of the 17th International Conference on World Wide Web, pp. 111–120. WWW ’08 (2008)

  34. Wang, H., Zhang, D., Zhai, C.: Structural topic model for latent topical structure analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 1526–1535. ACL-HLT ’11, Association for Computational Linguistics, Stroudsburg, PA, USA (2011). http://dl.acm.org/citation.cfm?id=2002472.2002657

  35. Zhao, Y., Karypis, G.: Criterion functions for document clustering: Experiments and analysis. Tech. Rep., University of Minnesota, Minneapolis (2002)

  36. Zhou, X., Zhang, X., Hu, X.: Semantic smoothing of document models for agglomerative clustering. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2928–2933. IJCAI ’07 (2007)

Author information

Correspondence to Jianfeng Si.

Additional information

This paper is an extended version of our previous conference paper [32].

About this article

Cite this article

Si, J., Li, Q., Qian, T. et al. Users’ interest grouping from online reviews based on topic frequency and order. World Wide Web 17, 1321–1342 (2014). https://doi.org/10.1007/s11280-013-0239-z

  • DOI: https://doi.org/10.1007/s11280-013-0239-z
