Abstract
Large volumes of online review data can reveal consumers’ major interests in the products of a domain, which has attracted considerable research interest from the academic community. Most existing work focuses on review summarization, aspect identification, or opinion mining from an item’s point of view, such as the quality or popularity of products. Users who write these reviews, however, pay different amounts of attention to product aspects according to their own interests. In this article, we aim to learn K user interest groups as indicated by their review writing. Identifying these K interest groups facilitates a better understanding of the concerns of major and potential consumers, which is crucial for applications such as customer-oriented product improvement and diversified marketing strategies. Instead of using a traditional text clustering approach, we treat the groupId/clusterId as a hidden variable and use a permutation-based structural topic model called KMM. Through this model, we infer the distribution of the K interest groups by discovering not only the frequency of product aspects (Topic Frequency) but also the order in which the respective aspects occur (Topic Order). Together, these two signals give an informative summarization of the raw review corpus. Our experiments on several real-world review datasets demonstrate that the approach is a competitive solution.
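To make the two signals concrete, the following minimal sketch computes a Topic Frequency vector and a Topic Order for a single review. It is an illustration only and not the KMM model: it assumes each sentence has already been assigned an integer topic label, and the function names (`topic_frequency`, `topic_order`, `kendall_tau_distance`) and the toy reviews are hypothetical. The Kendall tau distance is used here simply as one common way to compare two orderings.

```python
from collections import Counter
from itertools import combinations


def topic_frequency(labels, num_topics):
    """Count how many sentences of a review mention each topic (aspect)."""
    counts = Counter(labels)
    return [counts.get(t, 0) for t in range(num_topics)]


def topic_order(labels):
    """Return topics in the order of their first occurrence in the review."""
    seen = []
    for t in labels:
        if t not in seen:
            seen.append(t)
    return seen


def kendall_tau_distance(order_a, order_b):
    """Count topic pairs that the two orders rank in opposite directions.

    Assumes both orders contain exactly the same topics.
    """
    pos_b = {t: i for i, t in enumerate(order_b)}
    return sum(1 for x, y in combinations(order_a, 2) if pos_b[x] > pos_b[y])


# Hypothetical sentence-level topic labels for two reviews (3 topics).
review_a = [0, 0, 2, 1, 2]
review_b = [1, 2, 2, 0, 0]

freq_a = topic_frequency(review_a, 3)
freq_b = topic_frequency(review_b, 3)
order_a = topic_order(review_a)
order_b = topic_order(review_b)

print(freq_a, freq_b)    # identical frequency vectors
print(order_a, order_b)  # different topic orders
print(kendall_tau_distance(order_a, order_b))
```

In this toy example the two reviews mention every topic equally often but in reversed order, so a frequency-only representation could not tell them apart while the order signal can.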
References
Abdul-Mageed, M., Diab, M.T., Korayem, M.: Subjectivity and sentiment analysis of Modern Standard Arabic. In: ACL (Short Papers)’11, pp. 587–591 (2011)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499. VLDB ’94, Morgan Kaufmann Publishers Inc., San Francisco (1994). http://dl.acm.org/citation.cfm?id=645920.672836
Azzopardi, L., Girolami, M., van Rijsbergen, K.: Investigating the relationship between language model perplexity and IR precision-recall measures. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 369–370. SIGIR ’03, ACM, New York (2003). doi:10.1145/860435.860505
Beineke, P., Hastie, T., Manning, C., Vaithyanathan, S.: An exploration of sentiment summarization. In: Proceeding of AAAI, pp. 12–15 (2003)
Bernardo, J.M., Smith, A.F.: Bayesian Theory. Wiley Series in Probability and Statistics (2000)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Blei, D.M., Griffiths, T.L., Jordan, M.I.: The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57, 7:1–7:30 (2010)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Chen, H., Branavan, S.R.K., Barzilay, R., Karger, D.R.: Content modeling using latent permutations. J. Artif. Intell. Res. (JAIR) 36, 129–163 (2009)
Fligner, M.A., Verducci, J.S.: Distance based ranking models. J. Roy. Stat. Soc. B Met. 48(3), 359–369 (1986)
Gamon, M., Aue, A., Corston-Oliver, S., Ringger, E.K.: Pulse: Mining customer opinions from free text. In: IDA’05, pp. 121–132 (2005)
Ganesan, K., Zhai, C.: Opinion-Based Entity Ranking. Information Retrieval (2011)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. PNAS 101(suppl. 1), 5228–5235 (2004)
Gruber, A., Rosen-Zvi, M., Weiss, Y.: Hidden topic Markov models. In: Artificial Intelligence and Statistics (AISTATS). San Juan, Puerto Rico (2007)
Heinrich, G.: Parameter estimation for text analysis. Tech. Rep., University of Leipzig, Germany (2004). http://www.arbylon.net/publications/text-est.pdf
Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 219–230. WSDM ’08 (2008)
Jindal, N., Liu, B., Lim, E.P.: Finding unusual review patterns using unexpected rules. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1549–1552. CIKM ’10 (2010)
Jordan, M. (ed.): Learning in Graphical Models. MIT Press, Cambridge (1999)
Kawamae, N.: Author interest topic model. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 887–888. SIGIR ’10, ACM, New York, NY, USA (2010). doi:10.1145/1835449.1835666
Leung, C.K., Chan, S.F., Chung, F.L., Ngai, G.: A probabilistic rating inference framework for mining user preferences from reviews. World Wide Web 14(2), 187–215 (2011). doi:10.1007/s11280-011-0117-5
Li, W., McCallum, A.: Pachinko allocation: DAG-structured mixture models of topic correlations. In: ICML (2006)
Liu, B.: Opinion observer: Analyzing and comparing opinions on the web. In: Proceedings of the 14th international conference on World Wide Web, pp. 342–351. WWW’05 (2005)
Mei, Q., Liu, C., Su, H., Zhai, C.: A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In: Proceedings of the 15th International Conference on World Wide Web, pp. 533–542. WWW ’06 (2006)
Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, pp. 191–200. WWW ’12, ACM, New York, NY, USA (2012). doi:10.1145/2187836.2187863
Mukherjee, A., Liu, B., Wang, J., Glance, N., Jindal, N.: Detecting group review spam. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 93–94. WWW ’11 (2011)
Phan, X.H., Nguyen, C.T.: GibbsLDA++: A C/C++ implementation of latent Dirichlet allocation (LDA) (2007)
Popescu, A.M., Etzioni, O.: Extracting product features and opinions from reviews. In: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 339–346. HLT ’05 (2005)
Purver, M., Griffiths, T.L., Körding, K.P., Tenenbaum, J.B.: Unsupervised topic modelling for multi-party spoken discourse. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 17–24. ACL-44 (2006)
Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 487–494. UAI ’04, AUAI Press, Arlington, Virginia, United States (2004). http://dl.acm.org/citation.cfm?id=1036843.1036902
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975). doi:10.1145/361219.361220
Sensoy, M., Yolum, P.: Automating user reviews using ontologies: an agent-based approach. World Wide Web 15(3), 285–323 (2012). doi:10.1007/s11280-011-0134-4
Si, J., Li, Q., Qian, T., Deng, X.: Discovering k web user groups with specific aspect interests. In: Proceedings of Machine Learning and Data Mining in Pattern Recognition, pp. 321–335. MLDM 2012 (2012)
Titov, I., McDonald, R.: Modeling online reviews with multi-grain topic models. In: Proceeding of the 17th International Conference on World Wide Web, pp. 111–120. WWW ’08 (2008)
Wang, H., Zhang, D., Zhai, C.: Structural topic model for latent topical structure analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 1526–1535. ACL-HLT ’11, Association for Computational Linguistics, Stroudsburg, PA, USA (2011). http://dl.acm.org/citation.cfm?id=2002472.2002657
Zhao, Y., Karypis, G.: Criterion Functions for Document Clustering: Experiments and analysis. Tech. Rep., University of Minnesota Press, Minneapolis (2002)
Zhou, X., Zhang, X., Hu, X.: Semantic smoothing of document models for agglomerative clustering. In: Proceeding 20th International Joint Conf. Artificial Intelligence, pp. 2928–2933. IJCAI ’07 (2007)
Additional information
This paper is an extended version of our previous conference paper [32].
Cite this article
Si, J., Li, Q., Qian, T. et al. Users’ interest grouping from online reviews based on topic frequency and order. World Wide Web 17, 1321–1342 (2014). https://doi.org/10.1007/s11280-013-0239-z
DOI: https://doi.org/10.1007/s11280-013-0239-z