Abstract
With the increasing popularity of online e-commerce services, users constantly generate a large volume of online reviews. In this paper, we study the problem of inferring functional labels from online review text. Functional labels summarize and highlight the main characteristics of a business, and they can serve as bridges between consumption needs and service functions. We consider two kinds of semantic similarity, lexical similarity and embedding similarity, which characterize relatedness from two different perspectives. To measure lexical similarity, we use the classic probabilistic ranking formula BM25; to measure embedding similarity, we propose an extended embedding model that incorporates weakly supervised information derived from review text. The two kinds of similarity complement each other and together capture semantic relatedness more comprehensively. We construct a test collection covering four different domains from a Yelp dataset and compare against multiple baseline methods. Extensive experiments demonstrate the effectiveness of the proposed methods.
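The abstract names the two similarity signals but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a candidate label is treated as a BM25 query against each business's concatenated review text, using the common Okapi form with a non-negative IDF and typical defaults `k1=1.2`, `b=0.75`. The embedding side is plain cosine similarity over precomputed vectors; the paper's extended embedding model is not reproduced. The two scores are merged by a hypothetical linear interpolation with weight `lam`, since the abstract does not say how they are combined. All function names and the toy data are illustrative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Common Okapi BM25 form with IDF = log(1 + (N - n + 0.5) / (n + 0.5)).
    k1 and b are typical defaults, not values taken from the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency of each distinct query term across the collection.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency with document-length normalization.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_score(bm25, cos_sim, max_bm25, lam=0.5):
    """Hypothetical interpolation: rescale BM25 to [0, 1] by a caller-supplied
    maximum, then mix it with the cosine similarity using weight `lam`."""
    lexical = bm25 / max_bm25 if max_bm25 > 0 else 0.0
    return lam * lexical + (1 - lam) * cos_sim


if __name__ == "__main__":
    # Made-up toy data: tokenized review text per business.
    reviews = {
        "pizzeria": "great pizza and cheap beer late night".split(),
        "cafe": "quiet cafe good wifi for studying coffee".split(),
        "takeout": "pizza delivery fast cheap".split(),
    }
    label = "cheap pizza".split()
    lexical = dict(zip(reviews, bm25_scores(label, list(reviews.values()))))
    # Made-up 3-d vectors standing in for learned label/business embeddings.
    label_vec = [0.9, 0.1, 0.0]
    business_vecs = {
        "pizzeria": [0.8, 0.2, 0.1],
        "cafe": [0.1, 0.9, 0.3],
        "takeout": [0.7, 0.0, 0.2],
    }
    top = max(lexical.values())
    for name in reviews:
        score = combined_score(lexical[name], cosine(label_vec, business_vecs[name]), top)
        print(f"{name}: {score:.3f}")
```

In this toy setup, the "cafe" business shares no terms with the label, so its lexical score is zero and only the embedding term can raise it. That gap is one way the two signals could complement each other, as the abstract suggests.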







Notes
We assume that the information endorsed by review writers is important. Thus, we discard all reviews with a rating below three stars.
For simplicity, we present the objective function for a single business only; the extension to multiple businesses is straightforward.
Our current model requires attribute values to be discretized.
In this method, a label can be treated as a phrase.
Acknowledgements
The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China under Grant No. 61502502, the Beijing Natural Science Foundation under Grant No. 4162032, and the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201703).
Additional information
Wayne Xin Zhao: Co-first author.
About this article
Cite this article
Fan, F., Zhao, W.X., Wen, JR. et al. Mining collective knowledge: inferring functional labels from online review for business. Knowl Inf Syst 53, 723–747 (2017). https://doi.org/10.1007/s10115-017-1050-4
DOI: https://doi.org/10.1007/s10115-017-1050-4