
Mining collective knowledge: inferring functional labels from online review for business

  • Regular Paper
Knowledge and Information Systems

Abstract

With the increasing popularity of online e-commerce services, users constantly generate a large volume of online reviews. In this paper, we study the problem of inferring functional labels from online review text. Functional labels summarize and highlight the main characteristics of a business, and can serve as bridges between consumption needs and service functions. We consider two kinds of semantic similarity, lexical similarity and embedding similarity, which characterize relatedness from two different perspectives. To measure lexical similarity, we use the classic probabilistic ranking formula BM25; to measure embedding similarity, we propose an extended embedding model that can incorporate weakly supervised information derived from review text. The two kinds of similarity complement each other and capture semantic relatedness more comprehensively. We construct a test collection covering four different domains based on a Yelp dataset and compare against multiple baseline methods. Extensive experiments show that the proposed methods are effective.
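The two similarity signals named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BM25 function uses a common smoothed-IDF variant with conventional default parameters (`k1=1.2`, `b=0.75`), the embedding side is plain cosine similarity over vectors assumed to be given (the paper's extended embedding model is not reproduced), and `combined_score` with its `weight` parameter is a hypothetical linear interpolation, since the abstract does not say how the two scores are combined. The toy reviews and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_score(label_terms, review_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Okapi BM25 score of one business's review text for a candidate label."""
    term_counts = Counter(review_terms)
    doc_len = len(review_terms)
    score = 0.0
    for term in label_terms:
        tf = term_counts.get(term, 0)
        if tf == 0:
            continue  # a term absent from the reviews contributes nothing
        n_t = doc_freq.get(term, 0)
        # Smoothed IDF variant (1 added inside the log) so weights stay non-negative.
        idf = math.log(1.0 + (num_docs - n_t + 0.5) / (n_t + 0.5))
        # Term-frequency saturation with document-length normalization.
        norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1.0) / norm
    return score


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(lexical, embedding, weight=0.5):
    """Hypothetical linear interpolation of the two signals (an assumption)."""
    return weight * lexical + (1.0 - weight) * embedding


# Toy data: the concatenated, tokenized reviews of two made-up businesses.
reviews = {
    "cafe": "great coffee free wifi quiet place to study".split(),
    "bar": "loud music cheap beer late night".split(),
}
# Document frequency: in how many businesses' reviews each term appears.
doc_freq = Counter(term for terms in reviews.values() for term in set(terms))
avg_len = sum(len(terms) for terms in reviews.values()) / len(reviews)

label = ["free", "wifi"]
scores = {
    name: bm25_score(label, terms, doc_freq, len(reviews), avg_len)
    for name, terms in reviews.items()
}
```

In this toy setup the label "free wifi" scores higher for the cafe than for the bar, because only the cafe's reviews contain the label terms. Raw BM25 scores and cosine values live on different scales, so a real combination would likely need some normalization before interpolating.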


Notes

  1. http://yelp.com.

  2. www.expedia.com.

  3. http://www.tripadvisor.com.

  4. http://yelp.com.

  5. We assume that the information supported by review writers is important. Thus, we discard all reviews with a rating of less than three stars.

  6. For simplicity, we will only present the objective function for a single business; it will be easy to extend to multiple businesses.

  7. Our current model requires attribute values to be discretized.

  8. http://www.yelp.com/dataset_challenge.

  9. A label can be considered as a phrase in this method.

Acknowledgements

The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China under Grant Number 61502502, the Beijing Natural Science Foundation under Grant Number 4162032, and the Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201703).

Author information

Corresponding author

Correspondence to Wayne Xin Zhao.

Additional information

Wayne Xin Zhao: Co-first author.

About this article

Cite this article

Fan, F., Zhao, W.X., Wen, JR. et al. Mining collective knowledge: inferring functional labels from online review for business. Knowl Inf Syst 53, 723–747 (2017). https://doi.org/10.1007/s10115-017-1050-4

  • DOI: https://doi.org/10.1007/s10115-017-1050-4
