A Domain Independent Framework to Extract and Aggregate Analogous Features in Online Reviews

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7181)

Abstract

Detecting and extracting features from online reviews is both important and challenging, especially when domain knowledge is not explicitly available. Moreover, opinions about the same feature of a product or service are frequently expressed in various lexical forms. In this paper, we present a novel framework that automatically detects, extracts, and aggregates semantically related features of reviewed products and services. Our model uses sentence-level syntactic and lexical information to detect candidate feature words, and corpus-level co-occurrence statistics to group semantically similar features, yielding high-precision feature detection. This high-precision feature aggregation gives our model a distinct advantage over state-of-the-art approaches such as double propagation: it produces short, succinct feature sets, whereas existing approaches can generate thousands of features. We evaluate our model on two unrelated domains, restaurant reviews and camera reviews, to verify its domain independence. Our model outperformed existing state-of-the-art probabilistic models.
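
The abstract does not say which syntactic cues or co-occurrence statistics the framework uses, so the following is only a rough sketch of the general two-stage idea, not the authors' method. It assumes hand-tagged tokens in place of a real parser, treats any noun in a sentence containing an adjective as a candidate feature, and groups candidates by cosine similarity over the adjectives they co-occur with. The function names, the tag set, the toy corpus, and the 0.5 threshold are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def candidate_features(tagged_sentence):
    """Sentence level (assumed heuristic): nouns in a sentence that has an adjective."""
    if not any(tag == "ADJ" for _, tag in tagged_sentence):
        return []
    return [tok for tok, tag in tagged_sentence if tag == "NOUN"]


def context_vectors(corpus):
    """Corpus level: count the adjectives each candidate noun co-occurs with."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        adjectives = [tok for tok, tag in sentence if tag == "ADJ"]
        for noun in candidate_features(sentence):
            vectors[noun].update(adjectives)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_features(vectors, threshold=0.5):
    """Merge candidates whose similarity meets the (assumed) threshold."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    return sorted(sorted(g) for g in groups.values())


# Toy restaurant corpus with hand-assigned tags (illustrative only).
corpus = [
    [("the", "DET"), ("pizza", "NOUN"), ("was", "VERB"), ("delicious", "ADJ")],
    [("delicious", "ADJ"), ("pasta", "NOUN")],
    [("the", "DET"), ("waiter", "NOUN"), ("was", "VERB"), ("rude", "ADJ")],
    [("rude", "ADJ"), ("staff", "NOUN")],
    [("we", "PRON"), ("ordered", "VERB"), ("pizza", "NOUN")],
]

feature_groups = group_features(context_vectors(corpus))
print(feature_groups)  # [['pasta', 'pizza'], ['staff', 'waiter']]
```

In this toy run, the food terms and the service terms end up in separate groups because they share opinion words. This loosely mirrors the abstract's point that aggregation produces a short set of features rather than a long flat list.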

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bhattarai, A., Niraula, N., Rus, V., Lin, KI. (2012). A Domain Independent Framework to Extract and Aggregate Analogous Features in Online Reviews. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28604-9_46

  • DOI: https://doi.org/10.1007/978-3-642-28604-9_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28603-2

  • Online ISBN: 978-3-642-28604-9

  • eBook Packages: Computer Science, Computer Science (R0)
