Clustering Chinese Product Features with Multilevel Similarity

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL 2015, NLP-NABD 2015)

Abstract

This paper presents an unsupervised hierarchical clustering approach for grouping co-referred features in Chinese product reviews. To capture the different levels of connection between co-referred product features, we consider three similarity measures: literal similarity, word-embedding-based semantic similarity, and explanatory-evaluation-based contextual similarity. We apply our approach to two corpora of product reviews in the car and mobile phone domains. We demonstrate that combining multilevel similarity is of great value to feature normalization.
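The general idea of clustering features on a combined similarity can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: character-level Jaccard overlap stands in for literal similarity, cosine similarity over toy vectors stands in for the embedding-based and contextual measures, the three scores are mixed with equal weights, and clusters are merged by average linkage until no pair exceeds a hypothetical threshold of 0.5.

```python
import math

def literal_sim(a, b):
    """Character-level Jaccard overlap (assumed stand-in for literal similarity)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_sim(a, b, emb, ctx, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted mix of the three similarity levels (equal weights are an assumption)."""
    w_lit, w_sem, w_ctx = weights
    return (w_lit * literal_sim(a, b)
            + w_sem * cosine(emb[a], emb[b])
            + w_ctx * cosine(ctx[a], ctx[b]))

def cluster(features, emb, ctx, threshold=0.5):
    """Agglomerative clustering with average linkage over the combined similarity."""
    clusters = [[f] for f in features]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [combined_sim(a, b, emb, ctx)
                        for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (i, j)
        if best < threshold:  # no remaining pair is similar enough to merge
            break
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

# Toy data: hypothetical 2-d vectors, not real word2vec or context features.
features = ["屏幕", "显示屏", "电池", "电量"]
emb = {"屏幕": [1.0, 0.1], "显示屏": [0.9, 0.2],
       "电池": [0.1, 1.0], "电量": [0.2, 0.9]}
ctx = {"屏幕": [1.0, 0.0], "显示屏": [1.0, 0.0],
       "电池": [0.0, 1.0], "电量": [0.0, 1.0]}

groups = cluster(features, emb, ctx)
print(groups)
```

On this toy input the two screen terms end up in one group and the two battery terms in another. In the sketch, the literal score alone is low for 屏幕 and 显示屏 because they share only one character; the other two levels supply the evidence needed to merge them.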

Notes

  1. http://code.google.com/p/word2vec/

Acknowledgments

This study was supported by the National Natural Science Foundation of China under Grant No. 61170148 and by the Returned Scholar Foundation of Heilongjiang Province.

Author information

Corresponding author

Correspondence to Guohong Fu.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

He, Y., Song, J., Nan, Y., Fu, G. (2015). Clustering Chinese Product Features with Multilevel Similarity. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2015, NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_28

  • DOI: https://doi.org/10.1007/978-3-319-25816-4_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25815-7

  • Online ISBN: 978-3-319-25816-4

  • eBook Packages: Computer Science, Computer Science (R0)
