A novel feature extraction methodology for sentiment analysis of product reviews

Chen, Xin; Xue, Yun; Zhao, Hongya; Lu, Xin; Hu, Xiaohui; Ma, Zhihao

doi:10.1007/s00521-018-3477-2

Original Article
Published: 21 April 2018

Volume 31, pages 6625–6642, (2019)
Neural Computing and Applications

Xin Chen (ORCID: orcid.org/0000-0001-5931-5072)^1,3,
Yun Xue^1,2,
Hongya Zhao^3,
Xin Lu^1,
Xiaohui Hu^1 &
Zhihao Ma^1

Abstract

Feature extraction is one of the key steps in text sentiment analysis (SA), and the choice of algorithm has an important effect on the results. In this paper, a novel methodology is proposed for extracting features for SA of product reviews. First, to account for the diverse forms of expression in product reviews, generalized TF–IDF feature vectors are obtained by introducing the semantic similarity of synonyms. Then, in view of the differing lengths of product reviews, local patterns in the feature vectors are identified with the OPSM biclustering algorithm. Finally, the PrefixSpan algorithm is improved to detect frequent and pseudo-consecutive phrases with high discriminative ability (namely FPCD phrases), which contain word-order information. Furthermore, important factors such as the separation and discriminative ability of words are also employed to improve the discrimination of sentiment polarity. The text feature vectors are extracted on the basis of these steps. A series of experiments and comparisons indicates that SA performance on product reviews is greatly improved.
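
To make the first step more concrete, the following is a minimal sketch of one way synonym similarity could be folded into TF–IDF. It is not the authors' formulation, which the abstract does not give. The similarity table SYNONYM_SIM, the function names, and the smoothed IDF variant are all assumptions chosen for illustration; in practice the similarities might come from a lexical resource such as HowNet or WordNet.

```python
import math
from collections import Counter

# Hypothetical synonym similarities in [0, 1] (illustrative values only,
# not taken from the paper or from any lexical resource).
SYNONYM_SIM = {
    ("good", "great"): 0.8,
    ("bad", "poor"): 0.9,
}


def similarity(a, b):
    """Return 1.0 for identical words, the table value for listed pairs, else 0.0."""
    if a == b:
        return 1.0
    return SYNONYM_SIM.get((a, b), SYNONYM_SIM.get((b, a), 0.0))


def generalized_tf(doc_tokens, vocab):
    """Term frequency in which each occurrence of a synonym contributes
    its similarity to the term, instead of counting exact matches only."""
    counts = Counter(doc_tokens)
    n = max(len(doc_tokens), 1)  # avoid division by zero on empty reviews
    return {
        t: sum(similarity(t, w) * c for w, c in counts.items()) / n
        for t in vocab
    }


def generalized_idf(docs, vocab):
    """Smoothed IDF (an assumed variant) where a document counts toward a
    term's document frequency if it contains the term or any listed synonym."""
    n_docs = len(docs)
    idf = {}
    for t in vocab:
        df = sum(1 for d in docs if any(similarity(t, w) > 0 for w in d))
        idf[t] = math.log((1 + n_docs) / (1 + df)) + 1.0
    return idf


def generalized_tfidf(docs):
    """Return the sorted vocabulary and one feature vector per document."""
    vocab = sorted({w for d in docs for w in d})
    idf = generalized_idf(docs, vocab)
    vectors = []
    for d in docs:
        tf = generalized_tf(d, vocab)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


docs = [["good", "phone"], ["great", "phone"], ["bad", "battery"]]
vocab, vectors = generalized_tfidf(docs)
for d, v in zip(docs, vectors):
    print(d, [round(x, 3) for x in v])
```

In this sketch a review containing only "good" still receives a nonzero weight on the "great" dimension, so reviews that express the same opinion with different words end up closer together in feature space.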

Acknowledgements

The authors gratefully thank the colleagues who participated in this work and provided technical support. This work is supported by the Guangdong Provincial Engineering Technology Research Center for Data Science (Nos. 2016KF09, 2016KF10) and the National Statistical Science Research Project of China (Nos. 2015LY81, 2016LY98). This work was also supported by the Science and Technology Department of Guangdong Province in China (Grant Nos. 2016A010101020, 2016A010101021, 2016A010101022), the Guangdong Province Science and Technology Planning Project (No. 2013B040404009), the Foundation of Guangdong Polytechnic of Science and Technology (No. XJSC2016206), the Natural Science Funds of Shenzhen Science and Technology Innovation Commission (No. JCYJ20160527172144272), and the Innovation Project of Graduate School of South China Normal University (No. 2015lkxm37).

Author information

Authors and Affiliations

School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou, 510006, China
Xin Chen, Yun Xue, Xin Lu, Xiaohui Hu & Zhihao Ma
Guangdong Provincial Key Laboratory of Quantum Engineering and Quantum Materials, Guangdong Provincial Engineering Technology Research Center for Data Science, Guangzhou, 510006, China
Yun Xue
Shenzhen PolyTechnic, Shenzhen, 518055, China
Xin Chen & Hongya Zhao

Corresponding author

Correspondence to Yun Xue.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Chen, X., Xue, Y., Zhao, H. et al. A novel feature extraction methodology for sentiment analysis of product reviews. Neural Comput & Applic 31, 6625–6642 (2019). https://doi.org/10.1007/s00521-018-3477-2

Received: 12 August 2017
Accepted: 03 April 2018
Published: 21 April 2018
Issue Date: October 2019
DOI: https://doi.org/10.1007/s00521-018-3477-2
