A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews

  • Application of soft computing
Soft Computing

Abstract

New sentiment words in product reviews are valuable resources because they reflect users' own language directly. Extracting them can help provide better information services to users and offer theoretical support for related research in edge computing. Traditional methods for extracting new sentiment words generally ignore contextual and syntactic information, which leads to low accuracy and recall. To address this issue, we propose a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. First, location rules derived from the sequence labeling results are used to calculate the probability that each new word is a sentiment word, and a candidate set of new sentiment words is selected according to this probability. Next, the candidate set is supplemented by matching appositive words based on edit distance. Finally, the final set of new sentiment words is obtained through fine-grained filtering, which includes computing pointwise mutual information (PMI) and the difference coefficient of positive and negative corpora (DC-PNC). Experimental results demonstrate the effectiveness of the new sentiment words extracted by the proposed method, which noticeably improve the accuracy and recall of sentiment analysis.
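The abstract names the building blocks of the pipeline but not their exact definitions, so the following Python sketch only illustrates how such a pipeline could be wired together. Levenshtein edit distance and sentence-level PMI are standard techniques. Everything else is a hypothetical stand-in, not the authors' method: the location rule (a new word directly following a token tagged `DEG` or `NEG`), all thresholds (`p_min`, `max_ratio`, `pmi_min`, `dc_min`), the seed words, and especially the DC-PNC formula, which is assumed here to be the normalized difference between a word's relative frequencies in the positive and negative corpora. The syntactic-analysis component is not sketched.

```python
import math
from collections import Counter


# --- Step 1: candidates from a HYPOTHETICAL location rule ------------------
def location_rule_probability(word, labeled_sentences, trigger_tags=("DEG", "NEG")):
    """Share of `word`'s occurrences that directly follow a token whose
    (assumed) sequence-label tag is in `trigger_tags`."""
    hits = total = 0
    for sent in labeled_sentences:            # sent: list of (token, tag) pairs
        for i, (tok, _tag) in enumerate(sent):
            if tok == word:
                total += 1
                if i > 0 and sent[i - 1][1] in trigger_tags:
                    hits += 1
    return hits / total if total else 0.0


def candidate_set(new_words, labeled_sentences, p_min=0.5):
    """Keep new words whose rule-based probability reaches the assumed `p_min`."""
    return {w for w in new_words
            if location_rule_probability(w, labeled_sentences) >= p_min}


# --- Step 2: supplement candidates via edit distance -----------------------
def edit_distance(a, b):
    """Levenshtein distance: insertion, deletion, substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def supplement_by_edit_distance(candidates, new_words, max_ratio=0.5):
    """Add new words whose length-normalized edit distance to some original
    candidate is at most the assumed `max_ratio`."""
    result = set(candidates)
    for w in new_words:
        if w in result:
            continue
        if any(edit_distance(w, c) / max(len(w), len(c)) <= max_ratio
               for c in candidates):
            result.add(w)
    return result


# --- Step 3: fine-grained filtering (PMI + ASSUMED DC-PNC) ------------------
def pmi(word, seed, sentences):
    """Sentence-level PMI: log2 p(w, s) / (p(w) p(s)); `sentences` are token sets."""
    n = len(sentences)
    n_w = sum(word in s for s in sentences)
    n_s = sum(seed in s for s in sentences)
    n_ws = sum(word in s and seed in s for s in sentences)
    if n_ws == 0:
        return float("-inf")                  # the two words never co-occur
    return math.log2(n_ws * n / (n_w * n_s))


def dc_pnc(word, pos_counts, neg_counts):
    """ASSUMED DC-PNC: (f_pos - f_neg) / (f_pos + f_neg) over relative
    frequencies taken from Counter objects; result lies in [-1, 1]."""
    fp = pos_counts[word] / max(sum(pos_counts.values()), 1)
    fn = neg_counts[word] / max(sum(neg_counts.values()), 1)
    return 0.0 if fp + fn == 0 else (fp - fn) / (fp + fn)


def fine_grained_filter(candidates, seeds, sentences, pos_counts, neg_counts,
                        pmi_min=0.0, dc_min=0.5):
    """Keep candidates that co-occur with some seed sentiment word and lean
    clearly toward one polarity; both thresholds are assumptions."""
    kept = {}
    for w in candidates:
        best = max((pmi(w, s, sentences) for s in seeds), default=float("-inf"))
        dc = dc_pnc(w, pos_counts, neg_counts)
        if best >= pmi_min and abs(dc) >= dc_min:
            kept[w] = (best, dc)
    return kept


if __name__ == "__main__":
    labeled = [[("very", "DEG"), ("awsome", "ADJ")],
               [("not", "NEG"), ("awsome", "ADJ")]]
    cands = candidate_set(["awsome"], labeled)
    cands = supplement_by_edit_distance(cands, ["awesome", "table"])
    sentences = [{"phone", "awsome", "great"}, {"screen", "awesome", "great"},
                 {"battery", "bad"}, {"price", "ok"}]
    pos = Counter({"awsome": 3, "awesome": 2, "great": 5})
    neg = Counter({"bad": 5, "battery": 5})
    print(fine_grained_filter(cands, ["great"], sentences, pos, neg))
```

Normalizing the edit distance by the longer word's length keeps the matching threshold comparable across short and long words. Using relative rather than raw frequencies in the assumed DC-PNC keeps corpora of different sizes comparable.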

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Data availability

Data cannot be made available for privacy reasons.

Acknowledgments

This research was supported in part by the National Natural Science Foundation of China (Grant No. 62076006) and in part by the 2019 Anhui Provincial Natural Science Foundation Project (Grant No. 1908085MF189).

Author information

Corresponding authors

Correspondence to Shunxiang Zhang or KuanChing Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

No human participants were involved in this study.

About this article

Cite this article

Zhang, S., Xu, H., Zhu, G. et al. A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. Soft Comput 26, 853–866 (2022). https://doi.org/10.1007/s00500-021-06228-9

  • DOI: https://doi.org/10.1007/s00500-021-06228-9
