Abstract
New sentiment words in product reviews are valuable resources generated close to users. Processing data to extract new sentiment words can provide better information services for users and theoretical support for related research in edge computing. Traditional methods for extracting new sentiment words generally ignore context and syntactic information, which leads to low accuracy and recall. To tackle this issue, we propose a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. First, the probability that a new word is a sentiment word is calculated through location rules derived from the sequence labeling result, and a candidate set of new sentiment words is obtained according to this probability. Then, the candidate set is supplemented by matching appositive words based on edit distance. Finally, the final set of new sentiment words is collected through fine-grained filtering, which includes calculating the pointwise mutual information (PMI) and the difference coefficient of positive and negative corpus (DC-PNC). Experimental results show that the new sentiment words extracted by the proposed method are effective and clearly improve the accuracy and recall of sentiment analysis.
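As a rough illustration of two building blocks named in the abstract, the sketch below computes an edit distance between two words and a pointwise mutual information score from co-occurrence counts. It is only a minimal sketch under stated assumptions: the abstract does not give the paper's exact formulas, thresholds, or the definition of DC-PNC, so the function names, the use of Levenshtein distance, and the base-2 logarithm are illustrative choices rather than the authors' implementation.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed variant; the
    paper's exact edit-distance definition is not stated in the abstract)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log2(p(x, y) / (p(x) * p(y))),
    estimated from raw counts over `total` observations (hypothetical
    counting scheme)."""
    if min(count_xy, count_x, count_y, total) <= 0:
        return float("-inf")  # no co-occurrence evidence
    return math.log2((count_xy * total) / (count_x * count_y))
```

In a pipeline like the one described, a candidate word with a small edit distance to a known candidate could be added to the set, and a candidate with low PMI against known sentiment words could be filtered out; the cut-off values would have to come from the paper itself.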






Data availability
Data cannot be made available for privacy reasons.
Acknowledgments
This research was supported in part by the National Natural Science Foundation of China (Grant No. 62076006) and in part by the 2019 Anhui Provincial Natural Science Foundation Project (Grant No. 1908085MF189).
Ethics declarations
Conflict of interest
All the authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
No human or other individual participants were involved in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, S., Xu, H., Zhu, G. et al. A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. Soft Comput 26, 853–866 (2022). https://doi.org/10.1007/s00500-021-06228-9
DOI: https://doi.org/10.1007/s00500-021-06228-9