
Semantic lexicons of English nouns for classification

  • Original Paper
  • Journal: Evolving Systems

Abstract

Sentiment classification has been studied for a long time, and it has many applications and a large body of research serving communities, commerce, politics, and other areas. In this research, we propose a new model for calculating the emotional values (semantic scores) of English terms (verbs, nouns, adjectives, adverbs, etc.). First, we build a basis English sentiment dictionary (bESD) by using the Tanimoto coefficient (Tanimoto measure, TC) through the Google search engine with the AND and OR operators. We then create many English noun phrases based on English grammar (the characteristics of the English language), and the valences of these noun phrases are identified from their specific contexts. English noun phrases often carry semantics whose values (emotional scores) are not fixed but change with the context in which the phrases appear. Consequently, sentiment classification does not achieve high accuracy if the semantic values (sentiment scores) of emotion-bearing noun phrases are held constant across all contexts. For these reasons, we propose many rules based on English grammar to calculate the sentiment values of emotion-bearing English noun phrases in their specific contexts. The results of this work can be widely used in applications and research on English semantic classification.
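As a rough illustration of the dictionary-building step described in the abstract, the following Python sketch computes a Tanimoto coefficient from search hit counts and turns it into a signed valence. The abstract does not state the exact formula, so everything here is an assumption: the ratio hits(a AND b) / hits(a OR b), the seed words "good" and "bad", the hard-coded hit counts (which stand in for real search engine queries), and the function names `tanimoto` and `valence`. It should be read as one plausible reading of the idea, not as the authors' implementation.

```python
from typing import Iterable, Mapping, Tuple

# (hits for "a AND b", hits for "a OR b"); hypothetical stand-in for search results
HitCounts = Tuple[int, int]


def tanimoto(hits_and: int, hits_or: int) -> float:
    """Set-based Tanimoto coefficient |A ∩ B| / |A ∪ B| from hit counts.

    Assumption: AND-query hits approximate the intersection and
    OR-query hits approximate the union.
    """
    if hits_or <= 0:
        return 0.0  # no evidence at all: treat the pair as unrelated
    return hits_and / hits_or


def valence(
    word: str,
    positive_seeds: Iterable[str],
    negative_seeds: Iterable[str],
    counts: Mapping[Tuple[str, str], HitCounts],
) -> float:
    """Hypothetical valence: similarity to positive seeds minus negative seeds."""
    pos = sum(tanimoto(*counts[(word, seed)]) for seed in positive_seeds)
    neg = sum(tanimoto(*counts[(word, seed)]) for seed in negative_seeds)
    return pos - neg


# Toy, made-up hit counts; a real system would obtain them from a search engine.
toy_counts = {
    ("reliable", "good"): (500, 1000),
    ("reliable", "bad"): (125, 1000),
}

score = valence("reliable", ["good"], ["bad"], toy_counts)
print(f"valence('reliable') = {score}")
```

A positive score suggests a positive term and a negative score a negative one. In this sketch the score is fixed per word, which is exactly the limitation the abstract raises for noun phrases whose valence depends on context.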

References

  • Agarwal B, Mittal N (2016a) Machine learning approach for sentiment analysis. Prominent feature extraction for sentiment analysis, pp 21–45. Print ISBN 978-3-319-25341-1. doi:10.1007/978-3-319-25343-5_3

  • Agarwal B, Mittal N (2016b) Semantic orientation-based approach for sentiment analysis. Prominent feature extraction for sentiment analysis, pp 77–88. Print ISBN 978-3-319-25341-1. doi:10.1007/978-3-319-25343-5_6

  • Ahmed S, Danti A (2016) Effective sentimental analysis and opinion mining of web reviews using rule based classifiers. In: Computational intelligence in data mining, India, volume 1, pp 171–179. Print ISBN 978-81-322-2732-8. doi:10.1007/978-81-322-2734-2_18

  • An NTT, Hagiwara M (2014) Adjective-based estimation of short sentence’s impression. In: International Conference on Kansei Engineering and Emotion Research, KEER2014, Linköping

  • Andreevskaia A, Bergler S (2006) Mining WordNet for fuzzy sentiment: sentiment tag extraction from WordNet glosses. In: 11th Conference of the European Chapter of the Association for Computational Linguistics, Italy, pp 209–216

  • Awekar A, Samatova N (2009) Fast matching for all pairs similarity search. In: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT’09), vol 01, USA, pp 295–300

  • Bello-Orgaz G, Menéndez H, Okazaki S, Camacho D (2014) Combining social-based data mining techniques to extract collective trends from twitter. Malay J Comput Sci 27(2):95–111

  • Bickerstaffe A, Zukerman I (2010) A hierarchical classifier applied to multi-way sentiment detection. In: Proceedings of the 23rd International Conference on Computational Linguistics (COLING’10), USA, pp 62–70

  • Brooke J, Tofiloski M, Taboada M (2009) Cross-linguistic sentiment analysis: from English to Spanish. In: Proceedings of International Conference Recent Advances in Natural Language Processing’2009, Bulgaria

  • Cambridge English Dictionary (2016) http://dictionary.cambridge.org/

  • Canuto S, Gonçalves MA, Benevenuto F (2016) Exploiting new sentiment-based meta-level features for effective sentiment analysis. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM’16), New York, USA, pp 53–62

  • Chen LS, Chiu HJ (2009) Developing a Neural Network based Index for Sentiment Classification. In: Proceedings of the International MultiConference of Engineers and Computer Scientists, Hong Kong

  • Choi Y, Cardie C (2008) Learning with compositional semantics as structural inference for subsentential sentiment analysis. In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, pp 793–801

  • Cimiano P, Wenderoth J (2007) Automatic acquisition of ranked Qualia structures from the web. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, pp 888–895

  • Collins English Dictionary (2016) http://www.collinsdictionary.com/dictionary/english

  • Constante P, Gordon A, Chang O, Pruna E, Acuna F, Escobar I (2016) Artificial vision techniques to optimize strawberry’s industrial classification. IEEE Latin Am Trans 14(6):2576–2581

  • Deshpande R, Vaze K, Rathod S, Jarhad T (2014) Comparative study of document similarity algorithms and clustering algorithms for sentiment analysis. Int J Emerg Trends Technol Comput Sci (IJETTCS) 3(5):196–199

  • Efron M (2004) Cultural orientation: classifying subjective documents by cociation [sic] analysis. In: Proceedings of the AAAI Fall Symposium on Style and Meaning in Language, Art, Music, and Design, pp 41–48

  • English Dictionary of Lingoes (2016) http://www.lingoes.net/

  • English Grammar of British Council (2016) https://learnenglish.britishcouncil.org/en/english-grammar

  • English Grammar of Cambridge (2016) http://www.cambridge.org/us/cambridgeenglish/

  • English Grammar of Oxford (2016) http://www.oxfordonlineenglish.com/free-english-grammar-lessons

  • English Grammar of Wikipedia (2016) https://en.wikipedia.org/wiki/English_grammar

  • Feng S, Zhang L, Li B, Wang D, Yu G, Wong KF (2013) Is twitter a better corpus for measuring sentiment similarity? In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp 897–902

  • Fligner MA, Verducci JS, Blower PE (2002) A modification of the Jaccard–Tanimoto similarity index for diverse selection of chemical compounds using binary strings. Technometrics 44(2):110–119

  • Garavaglia SB (2001) Statistical analysis of the Tanimoto coefficient self-organizing map (TCSOM) applied to health behavioral survey data. In: International Joint Conference on Neural Networks, 2001 (IJCNN’01), vol 4, pp 2483–2488

  • Godbole N, Srinivasaiah M, Skiena S (2007) Large-scale sentiment analysis for news and blogs. In: ICWSM’2007 Boulder, Colorado, USA

  • Jadhao A, Agrawal AJ (2016) Text categorization using Jaccard Coefficient for Text Messages. Int J Sci Res (IJSR) 5(5):2046–2050

  • Kristensen TG, Pedersen CNS (2010) Data structures for accelerating Tanimoto queries on real valued vectors. Algorithms in Bioinformatics, Volume 6293 of the series Lecture Notes in Computer Science. pp 28–39

  • Kryszkiewicz M (2013) On cosine and Tanimoto near duplicates search among vectors with domains consisting of zero, a positive number and a negative number. Flexible query answering systems, volume 8132 of the series Lecture Notes in Computer Science, pp 531–542

  • Kryszkiewicz M (2014) Using non-zero dimensions and lengths of vectors for the Tanimoto similarity search among real valued vectors. Intelligent information and database systems, volume 8397 of the series Lecture Notes in Computer Science, pp 173–182

  • Kryszkiewicz M, Podsiadly P (2014) Efficient search of cosine and Tanimoto near duplicates among vectors with domains consisting of zero, a positive number and a negative number. Modern advances in applied intelligence, volume 8482 of the series Lecture Notes in Computer Science, pp 160–170

  • Kumar A, Singh R, Mohaar GS (2010) Computational approach to investigate similarity in natural products using Tanimoto coefficient and Euclidean distance. IUP J Inf Technol 6(1):16–23

  • Kundi FM, Khan A, Asghar MZ, Ahmad S (2015) Context-aware spelling corrector for sentiment analysis. MAGNT Res Rep 2(6):1–11

  • Longman English Dictionary (2016) http://www.ldoceonline.com/

  • Lu G, Huang P, He L, Cu C, Li X (2010) A new semantic similarity measuring method based on web search engines. J WSEAS Trans Comput 9(1):1–10

  • MacMillan English Dictionary (2016) http://www.macmillandictionary.com/

  • Manek AS, Shenoy PD, Mohan MC, Venugopal KR (2016) Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web, pp 1–20. Print ISSN 1386-145X. doi:10.1007/s11280-015-0381-x

  • Mao H, Gao P, Wang Y, Bollen J (2014) Automatic construction of financial semantic orientation lexicon from large-scale Chinese News Corpus. In: The 7th Financial Risks International Forum

  • Klusch M, Kapahnke P (2009) An adaptive hybrid semantic service matchmaker for OWL-S. Workshop on Semantic Matchmaking

  • Molinero MA, Sagot B, Nicolas L (2009) A morphological and syntactic wide-coverage Lexicon for Spanish: the Leffe. In: Proceedings of International Conference Recent Advances in Natural Language Processing’2009, Bulgaria

  • Nadaf M, Lahane S, Deshpande A, Tirth S (2015) Using business intelligence for mining online reviews for predicting sales performance. Int J Eng Comput Sci 4(5):11718–11717 (ISSN:2319-7242)

  • Nasukawa T, Yi J (2003) Sentiment analysis: capturing favorability using natural language processing. In: K-CAP ‘03 Proceedings of the 2nd international conference on Knowledge capture, New York, USA, pp 70–77

  • Ngoc PV, Ngoc CVT, Ngoc TVT, Duy DN (2017) A C4.5 algorithm for English emotional classification. Int J Evol Syst. doi:10.1007/s12530-017-9180-1

  • Oxford English Dictionary (2016) http://www.oxforddictionaries.com/

  • Phu VN, Tuoi PT (2014) Sentiment classification using enhanced contextual valence shifters. In: International Conference on Asian Language Processing (IALP), pp 224–229

  • Phu VN, Dat ND, Tran VTN, Chau VTN, Nguyen TA (2016) Fuzzy C-means for English sentiment classification in a distributed system. Int J Appl Intell (APIN). doi:10.1007/s10489-016-0858-z

  • Phu VN, Chau VTN, Tran VTN, Dat ND (2017a) A Vietnamese adjective emotion dictionary based on exploitation of Vietnamese language characteristics. Int J Artif Intell Rev (AIR). doi:10.1007/s10462-017-9538-6

  • Phu VN, Chau VTN, Tran VTN, Dat ND, Nguyen TA (2017b) STING algorithm used English sentiment classification in a parallel environment. Int J Patt Recognit Artif Intell. doi:10.1142/S0218001417500215

  • Phu VN, Chau VTN, Dat ND, Tran VTN, Nguyen TA (2017c) A valences-totaling model for English sentiment classification. Knowl Inf Syst. doi:10.1007/s10115-017-1054-0

  • Phu VN, Chau VTN, Tran VTN (2017d) Shifting semantic values of English phrases for classification. Int J Speech Technol (IJST). doi:10.1007/s10772-017-9420-6

  • Phu VN, Chau VTN, Tran VTN (2017e) SVM for English semantic classification in parallel environment. Int J Speech Technol (IJST). doi:10.1007/s10772-017-9421-5

  • Phu VN, Chau VTN, Tran VTN, Dat ND, Duy KLD (2017f) A valence-totaling model for Vietnamese sentiment classification. Int J Evol Syst (EVOS). doi:10.1007/s12530-017-9187-7

  • Poria S, Peng H, Hussain A, Howard N, Cambria E (2017) Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis. Neurocomputing. doi:10.1016/j.neucom.2016.09.117

  • Qian T, Van Durme B, Schubert L (2009) Building a semantic lexicon of English nouns via bootstrapping. In: Proceedings of the NAACL HLT Student Research Workshop and Doctoral Consortium, Boulder, Colorado, pp 37–42

  • Qiu G, Liu B, Bu J, Chen C (2009) Expanding domain sentiment lexicon through double propagation. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA

  • Remus R, Quasthoff U, Heyer G (2010) SentiWS—a publicly available German-language resource for sentiment analysis. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC’10), pp 1168–1171

  • Rothfels J, Tibshirani J (2010) Unsupervised sentiment classification of English movie reviews using automatic selection of positive and negative sentiment items. CS224N-Final Project

  • Rubio JJ (2016) A method with neural networks for the classification of fruits and vegetables. Soft Comput. doi:10.1007/s00500-016-2263-2

  • Rubio JJ, Ortiz F, Mariaca CR, Tovar JC (2013) A method for online pattern recognition for abnormal eye movements. Neural Comput Appl 22(3–4):597–605

  • Song J, He Y, Fu G (2015) Polarity classification of short product reviews via multiple cluster-based SVM classifiers. In: 29th Pacific Asia Conference on Language, Information and Computation: Posters, Shanghai, China, pp 267–274

  • Steinberger J, Ebrahim M, Ehrmann M, Hurriyetoglu A, Kabadjov M, Lenkova P, Steinberger R, Tanev H, Vázquez S, Zavarella V (2012) Creating sentiment dictionaries via triangulation. Decis Support Syst 53(4):689–694

  • Taboada M, Anthony C, Voll K (2006) Methods for creating semantic orientation dictionaries. In: Proceedings of Fifth International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, pp 427–432

  • Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307

  • Tan S, Wang Y, Cheng X (2008) Combining learn-based and lexicon-based techniques for sentiment detection without using labeled examples. In: SIGIR’08 Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, New York, USA, pp 743–744

  • Tran VTN, Phu VN, Tuoi PT (2014) Learning more Chi square feature selection to improve the fastest and most accurate sentiment classification. In: The Third Asian Conference on Information Systems, ACIS 2014

  • Turney P (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of 40th ACL, pp 417–424

  • Turney PD, Littman ML (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Trans Inf Syst (TOIS) 21(4):315–346

  • Voll K, Taboada M (2007) Not all words are created equal: extracting semantic orientation as a function of adjective relevance. In: Proceedings of the 20th Australian Joint Conference on Artificial Intelligence, Gold Coast, Australia, pp 337–346

  • Wang G, Araki K (2007) Modifying SO-PMI for Japanese Weblog opinion mining by using a balancing factor and detecting neutral expressions. In: Proceedings of NAACL HLT 2007, Companion Volume, pp 189–192

  • Yuen RWM, Chan TYW, Lai TBY, Kwong OY, T’sou BKY (2004) Morpheme-based derivation of bipolar semantic orientation of Chinese words. In: Proceedings of the 20th international conference on Computational Linguistics, Stroudsburg, PA, USA

Author information

Corresponding author

Correspondence to Vo Ngoc Phu.

Appendices

Appendix

See Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.

Table 6 Comparison of our results with the studies in Efron (2004), Yuen et al. (2004), Chen and Chiu (2009), Wang and Araki (2007), Taboada et al. (2006), Cimiano and Wenderoth (2007), Lu et al. (2010), Voll and Taboada (2007), Kundi et al. (2015), Mao et al. (2014), Turney and Littman (2003), Turney (2002), Rothfels and Tibshirani (2010), Nadaf et al. (2015), Feng et al. (2013), An and Hagiwara (2014) and Song et al. (2015)
Table 7 Comparison of our model’s advantages and disadvantages with the studies in Efron (2004), Yuen et al. (2004), Chen and Chiu (2009), Wang and Araki (2007), Taboada et al. (2006), Cimiano and Wenderoth (2007), Lu et al. (2010), Voll and Taboada (2007), Kundi et al. (2015), Mao et al. (2014), Turney and Littman (2003), Turney (2002), Rothfels and Tibshirani (2010), Nadaf et al. (2015), Feng et al. (2013), An and Hagiwara (2014) and Song et al. (2015)
Table 8 Types of pre-determiners
Table 9 Comparison of our model’s results with the works related to the Tanimoto coefficient (Tanimoto measure, TC) in Kryszkiewicz (2014), Kryszkiewicz (2013), Kryszkiewicz and Podsiadly (2014), Awekar and Samatova (2009), Fligner et al. (2002), Kumar et al. (2010), Garavaglia (2001) and Kristensen and Pedersen (2010)
Table 10 Comparison of our model’s advantages and disadvantages with the works related to the Tanimoto coefficient (Tanimoto measure, TC) in Kryszkiewicz (2014), Kryszkiewicz (2013), Kryszkiewicz and Podsiadly (2014), Awekar and Samatova (2009), Fligner et al. (2002), Kumar et al. (2010), Garavaglia (2001) and Kristensen and Pedersen (2010)
Table 11 Comparison of our model’s results with the studies related to sentiment classification and semantic dictionaries (also called sentiment dictionaries, emotion lexicons, semantic lexicons, or sentiment lexicons) in Manek et al. (2016), Agarwal and Mittal (2016a, b), Canuto et al. (2016), Ahmed and Danti (2016), Phu and Tuoi (2014), Tran et al. (2014), Godbole et al. (2007), Taboada et al. (2011), Nasukawa and Yi (2003), Choi and Cardie (2008), Qiu et al. (2009), Tan et al. (2008), Molinero et al. (2009), Andreevskaia and Bergler (2006), Steinberger et al. (2012), Brooke et al. (2009) and Remus et al. (2010)
Table 12 Comparison of our model’s benefits and drawbacks with the studies related to sentiment classification and semantic dictionaries (also called sentiment dictionaries, emotion lexicons, semantic lexicons, or sentiment lexicons) in Manek et al. (2016), Agarwal and Mittal (2016a, b), Canuto et al. (2016), Ahmed and Danti (2016), Phu and Tuoi (2014), Tran et al. (2014), Godbole et al. (2007), Taboada et al. (2011), Nasukawa and Yi (2003), Choi and Cardie (2008), Qiu et al. (2009), Tan et al. (2008), Molinero et al. (2009), Andreevskaia and Bergler (2006), Steinberger et al. (2012), Brooke et al. (2009) and Remus et al. (2010)
Table 13 Comparison of our model’s results with the research related to the Tanimoto coefficient (TC) in emotional or semantic classification in Bickerstaffe and Zukerman (2010), Deshpande et al. (2014), Bello-Orgaz et al. (2014), Jadhao and Agrawal (2016) and Klusch and Kapahnke (2009)
Table 14 Comparison of our model’s positives and negatives with the research related to the Tanimoto coefficient (TC) in emotional or semantic classification in Bickerstaffe and Zukerman (2010), Deshpande et al. (2014), Bello-Orgaz et al. (2014), Jadhao and Agrawal (2016) and Klusch and Kapahnke (2009)

Appendices of all codes

Code listings: figure a to figure bn (66 figures; images not reproduced here).

About this article

Cite this article

Phu, V.N., Tran, V.T.N., Chau, V.T.N. et al. Semantic lexicons of English nouns for classification. Evolving Systems 10, 501–565 (2019). https://doi.org/10.1007/s12530-017-9188-6

  • DOI: https://doi.org/10.1007/s12530-017-9188-6
