Abstract
Vital to the task of mining sentiment from text is a sentiment lexicon: a dictionary of terms annotated with a priori information along the semantic dimension of sentiment. Each term is assigned a general, out-of-context sentiment polarity. Unfortunately, online dictionaries and similar lexical resources do not readily include information on the sentiment properties of their entries. Moreover, manually compiling sentiment lexicons is costly in terms of annotator time and effort. This has led to a large body of research on automated sentiment lexicon generation algorithms. Most of these algorithms were designed for English, owing to the abundance of readily available lexical resources in that language. This is not the case for low-resource languages such as Malay. Although there has been an exponential increase in research on Malay sentiment analysis over the past few years, the subtask of sentiment lexicon induction for this particular language remains under-investigated. We present a minimally-supervised sentiment lexicon induction model specifically designed for the Malay language. It takes as input only two initial paradigm positive and negative terms, and mines WordNet Bahasa’s synonym chains and Kamus Dewan’s gloss information to extract subjective, sentiment-laden terms. The model automatically bootstraps a reliable, high-coverage sentiment lexicon that can be employed in Malay sentiment analysis of full text. Intrinsic evaluation of the model against a manually annotated test set demonstrates that its ability to assign sentiment properties to terms is on par with human judgement.
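The general idea of bootstrapping a lexicon from seed terms through synonym chains can be illustrated with a minimal sketch. This is not the authors' model: it assumes one positive and one negative seed, ignores gloss information entirely, and uses a hypothetical toy synonym map (`TOY_SYNONYMS`) in place of WordNet Bahasa. The function names are likewise illustrative. Each reachable term simply receives the polarity of the seed that is fewer synonym hops away.

```python
from collections import deque

# Hypothetical toy synonym map; a real system would read these links from
# WordNet Bahasa. The entries below are illustrative only.
TOY_SYNONYMS = {
    "baik": ["bagus", "elok"],
    "bagus": ["baik", "hebat"],
    "elok": ["baik", "cantik"],
    "hebat": ["bagus"],
    "cantik": ["elok"],
    "buruk": ["teruk", "jahat"],
    "teruk": ["buruk"],
    "jahat": ["buruk", "keji"],
    "keji": ["jahat"],
}


def synonym_distances(seed, synonyms):
    """Breadth-first search: synonym hops from the seed to each reachable term."""
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        for neighbour in synonyms.get(term, []):
            if neighbour not in distances:
                distances[neighbour] = distances[term] + 1
                queue.append(neighbour)
    return distances


def induce_lexicon(positive_seed, negative_seed, synonyms):
    """Label every reachable term with the polarity of its nearer seed."""
    pos = synonym_distances(positive_seed, synonyms)
    neg = synonym_distances(negative_seed, synonyms)
    lexicon = {}
    for term in set(pos) | set(neg):
        d_pos = pos.get(term, float("inf"))
        d_neg = neg.get(term, float("inf"))
        if d_pos < d_neg:
            lexicon[term] = "positive"
        elif d_neg < d_pos:
            lexicon[term] = "negative"
        # Terms equidistant from both seeds are left unlabeled in this sketch.
    return lexicon


lexicon = induce_lexicon("baik", "buruk", TOY_SYNONYMS)
```

A distance-based tie-break is only one possible design choice; a fuller system would presumably also weigh evidence from dictionary glosses, as the abstract describes.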
Notes
- 3. The convention ‘term.pos.sense’ is used to define WordNet synsets here. For example, good.a.01 refers to the first sense of the adjective ‘good’, while bad.a.12 refers to the 12th sense of the adjective ‘bad’.
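As an illustration of the ‘term.pos.sense’ convention described in note 3, the following minimal sketch splits such an identifier into its three parts. The names `SynsetId` and `parse_synset_id` are hypothetical helpers introduced here, not part of any WordNet library.

```python
from typing import NamedTuple


class SynsetId(NamedTuple):
    term: str   # the lemma, e.g. "good"
    pos: str    # part-of-speech tag, e.g. "a" for adjective
    sense: int  # sense number, e.g. 1 for "01"


def parse_synset_id(identifier: str) -> SynsetId:
    """Parse a 'term.pos.sense' string into its components."""
    # Split from the right so a term that itself contains dots stays intact.
    term, pos, sense = identifier.rsplit(".", 2)
    return SynsetId(term, pos, int(sense))


first_good = parse_synset_id("good.a.01")
twelfth_bad = parse_synset_id("bad.a.12")
```

Here `first_good.sense` is 1 and `twelfth_bad.sense` is 12, matching the two examples given in the note.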
Acknowledgement
This research was partially supported by the Malaysia Ministry of Education Grant FRGS/1/2014/ICT02/UKM/01/1 awarded to the Center for Artificial Intelligence Technology at Universiti Kebangsaan Malaysia.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Darwich, M., Noah, S.A.M., Omar, N. (2017). Minimally-Supervised Sentiment Lexicon Induction Model: A Case Study of Malay Sentiment Analysis. In: Phon-Amnuaisuk, S., Ang, SP., Lee, SY. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science(), vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_19
DOI: https://doi.org/10.1007/978-3-319-69456-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69455-9
Online ISBN: 978-3-319-69456-6
eBook Packages: Computer Science, Computer Science (R0)