Abstract
Social networks have become a standard means of communication, allowing a massive number of users to interact and consume information anywhere and anytime. In Thailand, millions of users have access to social networks, many of whom are young children. The colloquial nature of social media inherently encourages expressions that do not conform to standard language, some of which may be considered abusive and offensive. Such ill-mannered language is increasingly used by a large number of Thai social media users. If adolescents are exposed to this abusive language without proper guidance, they may inadvertently come to regard such language styles as familiar and acceptable. To address the issue, we present a set of machine learning-based algorithms that automatically detect abusive Thai language in social networks. Our best results yield an F-measure of 86% (88.73% precision and 83.53% recall).
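The reported figures can be checked against each other with a short calculation. The sketch below assumes that "F-measure" means the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state the weighting, and the function name `f_measure` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the balanced F1 (an assumption here)."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both inputs are zero, so the score is defined as zero.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


# Precision and recall as reported in the abstract.
reported_precision = 0.8873
reported_recall = 0.8353

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # about 0.8605, which rounds to the reported 86%
```

Under that assumption the three reported numbers are mutually consistent: the harmonic mean of 88.73% and 83.53% rounds to 86%.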
Acknowledgment
This research project was partially supported by the Faculty of Information and Communication Technology, Mahidol University.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tuarob, S., Mitrpanont, J.L. (2017). Automatic Discovery of Abusive Thai Language Usages in Social Networks. In: Choemprayong, S., Crestani, F., Cunningham, S. (eds) Digital Libraries: Data, Information, and Knowledge for Digital Lives. ICADL 2017. Lecture Notes in Computer Science, vol 10647. Springer, Cham. https://doi.org/10.1007/978-3-319-70232-2_23
DOI: https://doi.org/10.1007/978-3-319-70232-2_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70231-5
Online ISBN: 978-3-319-70232-2
eBook Packages: Computer Science (R0)