Automatic Discovery of Abusive Thai Language Usages in Social Networks

Tuarob, Suppawong; Mitrpanont, Jarernsri L.

doi:10.1007/978-3-319-70232-2_23

Suppawong Tuarob & Jarernsri L. Mitrpanont

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10647)

Included in the following conference series:

International Conference on Asian Digital Libraries


Abstract

Social networks have become a standard means of communication, allowing a massive number of users to interact and consume information anywhere and at any time. In Thailand, millions of users have access to social networks, many of whom are young children. The colloquial nature of social media inherently encourages expressions that do not conform to standard language, some of which may be considered abusive and offensive. Such ill-mannered language is increasingly used by a large number of Thai social media users. If adolescents are exposed to abusive language without proper guidance, they may inadvertently develop a familiarity with such language styles. To address this issue, we present a set of machine-learning-based algorithms that automatically detect abusive Thai language in social networks. Our best results yield an F-measure of 86% (88.73% precision and 83.53% recall).
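The reported figures can be checked against one another with a short calculation. The sketch below assumes the abstract's F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state the weighting. The function name `f_measure` and its `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not say which F-measure variant was used).
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.8873
reported_recall = 0.8353

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 86.05%"
```

Under that assumption, the computed value rounds to the 86% stated in the abstract.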



Acknowledgment

This research project was partially supported by the Faculty of Information and Communication Technology, Mahidol University.

Author information

Authors and Affiliations

Faculty of Information and Communication Technology, Mahidol University, Salaya, Thailand
Suppawong Tuarob & Jarernsri L. Mitrpanont


Corresponding author

Correspondence to Suppawong Tuarob.

Editor information

Editors and Affiliations

Chulalongkorn University, Bangkok, Thailand
Songphan Choemprayong
University of Lugano, Lugano, Switzerland
Fabio Crestani
Waikato University, Hamilton, New Zealand
Sally Jo Cunningham


About this paper

Cite this paper

Tuarob, S., Mitrpanont, J.L. (2017). Automatic Discovery of Abusive Thai Language Usages in Social Networks. In: Choemprayong, S., Crestani, F., Cunningham, S. (eds) Digital Libraries: Data, Information, and Knowledge for Digital Lives. ICADL 2017. Lecture Notes in Computer Science, vol 10647. Springer, Cham. https://doi.org/10.1007/978-3-319-70232-2_23


DOI: https://doi.org/10.1007/978-3-319-70232-2_23
Published: 03 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70231-5
Online ISBN: 978-3-319-70232-2
eBook Packages: Computer Science, Computer Science (R0)
