Automatic Discovery of Abusive Thai Language Usages in Social Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10647))

Abstract

Social networks have become a standard means of communication, allowing massive numbers of users to interact and consume information anywhere and at any time. In Thailand, millions of users have access to social networks, many of whom are young children. The colloquial nature of social media inherently encourages expressions that do not conform to standard language, some of which may be considered abusive and offensive. Such ill-mannered language is increasingly used by a large number of Thai social media users. If adolescents are exposed to such abusive language without proper guidance, they could inadvertently grow accustomed to these language styles. To address the issue, we present a set of machine-learning-based algorithms that automatically detect abusive Thai language in social networks. Our best results yield an F-measure of 86% (88.73% precision and 83.53% recall).
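The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the abstract refers to the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is chosen here for illustration only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for the given precision and recall.

    With beta = 1.0 this is the harmonic mean of the two values (F1).
    Inputs and output share the same scale (here, percentages).
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the abstract; F1 is an assumption about the metric used.
score = f_measure(88.73, 83.53)
print(f"{score:.2f}")  # prints 86.05
```

Under that assumption the result, about 86.05%, agrees with the rounded 86% figure quoted in the abstract.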


Notes

  1. https://www.statista.com/statistics/490467/number-of-thailand-facebook-users/

  2. http://www.csie.ntu.edu.tw/cjlin/libsvm/

  3. http://www.cs.waikato.ac.nz/ml/weka/

  4. https://www.facebook.com/

  5. https://developers.facebook.com/docs/graph-api

  6. https://www.facebook.com/ejeab/

  7. https://www.facebook.com/nongngneverdie/

  8. https://www.facebook.com/sudlokomteen/

  9. https://github.com/wittawatj/ctwt

  10. http://thailang.nectec.or.th/best/?q=node/21

  11. http://www.sansarn.com/lexto/

  12. https://docs.oracle.com/javase/7/docs/api/java/text/BreakIterator.html


Acknowledgment

This research project was partially supported by the Faculty of Information and Communication Technology, Mahidol University.

Author information

Correspondence to Suppawong Tuarob.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tuarob, S., Mitrpanont, J.L. (2017). Automatic Discovery of Abusive Thai Language Usages in Social Networks. In: Choemprayong, S., Crestani, F., Cunningham, S. (eds) Digital Libraries: Data, Information, and Knowledge for Digital Lives. ICADL 2017. Lecture Notes in Computer Science, vol 10647. Springer, Cham. https://doi.org/10.1007/978-3-319-70232-2_23

  • DOI: https://doi.org/10.1007/978-3-319-70232-2_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70231-5

  • Online ISBN: 978-3-319-70232-2

  • eBook Packages: Computer Science, Computer Science (R0)
