research-article

Improving the Detection of Multilingual South African Abusive Language via Skip-gram Using Joint Multilevel Domain Adaptation

Authors:

Oluwafemi Oriola,

Eduan KotzéAuthors Info & Claims

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 2

Article No.: 25, Pages 1 - 28

https://doi.org/10.1145/3638759

Published: 08 February 2024 Publication History

Get Access

Abstract

The distinctiveness and sparsity of low-resource multilingual South African abusive language necessitate the development of a novel solution to automatically detect different classes of abusive language instances using machine learning. Skip-gram has been used to address sparsity in machine learning classification problems but is inadequate in detecting South African abusive language due to the considerable amount of rare features and class imbalance. Joint Domain Adaptation has been used to enlarge features of a low-resource target domain for improved classification outcomes by jointly learning from the target domain and large-resource source domain. This article, therefore, builds a Skip-gram model based on Joint Domain Adaptation to improve the detection of multilingual South African abusive language. Contrary to the existing Joint Domain Adaptation approaches, a Joint Multilevel Domain Adaptation model involving adaptation of monolingual source domain instances and multilingual target domain instances with high frequency of rare features was executed at the first level and adaptation of target-domain features and first-level features at the next level. Both surface-level and embedding word features were used to evaluate the proposed model. In the evaluation of surface-level features, the Joint Multilevel Domain Adaptation model outperformed the state-of-the-art models with accuracy of 0.92 and F1-score of 0.68. In the evaluation of embedding features, the proposed model outperformed the state-of-the-art models with accuracy of 0.88 and F1-score of 0.64. The Joint Multilevel Domain Adaptation model significantly improved the average information gain of the rare features in different language categories and reduced class imbalance.

References

[1]

OECD, Artificial Intelligence, Machine Learning and Big Data in Finance: Opportunities, Challenges, and Implications for Policy Makers. 2021. Retrieved from https://www.oecd.org/finance/artificial-intelligence-machine-learningbig-data-in-finance.htm.%0A

Abstract

References

Cited By

Index Terms

Recommendations

Multilingual stochastic n-gram class language models

A spell checker and corrector for the native South African language, South Sotho

Multi class-based n-gram language model for new words using web data

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Full Text

Share

Share this Publication link

Share on social media

Affiliations