As a guest user you are not logged in or recognized by your IP address. You have
access to the Front Matter, Abstracts, Author Index, Subject Index and the full
text of Open Access publications.
In this paper, we build a new dataset UIT-ViON (Vietnamese Online Newspaper) collected from well-known online newspapers in Vietnamese. We collect, process, and create the dataset, then experiment with different machine learning models. In particular, we propose an open-domain, large-scale, and high-quality dataset consisting of 260,000 textual data points annotated with multiple labels for evaluating Vietnamese short text classification. In addition, we present the proposed approach using transformer-based learning (PhoBERT) for Vietnamese short text classification on the dataset, which outperforms traditional machine learning (Naive Bayes and Logistic Regression) and deep learning (Text-CNN and LSTM). As a result, the proposed approach achieves the F1-score of 80.62%. This is a positive result and a premise for developing an automatic news classification system. The study is proposed to significantly save time, costs, and human resources and make it easier for readers to find news related to their interesting topics. In future, we will propose solutions to improve the quality of the dataset and improve the performance of classification models.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.