Abstract:
In modern society, well-known news websites such as Sina and Times serve information to millions of users every day. With the continuous development of information technology, however, the amount of unorganized data keeps growing, and organizing this text through automatic classification has become a challenge. Traditional manual classification of news text consumes considerable human and financial resources and can hardly complete the classification task quickly. This paper studies news text classification and proposes a news text classification model based on Latent Dirichlet Allocation (LDA). Because news texts are very high-dimensional, the model uses a topic model to reduce their dimensionality and extract features. The paper also studies the Softmax regression algorithm for multi-class text classification problems that arise in practice, and uses it as the model's classifier. The proposed model is evaluated on a real news dataset, and the experimental results show that the improved model performs relatively well: it effectively reduces the feature dimensionality of the news text and achieves good classification results.
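The abstract describes a two-stage pipeline: LDA compresses each document from a vocabulary-sized word-count vector into a short vector of topic proportions, and a Softmax (multinomial logistic) regression classifier is trained on those topic features. The abstract does not say how either stage is implemented, so the following is only a minimal NumPy sketch under stated assumptions: LDA is fitted by collapsed Gibbs sampling, and Softmax regression is trained by batch gradient descent with a small L2 penalty. The function names (`lda_gibbs`, `train_softmax`, `predict`), the hyperparameters (`alpha`, `beta`, `lr`, `epochs`, `l2`) and the toy corpus are hypothetical illustrations, not the paper's code or data.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical sketch: LDA via collapsed Gibbs sampling.

    docs: list of lists of integer word ids in [0, vocab_size).
    Returns theta, an (n_docs, n_topics) matrix of document-topic proportions,
    i.e. the reduced-dimension features fed to the classifier.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))      # topic counts per document
    nkw = np.zeros((n_topics, vocab_size))     # word counts per topic
    nk = np.zeros(n_topics)                    # total tokens per topic
    z = []                                     # current topic of every token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z = k | all other assignments), unnormalized.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    # Smoothed topic proportions; each row sums to 1.
    return (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)


def softmax(scores):
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax(X, y, n_classes, lr=0.5, epochs=500, l2=1e-3):
    """Hypothetical sketch: Softmax regression by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                    # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad = (P - Y) / n                      # d(mean cross-entropy)/d(scores)
        W -= lr * (X.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    return softmax(X @ W + b).argmax(axis=1)


if __name__ == "__main__":
    # Hypothetical toy corpus: ids 0-4 stand for sports words, 5-9 for finance words.
    docs = [[0, 1, 2, 3, 1, 0], [2, 3, 4, 4, 1], [0, 4, 3, 2, 2],
            [5, 6, 7, 8, 6], [9, 8, 7, 5, 5], [6, 7, 9, 9, 8]]
    labels = np.array([0, 0, 0, 1, 1, 1])
    theta = lda_gibbs(docs, n_topics=2, vocab_size=10)  # 10-dim counts -> 2-dim topics
    W, b = train_softmax(theta, labels, n_classes=2)
    print(theta.shape, predict(theta, W, b))
```

In this sketch the classifier never sees the raw vocabulary: its input width equals the number of topics, which is the dimension reduction the abstract attributes to the topic model. A real system would also need tokenization, a vocabulary mapping and a held-out test set, none of which are shown here.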
Published in: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS)
Date of Conference: 26-29 June 2016
Date Added to IEEE Xplore: 25 August 2016