With the rapid development of e-commerce and online review platforms, the number of product reviews has multiplied, which makes mining valuable information from them important for both businesses and consumers. Text classification methods are usually the main approach to this kind of problem. Text classification involves several steps, and each step offers many choices of methods or components, so many combinations are possible. However, these combinations have seldom been compared in previous work. In this paper, different combinations of text classification components are constructed and evaluated. In the feature selection and weighting step, mutual information, information gain, the chi-square test and TF-IDF are used as alternatives. In the text classification step, four frequently used machine learning methods are selected as components. The experiments are conducted on an annotated corpus of Chinese car reviews. Results show that the combination of the chi-square test and the Support Vector Machine algorithm obtains the best performance. The relationship between performance and the number of features is also studied, and an empirical corpus size for this kind of task is given.
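To make the feature selection step concrete, the following is a minimal sketch of chi-square term scoring for a two-class review task, using only the Python standard library. It is not the paper's implementation. The function names (`chi_square_scores`, `select_top_k`) and the four toy English reviews are assumptions made for illustration, and real Chinese reviews would first need word segmentation. For a term t and class c the score is N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), where A and B count documents containing t inside and outside c, and C and D count documents lacking t inside and outside c.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Score every term by its chi-square statistic against one class.

    docs: list of token lists; labels: parallel list of class labels.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos

    # Document frequencies of each term inside and outside the class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == positive else df_neg
        target.update(set(tokens))  # count each term once per document

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in class, contains term
        b = df_neg[term]   # not in class, contains term
        c = n_pos - a      # in class, lacks term
        d = n_neg - b      # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy data, not taken from the paper's corpus.
docs = [
    ["engine", "quiet", "good"],
    ["seats", "comfortable", "good"],
    ["engine", "noisy", "bad"],
    ["brakes", "weak", "bad"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(docs, labels, positive="pos")
top_terms = select_top_k(scores, 2)
```

In this toy set, "engine" appears equally often in both classes and scores zero, while "good" and "bad" each appear in only one class and rank highest. In a full pipeline the selected terms would be weighted and passed to a classifier such as a Support Vector Machine, which is not shown here.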