Integrating Feature and Instance Selection Techniques in Opinion Mining

Zi-Hung You, Ya-Han Hu, Chih-Fong Tsai, Yen-Ming Kuo
Copyright: © 2020 | Volume: 16 | Issue: 3 | Pages: 15
ISSN: 1548-3924 | EISSN: 1548-3932 | EISBN13: 9781799804994 | DOI: 10.4018/IJDWM.2020070109
Cite Article

MLA

You, Zi-Hung, et al. "Integrating Feature and Instance Selection Techniques in Opinion Mining." IJDWM vol.16, no.3 2020: pp.168-182. http://doi.org/10.4018/IJDWM.2020070109

APA

You, Z., Hu, Y., Tsai, C., & Kuo, Y. (2020). Integrating Feature and Instance Selection Techniques in Opinion Mining. International Journal of Data Warehousing and Mining (IJDWM), 16(3), 168-182. http://doi.org/10.4018/IJDWM.2020070109

Chicago

You, Zi-Hung, et al. "Integrating Feature and Instance Selection Techniques in Opinion Mining," International Journal of Data Warehousing and Mining (IJDWM) 16, no.3: 168-182. http://doi.org/10.4018/IJDWM.2020070109

Abstract

Opinion mining focuses on extracting polarity information from texts. For textual term representation, different feature selection methods, e.g., term frequency (TF) or term frequency–inverse document frequency (TF–IDF), can yield different numbers of text features. In text classification, however, a selected training set may contain noisy documents (or outliers), which can degrade classification performance. To address this problem, instance selection can be adopted to filter out unrepresentative training documents. This article therefore investigates opinion mining performance when feature selection and instance selection are applied together. Two combination processes, which perform feature selection and instance selection in different orders, were compared. Specifically, two feature selection methods, namely TF and TF–IDF, and two instance selection methods, namely DROP3 and IB3, were employed for comparison. Experimental results on three Twitter datasets, which were used to develop sentiment classifiers, showed that TF–IDF followed by DROP3 performs the best.
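
To make the "feature step first, instance selection second" ordering concrete, here is a minimal Python sketch under several assumptions. It weights terms with a plain tf * log(N/df) formula and then filters training documents with a Wilson-style edited nearest neighbour (ENN) rule. ENN is a simplified stand-in for the noise-filtering idea, not DROP3 or IB3 themselves. The abstract does not say how TF or TF–IDF scores are turned into a selected feature subset, so the sketch only weights terms. The function names, the toy documents, and the choice of k = 3 are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def enn_filter(vectors, labels, k=3):
    """Keep indices whose label matches the majority label of their k nearest neighbours.

    This is a simplified ENN-style noise filter, used here as a stand-in
    for the instance selection step (it is not DROP3 or IB3).
    """
    kept = []
    for i, vec in enumerate(vectors):
        neighbours = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: cosine(vec, vectors[j]),
            reverse=True,
        )[:k]
        majority = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        if majority == labels[i]:
            kept.append(i)
    return kept


# Hypothetical toy training set; the last document is deliberately mislabeled.
docs = [
    "good great phone",
    "great good camera",
    "good great battery",
    "bad awful phone",
    "awful bad camera",
    "bad awful battery",
    "good great screen",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg", "neg"]

vectors = tfidf_vectors(docs)                # feature step first
kept = enn_filter(vectors, labels, k=3)      # instance selection second
print(kept)
```

On this toy data the filter drops the mislabeled last document, because its three nearest neighbours are all labeled "pos". Running the steps in the opposite order would mean filtering documents on raw term counts first and recomputing the weights on the surviving documents afterwards.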
