ABSTRACT
Support vector machines (SVMs) are among the best-performing text classifiers to date. Meanwhile, ensembles of classifiers have proven effective in many domains, so ensembles of SVM classifiers can be expected to achieve even better performance. In this paper, two types of SVM ensembles, data-partitioning ensembles and heterogeneous ensembles, are proposed and experimentally evaluated on three widely used collections. The major conclusions are that disjoint-partitioning ensembles with stacking achieve the best performance, and that parameter-varying ensembles are also effective while having the advantage of being deterministic.
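The two ensemble styles highlighted in the abstract can be sketched concretely. The following is a minimal illustration, not the paper's exact setup: it assumes scikit-learn as the toolkit, synthetic data in place of the text collections, and linear SVMs as base classifiers. Part 1 builds a disjoint-partitioning ensemble combined by stacking; part 2 builds a parameter-varying ensemble, which is deterministic because no random resampling is involved.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for a text-classification dataset (the paper uses
# three benchmark text collections; this is only an illustration).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Disjoint-partitioning ensemble with stacking: split the training set
#    into k disjoint partitions and train one base SVM per partition.
k = 3
parts = np.array_split(np.random.RandomState(0).permutation(len(X_train)), k)
base = [LinearSVC(C=1.0).fit(X_train[idx], y_train[idx]) for idx in parts]

def meta_features(X):
    # Stack each base classifier's decision scores as meta-level features.
    return np.column_stack([clf.decision_function(X) for clf in base])

# The stacking meta-learner (logistic regression here, as one common choice)
# learns how to combine the base classifiers' outputs.
meta = LogisticRegression().fit(meta_features(X_train), y_train)
acc_stack = meta.score(meta_features(X_test), y_test)

# 2. Parameter-varying ensemble: train members on the *same* data with
#    different regularization parameters C, then take a majority vote.
members = [LinearSVC(C=c).fit(X_train, y_train) for c in (0.1, 1.0, 10.0)]
votes = np.stack([m.predict(X_test) for m in members])
pred = (votes.mean(axis=0) > 0.5).astype(int)
acc_pv = (pred == y_test).mean()

print(f"stacked partition ensemble accuracy: {acc_stack:.2f}")
print(f"parameter-varying ensemble accuracy: {acc_pv:.2f}")
```

Note that rerunning the parameter-varying ensemble always yields the same members, which is the determinism advantage the abstract mentions; the partitioning ensemble, by contrast, depends on how the training set is split.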
Index Terms
- Text classification based on data partitioning and parameter varying ensembles