ABSTRACT
Support vector machines (SVMs) are among the best-performing text classifiers to date. Meanwhile, ensembles of classifiers have proven effective in many domains, so ensembles of SVM classifiers can be expected to achieve even better performance. In this paper, two types of SVM ensembles, data-partitioning ensembles and heterogeneous ensembles, are proposed and experimentally evaluated on three widely used collections. The major conclusions are that disjoint-partitioning ensembles with stacking achieve the best performance, and that parameter-varying ensembles are also effective while having the advantage of being deterministic.
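The two ensemble styles highlighted in the abstract can be sketched concretely. The following is a minimal illustration, not the paper's exact setup: it assumes scikit-learn as the toolkit, synthetic data in place of the text collections, and linear SVMs as base classifiers. Part 1 builds a disjoint-partitioning ensemble combined by stacking; part 2 builds a parameter-varying ensemble, which is deterministic because no random resampling is involved.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for a text-classification dataset (the paper uses
# three benchmark text collections; this is only an illustration).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Disjoint-partitioning ensemble with stacking: split the training set
#    into k disjoint partitions and train one base SVM per partition.
k = 3
parts = np.array_split(np.random.RandomState(0).permutation(len(X_train)), k)
base = [LinearSVC(C=1.0).fit(X_train[idx], y_train[idx]) for idx in parts]

def meta_features(X):
    # Stack each base classifier's decision scores as meta-level features.
    return np.column_stack([clf.decision_function(X) for clf in base])

# The stacking meta-learner (logistic regression here, as one common choice)
# learns how to combine the base classifiers' outputs.
meta = LogisticRegression().fit(meta_features(X_train), y_train)
acc_stack = meta.score(meta_features(X_test), y_test)

# 2. Parameter-varying ensemble: train members on the *same* data with
#    different regularization parameters C, then take a majority vote.
members = [LinearSVC(C=c).fit(X_train, y_train) for c in (0.1, 1.0, 10.0)]
votes = np.stack([m.predict(X_test) for m in members])
pred = (votes.mean(axis=0) > 0.5).astype(int)
acc_pv = (pred == y_test).mean()

print(f"stacked partition ensemble accuracy: {acc_stack:.2f}")
print(f"parameter-varying ensemble accuracy: {acc_pv:.2f}")
```

Note that rerunning the parameter-varying ensemble always yields the same members, which is the determinism advantage the abstract mentions; the partitioning ensemble, by contrast, depends on how the training set is split.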
Index Terms
- Text classification based on data partitioning and parameter varying ensembles