DOI: 10.1145/1066677.1066916

Text classification based on data partitioning and parameter varying ensembles

Published: 13 March 2005

ABSTRACT

Support vector machines (SVMs) are among the best text classifiers to date. Meanwhile, ensembles of classifiers have proven effective in many domains, so ensembles of SVM classifiers are expected to achieve better performance. In this paper, two types of SVM ensembles, data partitioning ensembles and heterogeneous ensembles, are proposed and experimentally evaluated on three well-accepted collections. The major conclusions are that disjunct partitioning ensembles with stacking can achieve the best performance, and that parameter varying ensembles are effective while having the advantage of being deterministic.
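
The idea of a disjoint data partitioning ensemble can be illustrated with a minimal sketch. It is not the paper's implementation: the base learner here is a hypothetical nearest-centroid stand-in rather than an SVM, the combiner is plain majority voting rather than stacking, and all names (`disjoint_partitions`, `train_centroid`, `majority_vote`) and the toy data are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]          # (feature vector, label)
Classifier = Callable[[Sequence[float]], int]  # maps features to a label


def disjoint_partitions(data: List[Example], k: int) -> List[List[Example]]:
    """Split data into k non-overlapping parts (round-robin, deterministic)."""
    return [data[i::k] for i in range(k)]


def train_centroid(part: List[Example]) -> Classifier:
    """Hypothetical stand-in base learner (nearest class centroid), not an SVM."""
    sums, counts = {}, Counter()
    for x, y in part:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Sequence[float]) -> int:
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def majority_vote(members: List[Classifier], x: Sequence[float]) -> int:
    """Combine member predictions by simple voting (assumed combiner, not stacking)."""
    votes = Counter(m(x) for m in members)
    return votes.most_common(1)[0][0]


# Toy two-class data, invented for this sketch.
data = [
    ([0.0, 0.1], 0), ([1.0, 0.9], 1), ([0.2, 0.0], 0),
    ([0.9, 1.1], 1), ([0.1, 0.2], 0), ([1.1, 1.0], 1),
]
parts = disjoint_partitions(data, 3)          # each example lands in exactly one part
members = [train_centroid(p) for p in parts]  # one base classifier per partition
print(majority_vote(members, [0.05, 0.05]), majority_vote(members, [1.0, 1.0]))
```

Because the partitioning is round-robin rather than random, repeated runs produce the same ensemble. A parameter varying ensemble would instead train every member on the full data with a different, fixed parameter setting.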

Published in

SAC '05: Proceedings of the 2005 ACM Symposium on Applied Computing
March 2005
1814 pages
ISBN: 1581139640
DOI: 10.1145/1066677

Copyright © 2005 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: Article

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
