
Voting with a parameterized veto strategy: solving the KDD Cup 2006 problem by means of a classifier committee

Published: 01 December 2006

Abstract

This paper presents our winning solution to the KDD Cup 2006 problem. It combines the results of three different supervised learning techniques in a classifier committee, from which a single prediction is obtained by a voting procedure. The voting procedure assigns a weight to each committee member according to its average performance in a ten-fold cross-validation test, and it also takes into account the confidence values returned by the three algorithms. The final decision of the committee is determined by a parameterized veto strategy, which considers the maximal allowed error rate in addition to the confidence values of the committee members. The solution presented here won Task 2 and finished as runner-up on Task 1 of the competition.
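A minimal sketch of how such a confidence-weighted committee vote with a parameterized veto could be organized is given below, assuming each member reports a class label, a confidence value in [0, 1], and a weight derived from its ten-fold cross-validation accuracy. The names, thresholds, and veto rule are illustrative assumptions, not the authors' actual implementation.

# Illustrative sketch, not the authors' code: confidence-weighted committee
# voting with a parameterized veto. All names and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class MemberVote:
    label: int          # class predicted by one committee member
    confidence: float   # member's confidence in its prediction, in [0, 1]
    cv_weight: float    # weight from average ten-fold cross-validation accuracy

def committee_decision(votes, veto_confidence=0.95, max_error_rate=0.1):
    # Confidence- and performance-weighted tally per class label.
    tally = {}
    for v in votes:
        tally[v.label] = tally.get(v.label, 0.0) + v.cv_weight * v.confidence
    majority_label = max(tally, key=tally.get)

    # Parameterized veto: a member may override the weighted majority only if
    # it is very confident AND its estimated error rate stays within the bound.
    for v in votes:
        if (v.label != majority_label
                and v.confidence >= veto_confidence
                and 1.0 - v.cv_weight <= max_error_rate):
            return v.label
    return majority_label

# Example with three members; the third one vetoes the weighted majority.
votes = [
    MemberVote(label=0, confidence=0.55, cv_weight=0.80),
    MemberVote(label=0, confidence=0.60, cv_weight=0.78),
    MemberVote(label=1, confidence=0.97, cv_weight=0.92),
]
print(committee_decision(votes))  # prints 1 under these illustrative settings

In this sketch the veto only fires for a member that is both highly confident and sufficiently accurate in cross-validation, which reflects the idea of bounding the maximal allowed error rate of a vetoing member described in the abstract.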


Cited By

  • (2012) Veto-based Malware Detection. Proceedings of the 2012 Seventh International Conference on Availability, Reliability and Security, pages 47-54. DOI: 10.1109/ARES.2012.85. Online publication date: 20 August 2012.
  • (2012) Improving bagging performance through multi-algorithm ensembles. Frontiers of Computer Science. DOI: 10.1007/s11704-012-1163-6. Online publication date: 8 July 2012.

Published In

ACM SIGKDD Explorations Newsletter, Volume 8, Issue 2
December 2006, 106 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1233321

Publisher

Association for Computing Machinery

New York, NY, United States
