
Voting with a parameterized veto strategy: solving the KDD Cup 2006 problem by means of a classifier committee

Published: 01 December 2006

Abstract

This paper presents our winning solution to the KDD Cup 2006 problem. It combines the results of three different supervised learning techniques in a classifier committee, from which a single prediction is obtained by a voting procedure. The voting procedure assigns a weight to each committee member according to its average performance in a ten-fold cross-validation test, and it also takes into account the confidence values returned by the three algorithms. The final decision of the committee is determined by a parameterized veto strategy, which considers the maximal allowed error rate in addition to the confidence values of the committee members. The solution presented here won Task 2 and finished as runner-up on Task 1 of the competition.
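A minimal sketch of how such a confidence-weighted committee vote with a parameterized veto could be organized is given below, assuming each member reports a class label, a confidence value in [0, 1], and a weight derived from its ten-fold cross-validation accuracy. The names, thresholds, and veto rule are illustrative assumptions, not the authors' actual implementation.

# Illustrative sketch, not the authors' code: confidence-weighted committee
# voting with a parameterized veto. All names and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class MemberVote:
    label: int          # class predicted by one committee member
    confidence: float   # member's confidence in its prediction, in [0, 1]
    cv_weight: float    # weight from average ten-fold cross-validation accuracy

def committee_decision(votes, veto_confidence=0.95, max_error_rate=0.1):
    # Confidence- and performance-weighted tally per class label.
    tally = {}
    for v in votes:
        tally[v.label] = tally.get(v.label, 0.0) + v.cv_weight * v.confidence
    majority_label = max(tally, key=tally.get)

    # Parameterized veto: a member may override the weighted majority only if
    # it is very confident AND its estimated error rate stays within the bound.
    for v in votes:
        if (v.label != majority_label
                and v.confidence >= veto_confidence
                and 1.0 - v.cv_weight <= max_error_rate):
            return v.label
    return majority_label

# Example with three members; the third one vetoes the weighted majority.
votes = [
    MemberVote(label=0, confidence=0.55, cv_weight=0.80),
    MemberVote(label=0, confidence=0.60, cv_weight=0.78),
    MemberVote(label=1, confidence=0.97, cv_weight=0.92),
]
print(committee_decision(votes))  # prints 1 under these illustrative settings

In this sketch the veto only fires for a member that is both highly confident and sufficiently accurate in cross-validation, which reflects the idea of bounding the maximal allowed error rate of a vetoing member described in the abstract.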


Cited By

  • (2012) Veto-based Malware Detection. Proceedings of the 2012 Seventh International Conference on Availability, Reliability and Security, pages 47-54. DOI: 10.1109/ARES.2012.85. Online publication date: 20 August 2012.
  • (2012) Improving bagging performance through multi-algorithm ensembles. Frontiers of Computer Science. DOI: 10.1007/s11704-012-1163-6. Online publication date: 8 July 2012.

Published In

ACM SIGKDD Explorations Newsletter, Volume 8, Issue 2
December 2006, 106 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/1233321

Publisher

Association for Computing Machinery

New York, NY, United States
