CHIRP: a new classifier based on composite hypercubes on iterated random projections

ABSTRACT

We introduce a classifier based on the L-infinity norm. This classifier, called CHIRP, is an iterative sequence of three stages (projecting, binning, and covering) designed to deal with the curse of dimensionality, computational complexity, and nonlinear separability. CHIRP is not a hybrid or a modification of existing classifiers; it employs a new covering algorithm. On widely used benchmark datasets, CHIRP's accuracy exceeds that of competing classifiers. Its computational complexity is sublinear in the number of instances and the number of variables, and subquadratic in the number of classes.
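The abstract names the three stages but not their mechanics. The sketch below is a minimal, hedged illustration of a project-bin-cover pipeline, not the authors' algorithm: the function names, the regular-grid binning, the majority-vote covering rule, and the single (non-iterated) projection are all assumptions made for illustration. The only parts taken directly from the abstract and title are that class regions are covered by hypercubes, that a hypercube is an L-infinity ball, and that the data are first reduced by random projection.

```python
# Illustrative sketch only -- NOT the CHIRP algorithm from the paper.
# One plausible reading of the three stages the abstract names:
#   1. project: map high-dimensional data onto a random low-dimensional basis,
#   2. bin:     discretize the projected points onto a regular grid,
#   3. cover:   keep grid cells dominated by the target class as hypercubes.
# A hypercube is an L-infinity ball, so membership reduces to an L-infinity test.
from collections import defaultdict

import numpy as np


def random_project(X, rng, out_dim=2):
    """Stage 1 (assumed form): a dense Gaussian random projection."""
    R = rng.standard_normal((X.shape[1], out_dim)) / np.sqrt(out_dim)
    return X @ R, R


def fit_cover(Z, y, target, n_bins=10):
    """Stages 2-3 (simplified): bin the projected points, then cover the
    bins in which the target class outnumbers all other classes."""
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)
    cells = np.clip(((Z - lo) / width).astype(int), 0, n_bins - 1)
    counts = defaultdict(lambda: [0, 0])          # cell -> [other, target]
    for cell, label in zip(map(tuple, cells), y):
        counts[cell][int(label == target)] += 1
    boxes = []
    for cell, (other, tgt) in counts.items():
        if tgt > other:                           # majority-vote covering rule
            c = np.asarray(cell)
            boxes.append((lo + c * width, lo + (c + 1) * width))
    return boxes


def covered(z, boxes):
    """L-infinity membership: z lies in a box iff its largest coordinate-wise
    distance from the box centre is at most the box half-width."""
    for bmin, bmax in boxes:
        centre, half = (bmin + bmax) / 2.0, (bmax - bmin) / 2.0
        if np.max(np.abs(z - centre) - half) <= 0:
            return True
    return False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: two Gaussian blobs in 20 dimensions.
    X = np.vstack([rng.normal(0, 1, (100, 20)), rng.normal(3, 1, (100, 20))])
    y = np.array([0] * 100 + [1] * 100)
    Z, R = random_project(X, rng)
    boxes = fit_cover(Z, y, target=1)
    preds = np.array([covered(z, boxes) for z in Z]).astype(int)
    print("training accuracy of one pass:", (preds == y).mean())
```

The paper's method iterates these stages and, per its title, builds composite hypercubes rather than single grid cells; this sketch exists only to make the L-infinity box test and the project-bin-cover loop concrete.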