DOI: 10.1145/2623330.2623627
Research Article

Fast flux discriminant for large-scale sparse nonlinear classification

Published: 24 August 2014

Abstract

In this paper, we propose a novel supervised learning method, Fast Flux Discriminant (FFD), for large-scale nonlinear classification. Compared with existing methods, FFD offers an unusual combination of advantages: it attains the efficiency and interpretability of linear models as well as the accuracy of nonlinear models. It is also sparse and naturally handles mixed data types. FFD works by decomposing kernel density estimation over the entire feature space into estimates on selected low-dimensional subspaces. Since there are many possible subspaces, we propose a submodular optimization framework for subspace selection. The predictions from the selected subspaces are then transformed into new features on which a linear model can be learned. Moreover, because the transformed features naturally call for non-negative weights, training requires only smooth optimization even with L1 regularization. Unlike other nonlinear models such as kernel methods, the FFD model is interpretable, as it assigns importance weights to the original features. Its training and testing are also much faster than those of traditional kernel models. We carry out extensive empirical studies on real-world datasets and show that the proposed model achieves state-of-the-art classification results with sparsity, interpretability, and exceptional scalability. Our model can be learned in minutes on datasets with millions of samples, for which most existing nonlinear methods would be prohibitively expensive in space and time.
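To make the pipeline in the abstract concrete, here is a minimal Python sketch. It is a hypothetical illustration, not the authors' implementation: the function names (`subspace_feature`, `fit_nonneg_l1_logistic`, `greedy_select`) are invented here, the greedy forward selection is only a crude stand-in for the paper's submodular optimization framework, and KDE bandwidths are left at SciPy defaults. It does exhibit the two properties the abstract emphasizes: nonlinear structure is captured by class-conditional density features on low-dimensional subspaces, and constraining the linear weights to be non-negative turns the L1 penalty into the smooth linear term lam * sum(w), so a box-constrained quasi-Newton solver suffices.

```python
# Hypothetical FFD-style pipeline (illustrative only, not the authors' code):
# per-subspace KDE log-odds features -> greedy subspace selection ->
# linear model with non-negative weights and a (now smooth) L1 penalty.
import numpy as np
from itertools import combinations
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import gaussian_kde


def subspace_feature(X, y, dims, X_eval):
    """Log-odds of class-conditional Gaussian KDEs restricted to `dims`."""
    eps = 1e-12
    kde_pos = gaussian_kde(X[y == 1][:, dims].T)
    kde_neg = gaussian_kde(X[y == 0][:, dims].T)
    pts = X_eval[:, dims].T
    return np.log(kde_pos(pts) + eps) - np.log(kde_neg(pts) + eps)


def fit_nonneg_l1_logistic(F, y, lam=0.01):
    """Logistic loss + lam * sum(w), subject to w >= 0.

    With non-negative weights the L1 penalty reduces to the linear (smooth)
    term lam * sum(w), so box-constrained L-BFGS-B handles it directly.
    """
    n, d = F.shape
    t = 2.0 * y - 1.0  # labels in {-1, +1}

    def obj(params):
        w, b = params[:d], params[d]
        z = F @ w + b
        loss = np.logaddexp(0.0, -t * z).mean() + lam * w.sum()
        gz = -t * expit(-t * z) / n          # d(mean logistic loss)/dz
        grad = np.concatenate([F.T @ gz + lam, [gz.sum()]])
        return loss, grad

    res = minimize(obj, np.zeros(d + 1), jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * d + [(None, None)])
    return res.x[:d], res.x[d]


def greedy_select(X, y, candidates, k):
    """Plain greedy forward selection (a crude stand-in for the paper's
    submodular framework): add the subspace that most improves accuracy."""
    selected, feats = [], []
    for _ in range(k):
        best_dims, best_feat, best_acc = None, None, -1.0
        for dims in candidates:
            if dims in selected:
                continue
            f = subspace_feature(X, y, list(dims), X)
            F = np.column_stack(feats + [f])
            w, b = fit_nonneg_l1_logistic(F, y)
            acc = ((F @ w + b > 0).astype(int) == y).mean()
            if acc > best_acc:
                best_dims, best_feat, best_acc = dims, f, acc
        selected.append(best_dims)
        feats.append(best_feat)
    return selected, np.column_stack(feats)


# Toy usage: an XOR-like concept that no linear model on raw features can fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
selected, F = greedy_select(X, y, list(combinations(range(4), 2)), k=2)
w, b = fit_nonneg_l1_logistic(F, y)
print("selected subspaces:", selected)
print("train accuracy:", ((F @ w + b > 0).astype(int) == y).mean())
```

On the toy data, a linear model on the raw features is at chance, while a single well-chosen two-dimensional density feature makes the classes linearly separable, which is the intuition behind decomposing the full-space KDE into selected subspaces.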

Supplementary Material

MP4 File (p621-sidebyside.mp4)




Published In

KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2014, 2028 pages
ISBN: 9781450329569
DOI: 10.1145/2623330

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. classification
      2. interpretability
      3. sparsity
      4. submodularity



Acceptance Rates

KDD '14 paper acceptance rate: 151 of 1,036 submissions (15%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Cited By

• (2019) Fast Semisupervised Classification Using Histogram-Based Density Estimation for Large-Scale Polarimetric SAR Data. IEEE Geoscience and Remote Sensing Letters, 16(12):1844-1848. DOI: 10.1109/LGRS.2019.2910413. Online publication date: Dec 2019.
• (2018) A Machine Learning Based Approach to Detect Malicious Fast Flux Networks. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1676-1683. DOI: 10.1109/SSCI.2018.8628729. Online publication date: Nov 2018.
• (2018) Deep Embedding Logistic Regression. 2018 IEEE International Conference on Big Knowledge (ICBK), pages 176-183. DOI: 10.1109/ICBK.2018.00031. Online publication date: Nov 2018.
• (2016) Instance Specific Metric Subspace Learning. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 2272-2278. DOI: 10.5555/3016100.3016216. Online publication date: 12 Feb 2016.
• (2016) Online Classifier Adaptation for Cost-Sensitive Learning. Neural Computing and Applications, 27(3):781-789. DOI: 10.1007/s00521-015-1896-x. Online publication date: 1 Apr 2016.
• (2015) Trading Interpretability for Accuracy. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1245-1254. DOI: 10.1145/2783258.2783407. Online publication date: 10 Aug 2015.
• (2015) Optimal Action Extraction for Random Forests and Boosted Trees. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 179-188. DOI: 10.1145/2783258.2783281. Online publication date: 10 Aug 2015.
• (2015) Universal Knowledge Discovery from Big Data Using Combined Dual-Cycle. International Journal of Machine Learning and Cybernetics, 9(1):133-144. DOI: 10.1007/s13042-015-0376-z. Online publication date: 23 May 2015.