DOI: 10.1145/2623330.2623627
Research Article

Fast flux discriminant for large-scale sparse nonlinear classification

Published: 24 August 2014

Abstract

In this paper, we propose a novel supervised learning method, Fast Flux Discriminant (FFD), for large-scale nonlinear classification. Compared with existing methods, FFD offers an unusual combination of advantages: it attains the efficiency and interpretability of linear models as well as the accuracy of nonlinear models. It is also sparse and naturally handles mixed data types. FFD works by decomposing kernel density estimation over the entire feature space into estimates on selected low-dimensional subspaces. Since there are many possible subspaces, we propose a submodular optimization framework for subspace selection. The predictions from the selected subspaces are then transformed into new features on which a linear model can be learned. Moreover, because the transformed features naturally call for non-negative weights, training requires only smooth optimization even with L1 regularization. Unlike other nonlinear models such as kernel methods, the FFD model is interpretable, as it assigns importance weights to the original features. Its training and testing are also much faster than those of traditional kernel models. We carry out extensive empirical studies on real-world datasets and show that the proposed model achieves state-of-the-art classification results with sparsity, interpretability, and exceptional scalability. Our model can be learned in minutes on datasets with millions of samples, for which most existing nonlinear methods would be prohibitively expensive in space and time.
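To make the pipeline in the abstract concrete, here is a minimal Python sketch. It is a hypothetical illustration, not the authors' implementation: the function names (`subspace_feature`, `fit_nonneg_l1_logistic`, `greedy_select`) are invented here, the greedy forward selection is only a crude stand-in for the paper's submodular optimization framework, and KDE bandwidths are left at SciPy defaults. It does exhibit the two properties the abstract emphasizes: nonlinear structure is captured by class-conditional density features on low-dimensional subspaces, and constraining the linear weights to be non-negative turns the L1 penalty into the smooth linear term lam * sum(w), so a box-constrained quasi-Newton solver suffices.

```python
# Hypothetical FFD-style pipeline (illustrative only, not the authors' code):
# per-subspace KDE log-odds features -> greedy subspace selection ->
# linear model with non-negative weights and a (now smooth) L1 penalty.
import numpy as np
from itertools import combinations
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import gaussian_kde


def subspace_feature(X, y, dims, X_eval):
    """Log-odds of class-conditional Gaussian KDEs restricted to `dims`."""
    eps = 1e-12
    kde_pos = gaussian_kde(X[y == 1][:, dims].T)
    kde_neg = gaussian_kde(X[y == 0][:, dims].T)
    pts = X_eval[:, dims].T
    return np.log(kde_pos(pts) + eps) - np.log(kde_neg(pts) + eps)


def fit_nonneg_l1_logistic(F, y, lam=0.01):
    """Logistic loss + lam * sum(w), subject to w >= 0.

    With non-negative weights the L1 penalty reduces to the linear (smooth)
    term lam * sum(w), so box-constrained L-BFGS-B handles it directly.
    """
    n, d = F.shape
    t = 2.0 * y - 1.0  # labels in {-1, +1}

    def obj(params):
        w, b = params[:d], params[d]
        z = F @ w + b
        loss = np.logaddexp(0.0, -t * z).mean() + lam * w.sum()
        gz = -t * expit(-t * z) / n          # d(mean logistic loss)/dz
        grad = np.concatenate([F.T @ gz + lam, [gz.sum()]])
        return loss, grad

    res = minimize(obj, np.zeros(d + 1), jac=True, method="L-BFGS-B",
                   bounds=[(0.0, None)] * d + [(None, None)])
    return res.x[:d], res.x[d]


def greedy_select(X, y, candidates, k):
    """Plain greedy forward selection (a crude stand-in for the paper's
    submodular framework): add the subspace that most improves accuracy."""
    selected, feats = [], []
    for _ in range(k):
        best_dims, best_feat, best_acc = None, None, -1.0
        for dims in candidates:
            if dims in selected:
                continue
            f = subspace_feature(X, y, list(dims), X)
            F = np.column_stack(feats + [f])
            w, b = fit_nonneg_l1_logistic(F, y)
            acc = ((F @ w + b > 0).astype(int) == y).mean()
            if acc > best_acc:
                best_dims, best_feat, best_acc = dims, f, acc
        selected.append(best_dims)
        feats.append(best_feat)
    return selected, np.column_stack(feats)


# Toy usage: an XOR-like concept that no linear model on raw features can fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
selected, F = greedy_select(X, y, list(combinations(range(4), 2)), k=2)
w, b = fit_nonneg_l1_logistic(F, y)
print("selected subspaces:", selected)
print("train accuracy:", ((F @ w + b > 0).astype(int) == y).mean())
```

On the toy data, a linear model on the raw features is at chance, while a single well-chosen two-dimensional density feature makes the classes linearly separable, which is the intuition behind decomposing the full-space KDE into selected subspaces.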

Supplementary Material

MP4 File (p621-sidebyside.mp4)




Published In

KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2014, 2028 pages
ISBN: 9781450329569
DOI: 10.1145/2623330

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. classification
      2. interpretability
      3. sparsity
      4. submodularity



Acceptance Rates

KDD '14 paper acceptance rate: 151 of 1,036 submissions (15%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)


Cited By

• (2019) Fast Semisupervised Classification Using Histogram-Based Density Estimation for Large-Scale Polarimetric SAR Data. IEEE Geoscience and Remote Sensing Letters, 16(12):1844-1848. DOI: 10.1109/LGRS.2019.2910413. Online publication date: Dec 2019.
• (2018) A Machine Learning Based Approach to Detect Malicious Fast Flux Networks. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1676-1683. DOI: 10.1109/SSCI.2018.8628729. Online publication date: Nov 2018.
• (2018) Deep Embedding Logistic Regression. 2018 IEEE International Conference on Big Knowledge (ICBK), pages 176-183. DOI: 10.1109/ICBK.2018.00031. Online publication date: Nov 2018.
• (2016) Instance Specific Metric Subspace Learning. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 2272-2278. DOI: 10.5555/3016100.3016216. Online publication date: 12 Feb 2016.
• (2016) Online Classifier Adaptation for Cost-Sensitive Learning. Neural Computing and Applications, 27(3):781-789. DOI: 10.1007/s00521-015-1896-x. Online publication date: 1 Apr 2016.
• (2015) Trading Interpretability for Accuracy. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1245-1254. DOI: 10.1145/2783258.2783407. Online publication date: 10 Aug 2015.
• (2015) Optimal Action Extraction for Random Forests and Boosted Trees. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 179-188. DOI: 10.1145/2783258.2783281. Online publication date: 10 Aug 2015.
• (2015) Universal Knowledge Discovery from Big Data Using Combined Dual-Cycle. International Journal of Machine Learning and Cybernetics, 9(1):133-144. DOI: 10.1007/s13042-015-0376-z. Online publication date: 23 May 2015.