Budget-Driven Big Data Classification

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9091)

Abstract

This paper presents a practical approach to large-scale data classification. By exploiting an online learning framework, our approach learns a set of competing one-class Support Vector Machine models, one for each data class. The approach has three budget-driven features: 1) it can handle classification when the data cannot fit in memory; 2) both the training and labeling processes are user-controllable; 3) the classifiers can easily adapt to changes in dynamic data at minimal computational cost. Compared with LibLinear, the most popular big data classification tool, our approach is shown to be competent at processing extremely large data while consuming a fraction of the memory and time.
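The general idea of competing one-class models trained online can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear one-class SVM objective updated by plain subgradient steps, and the class names, the `nu` and `lr` parameters, the `max_updates` budget, and the toy stream are all hypothetical choices made for illustration. Samples are consumed one at a time from an iterator, so the full data set never has to sit in memory, and training stops once the user-set budget is spent.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class OnlineOneClassSVM:
    """Linear one-class SVM updated one sample at a time (illustrative only)."""

    def __init__(self, dim: int, nu: float = 0.5, lr: float = 0.1) -> None:
        self.w: List[float] = [0.0] * dim
        self.rho = 0.0
        self.nu = nu  # assumed trade-off parameter
        self.lr = lr  # assumed constant step size

    def score(self, x: Sequence[float]) -> float:
        return dot(self.w, x) - self.rho

    def partial_fit(self, x: Sequence[float]) -> None:
        # Subgradient step on 0.5*||w||^2 - rho + (1/nu) * max(0, rho - w.x).
        violated = self.score(x) < 0.0
        push = self.lr / self.nu if violated else 0.0
        self.w = [(1.0 - self.lr) * wi + push * xi for wi, xi in zip(self.w, x)]
        self.rho += self.lr * (1.0 - (1.0 / self.nu if violated else 0.0))


class CompetingClassifier:
    """One one-class model per label; the highest score wins at labeling time."""

    def __init__(self, dim: int, max_updates: Optional[int] = None) -> None:
        self.dim = dim
        self.max_updates = max_updates  # hypothetical training budget
        self.updates = 0
        self.models: Dict[str, OnlineOneClassSVM] = {}

    def fit_stream(self, stream: Iterable[Sample]) -> None:
        # Each sample is seen once and then discarded.
        for x, label in stream:
            if self.max_updates is not None and self.updates >= self.max_updates:
                break
            if label not in self.models:
                self.models[label] = OnlineOneClassSVM(self.dim)
            self.models[label].partial_fit(x)
            self.updates += 1

    def predict(self, x: Sequence[float]) -> str:
        return max(self.models, key=lambda label: self.models[label].score(x))


def toy_stream(repeats: int) -> Iterator[Sample]:
    """Made-up two-class stream standing in for data read from disk."""
    for _ in range(repeats):
        yield (1.0, 0.0), "a"
        yield (0.0, 1.0), "b"
        yield (0.9, 0.1), "a"
        yield (0.1, 0.9), "b"


if __name__ == "__main__":
    clf = CompetingClassifier(dim=2, max_updates=40)
    clf.fit_stream(toy_stream(100))
    print(clf.updates, clf.predict((1.0, 0.0)), clf.predict((0.0, 1.0)))
```

Because each model only ever receives samples of its own class, calling `fit_stream` again on newly arriving data updates the affected models without retraining the others, which is one plausible reading of the low-cost adaptation described in the abstract.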

Author information

Corresponding author

Correspondence to Minglun Gong.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Qian, Y., Yuan, H., Gong, M. (2015). Budget-Driven Big Data Classification. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_7

  • DOI: https://doi.org/10.1007/978-3-319-18356-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18355-8

  • Online ISBN: 978-3-319-18356-5

  • eBook Packages: Computer Science, Computer Science (R0)
