Budget-Driven Big Data Classification

Qian, Yiming; Yuan, Hao; Gong, Minglun

doi:10.1007/978-3-319-18356-5_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9091)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

This paper presents a practical approach to large-scale data classification. By exploiting an online learning framework, our approach learns a set of competing one-class Support Vector Machine models, one for each data class. The approach has three budget-driven features: 1) it can handle classification when the data cannot fit in memory; 2) both the training and labeling processes are user-controllable; 3) the classifiers can easily adapt to changes in dynamic data at minimal computational cost. Compared with the most popular big data classification tool, LibLinear, our approach is shown to be competent at processing extremely large data while consuming a fraction of the memory and time.
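
The idea of one online-trained one-class model per class, with the label decided by whichever model scores highest, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' algorithm: it assumes a linear one-class SVM objective optimized with stochastic subgradient steps, and the names `OnlineOneClassModel`, `CompetingClassifier`, `partial_fit`, the `budget` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, Iterator, Optional, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, Hashable]


class OnlineOneClassModel:
    """Linear one-class SVM updated one sample at a time (assumed form)."""

    def __init__(self, dim: int, nu: float, eta: float) -> None:
        self.w = [0.0] * dim  # weight vector
        self.rho = 0.0        # offset separating the class from the origin
        self.nu = nu          # controls the tolerated fraction of violations
        self.eta = eta        # constant step size

    def score(self, x: Vector) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) - self.rho

    def update(self, x: Vector) -> None:
        # Subgradient step on the per-sample objective
        #   0.5 * ||w||^2 - rho + (1 / nu) * max(0, rho - w.x)
        violated = self.score(x) < 0.0
        pull = self.eta / self.nu if violated else 0.0
        self.w = [(1.0 - self.eta) * wi + pull * xi for wi, xi in zip(self.w, x)]
        self.rho += self.eta * (1.0 - (1.0 / self.nu if violated else 0.0))


class CompetingClassifier:
    """One one-class model per label; the highest-scoring model wins."""

    def __init__(self, dim: int, nu: float = 0.5, eta: float = 0.02) -> None:
        self.dim = dim
        self.nu = nu
        self.eta = eta
        self.models: Dict[Hashable, OnlineOneClassModel] = {}

    def partial_fit(self, samples: Iterable[Sample], budget: Optional[int] = None) -> int:
        """Apply at most `budget` updates from a stream; return the number applied."""
        used = 0
        for x, label in samples:
            if budget is not None and used >= budget:
                break
            if label not in self.models:
                self.models[label] = OnlineOneClassModel(self.dim, self.nu, self.eta)
            self.models[label].update(x)
            used += 1
        return used

    def predict(self, x: Vector) -> Hashable:
        return max(self.models, key=lambda label: self.models[label].score(x))


def stream(epochs: int) -> Iterator[Sample]:
    """Yield toy labelled samples one at a time, as if read from disk."""
    toy = [((1.0, 0.0), "a"), ((0.9, 0.1), "a"), ((1.0, 0.1), "a"),
           ((0.0, 1.0), "b"), ((0.1, 0.9), "b"), ((0.1, 1.0), "b")]
    for _ in range(epochs):
        yield from toy


clf = CompetingClassifier(dim=2)
updates = clf.partial_fit(stream(200), budget=900)  # stop after 900 updates
print(updates, clf.predict((1.0, 0.0)), clf.predict((0.0, 1.0)))
```

Because samples arrive through an iterator, only one sample is held in memory at a time, and the `budget` argument caps the number of training updates. Calling `partial_fit` again on new samples continues from the current models, which is one simple way such classifiers could follow changing data.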

Author information

Authors and Affiliations

Department of Computer Science, Memorial University of Newfoundland, St. John’s, Newfoundland, NL, A1B 3X5, Canada
Yiming Qian, Hao Yuan & Minglun Gong

Corresponding author

Correspondence to Minglun Gong.

Editor information

Editors and Affiliations

University of Alberta, Edmonton, Canada
Denilson Barbosa
Dalhousie University, Halifax, Canada
Evangelos Milios

About this paper

Cite this paper

Qian, Y., Yuan, H., Gong, M. (2015). Budget-Driven Big Data Classification. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_7

DOI: https://doi.org/10.1007/978-3-319-18356-5_7
Published: 29 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer Science, Computer Science (R0)