Abstract
This paper presents a practical approach to large-scale data classification. By exploiting an online learning framework, our approach learns a set of competing one-class Support Vector Machine models, one for each data class. The approach offers three budget-driven features: 1) it can handle classification when the data cannot fit in memory; 2) both the training and labeling processes are user controllable; 3) the classifiers can easily adapt to changes in dynamic data with minimal computational cost. Compared with LibLinear, the most popular big data classification tool, our approach is shown to be competent at processing extremely large data while consuming a fraction of the memory and time.
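To make the idea of competing one-class models concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a linear one-class SVM-style objective trained by plain stochastic gradient descent, a simple cap on the number of updates as the "budget", and a toy two-class stream. All class names, parameters (`nu`, `lr`, `budget`), and the data generator are hypothetical and chosen only for illustration.

```python
import random


class OnlineOneClassModel:
    """Linear one-class model updated one sample at a time (assumed SGD variant)."""

    def __init__(self, dim, nu=0.5, lr=0.05):
        self.w = [0.0] * dim   # weight vector
        self.rho = 0.0         # offset of the decision boundary
        self.nu = nu           # trade-off parameter of the one-class objective
        self.lr = lr           # SGD step size

    def score(self, x):
        # Positive score: x lies on the "inside" of this class model.
        return sum(wi * xi for wi, xi in zip(self.w, x)) - self.rho

    def update(self, x):
        # Per-sample objective: 0.5*||w||^2 - rho + (1/nu)*max(0, rho - w.x)
        violated = self.score(x) < 0.0
        hinge = 1.0 / self.nu if violated else 0.0
        self.w = [wi - self.lr * (wi - hinge * xi) for wi, xi in zip(self.w, x)]
        self.rho -= self.lr * (hinge - 1.0)


class CompetingClassifier:
    """One model per class; the label of the highest-scoring model wins."""

    def __init__(self, dim):
        self.dim = dim
        self.models = {}

    def train(self, stream, budget=None):
        # Consume (features, label) pairs one at a time, so the full
        # data set never has to be held in memory. Stop at the budget.
        used = 0
        for x, label in stream:
            if budget is not None and used >= budget:
                break
            if label not in self.models:
                self.models[label] = OnlineOneClassModel(self.dim)
            self.models[label].update(x)
            used += 1
        return used

    def predict(self, x):
        return max(self.models, key=lambda label: self.models[label].score(x))


def toy_stream(n, seed=0):
    # Hypothetical data: two noisy clusters around (1, 0) and (0, 1).
    rng = random.Random(seed)
    centers = {"a": (1.0, 0.0), "b": (0.0, 1.0)}
    for _ in range(n):
        label = rng.choice(["a", "b"])
        cx, cy = centers[label]
        yield [cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)], label


clf = CompetingClassifier(dim=2)
used = clf.train(toy_stream(10_000), budget=1_000)
print(used, clf.predict([1.0, 0.0]), clf.predict([0.0, 1.0]))
```

Because each sample touches only the model of its own class, further calls to `train` on new data would keep adapting the existing models without revisiting earlier samples. This is one plausible reading of how an online scheme can follow changes in dynamic data.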
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Qian, Y., Yuan, H., Gong, M. (2015). Budget-Driven Big Data Classification. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_7
DOI: https://doi.org/10.1007/978-3-319-18356-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer Science, Computer Science (R0)