Abstract
Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, support vector machines (SVMs) are used extensively because of their generalization properties. SVMs were originally designed for binary classification, but most classification problems in domains such as image annotation involve more than two classes. Moreover, SVM training is computationally intensive, especially when the training dataset is large. This paper presents a resource-aware parallel multiclass SVM algorithm, named RAMSMO, for large-scale image annotation. RAMSMO partitions the training dataset into smaller binary chunks and optimizes SVM training in parallel on a cluster of computers. A genetic algorithm-based load balancing scheme is designed to optimize the performance of RAMSMO by balancing the computation of multiclass data chunks in heterogeneous computing environments. RAMSMO is evaluated in both experimental and simulation environments, and the results show that it significantly reduces training time while maintaining a high level of classification accuracy.
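To illustrate what partitioning a multiclass training set into binary chunks can look like, the following is a minimal sketch only. The abstract does not state which decomposition RAMSMO uses; the pairwise (one-versus-one) scheme here is an assumption, and the function name `binary_chunks` and the toy labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def binary_chunks(samples, labels):
    """Split a multiclass dataset into one binary chunk per pair of classes.

    Assumption: a one-versus-one decomposition. Each chunk holds only the
    samples of two classes, relabelled +1 and -1, so it can be handed to an
    independent binary SVM trainer.
    """
    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    chunks = {}
    # Sorting the class names makes the pair order deterministic.
    for a, b in combinations(sorted(by_class), 2):
        positives = [(x, +1) for x in by_class[a]]
        negatives = [(x, -1) for x in by_class[b]]
        chunks[(a, b)] = positives + negatives
    return chunks


if __name__ == "__main__":
    # Toy one-dimensional feature vectors with three hypothetical keywords.
    samples = [[0.1], [0.2], [0.9], [1.0], [0.5]]
    labels = ["sky", "sky", "sea", "sea", "sand"]
    for pair, chunk in binary_chunks(samples, labels).items():
        print(pair, len(chunk))
```

Under this assumption, k classes yield k(k-1)/2 chunks of unequal size, which is the kind of uneven workload a load balancing scheme has to distribute across heterogeneous nodes.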
Cite this article
Alham, N.K., Li, M. & Liu, Y. Parallelizing multiclass support vector machines for scalable image annotation. Neural Comput & Applic 24, 367–381 (2014). https://doi.org/10.1007/s00521-012-1237-2
DOI: https://doi.org/10.1007/s00521-012-1237-2