Abstract
Dictionary learning has become an important tool in many classification tasks, especially image classification. The atoms of a dictionary are trained to reconstruct the test sample. For classification, the atoms associated with each class span a subspace, and the test sample is labeled according to its distance to each subspace. However, it is hard to fix the number of atoms that gives the optimal result in every scenario, since the optimal subspaces differ from one scenario to another. To improve both classification performance and robustness, we propose subspace-level dictionary fusion (SLDF) to construct a dictionary-based classifier. A full-size dictionary and a locality-constrained dictionary are constructed in parallel. The reconstruction coefficients under the two dictionaries are then obtained, yielding a pair of distances between the test sample and each subspace. Finally, a decision is made according to the pair-wise fusion of the distances. Experimental results on multimedia datasets from distinct categories (image, text, and audio) show that the proposed method outperforms other state-of-the-art dictionary-based classification methods, with accuracies of 99.74% (image), 83.96% (text), and 87.07% (audio).
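The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the dictionaries are built, how the coefficients are solved, or which fusion rule is used. The sketch assumes that the training samples themselves serve as atoms, that coefficients come from ridge (collaborative-representation style) regression, that the locality-constrained dictionary consists of the k atoms nearest to the test sample, and that the two distance vectors are normalized and summed. The names ridge_coefficients, class_distances, and fused_predict, and the parameters k and lam, are hypothetical.

```python
import numpy as np

def ridge_coefficients(D, y, lam):
    # Closed-form solution of min_x ||y - D x||^2 + lam * ||x||^2
    # (columns of D are atoms). Assumed solver, not taken from the paper.
    gram = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(gram, D.T @ y)

def class_distances(D, atom_labels, y, classes, lam):
    # Distance from y to each class subspace: the residual left when y is
    # rebuilt from that class's atoms and coefficients only.
    x = ridge_coefficients(D, y, lam)
    dists = np.empty(len(classes))
    for i, c in enumerate(classes):
        mask = atom_labels == c
        # A class with no atoms in D reconstructs to zero, so its
        # distance is simply ||y||.
        dists[i] = np.linalg.norm(y - D[:, mask] @ x[mask])
    return dists

def fused_predict(D, atom_labels, y, k=10, lam=1e-2):
    classes = np.unique(atom_labels)
    # Distances under the full-size dictionary.
    d_full = class_distances(D, atom_labels, y, classes, lam)
    # Locality-constrained dictionary: the k atoms closest to y (assumption).
    nearest = np.argsort(np.linalg.norm(D - y[:, None], axis=0))[:k]
    d_local = class_distances(D[:, nearest], atom_labels[nearest], y, classes, lam)
    # Pair-wise fusion: sum of the two normalized distance vectors (assumption).
    eps = 1e-12
    fused = d_full / (d_full.sum() + eps) + d_local / (d_local.sum() + eps)
    return classes[np.argmin(fused)]

# Toy data: two classes clustered around two orthogonal directions.
rng = np.random.default_rng(0)
centers = np.eye(5)[:, :2]
atom_labels = np.repeat([0, 1], 20)
D = centers[:, atom_labels] + 0.01 * rng.standard_normal((5, 40))
y = centers[:, 0] + 0.01 * rng.standard_normal(5)
print(fused_predict(D, atom_labels, y))
```

Summing normalized distances keeps the two dictionaries on a comparable scale; a product of the two distances would be an equally plausible fusion rule under the same assumptions.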
Acknowledgements
This work was supported by the University of Macau (File no. MYRG2018-00053-FST).
Cite this article
Zhou, J., Zeng, S. & Zhang, B. Subspace-level dictionary fusion for robust multimedia classification. Multimed Tools Appl 80, 21885–21898 (2021). https://doi.org/10.1007/s11042-021-10661-1
DOI: https://doi.org/10.1007/s11042-021-10661-1