Subspace-level dictionary fusion for robust multimedia classification

Multimedia Tools and Applications

Abstract

Dictionary learning has become an important tool in many classification tasks, especially for images. The atoms of a dictionary are trained to reconstruct the test sample. For classification, the atoms associated with each class span a subspace, and the test sample is labeled according to its distance to each subspace. However, it is hard to fix a number of atoms that gives the optimal result in every scenario, since the optimal subspaces differ from one scenario to another. To improve both classification performance and robustness, we propose subspace-level dictionary fusion (SLDF) for constructing a dictionary-based classifier. A full-size dictionary and a locality-constrained dictionary are constructed in parallel. The reconstruction coefficients under the two dictionaries are then obtained, yielding a pair of distances between the test sample and each subspace. Finally, a decision is made according to the pair-wise fusion of these distances. Experimental results on multimedia datasets from distinct categories (image, text, and audio) show that the proposed method outperforms other state-of-the-art dictionary-based classification methods, with accuracies of 99.74% (image), 83.96% (text), and 87.07% (audio).
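
The pipeline described above can be illustrated with a minimal NumPy sketch. The abstract does not specify how the dictionaries are built, how the coefficients are solved, or which fusion rule is used, so everything concrete here is an assumption made for illustration: training samples serve directly as atoms, the locality-constrained dictionary keeps the `k` nearest atoms to the test sample, coefficients come from ridge-regularized least squares, and the two distances are fused by an element-wise product. The names `class_distances`, `sldf_predict`, `k`, and `lam` are hypothetical and not taken from the paper.

```python
import numpy as np


def class_distances(D, labels, y, classes, lam=1e-2):
    """Distance from y to the subspace spanned by each class's atoms in D.

    D has one atom per column; labels[j] is the class of column j.
    """
    # Assumed solver: ridge-regularized least squares for the coefficients.
    gram = D.T @ D + lam * np.eye(D.shape[1])
    x = np.linalg.solve(gram, D.T @ y)
    dists = np.empty(len(classes))
    for i, c in enumerate(classes):
        mask = labels == c
        # Reconstruct y from this class's atoms only. If the class has no
        # atoms in D, the reconstruction is zero and the distance is ||y||.
        dists[i] = np.linalg.norm(y - D[:, mask] @ x[mask])
    return dists


def sldf_predict(X_train, y_train, y, k=10, lam=1e-2):
    """Label y by fusing distances from a full and a local dictionary."""
    classes = np.unique(y_train)
    # Full-size dictionary (assumption): every training sample is an atom.
    d_full = class_distances(X_train.T, y_train, y, classes, lam)
    # Locality-constrained dictionary (assumption): the k nearest atoms.
    nearest = np.argsort(np.linalg.norm(X_train - y, axis=1))[:k]
    d_local = class_distances(X_train[nearest].T, y_train[nearest], y,
                              classes, lam)
    # Pair-wise fusion (assumption): element-wise product of the distances.
    fused = d_full * d_local
    return classes[np.argmin(fused)]


# Toy data: two classes clustered around two different directions.
X_train = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.1, 0.9, 0.0]])
y_train = np.array([0, 0, 1, 1])
pred_a = sldf_predict(X_train, y_train, np.array([1.0, 0.05, 0.0]), k=2)
pred_b = sldf_predict(X_train, y_train, np.array([0.05, 1.0, 0.0]), k=2)
```

In this sketch a class must be close to the test sample under both dictionaries to win, which is one plausible reading of why fusing the two could be more robust than relying on a single dictionary size.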


Acknowledgements

This work was supported by the University of Macau (File no. MYRG2018-00053-FST).

Author information

Corresponding author

Correspondence to Bob Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, J., Zeng, S. & Zhang, B. Subspace-level dictionary fusion for robust multimedia classification. Multimed Tools Appl 80, 21885–21898 (2021). https://doi.org/10.1007/s11042-021-10661-1

  • DOI: https://doi.org/10.1007/s11042-021-10661-1
