Abstract
Processing multimedia data has emerged as a key application area for machine learning methods. Building a robust classification model for high-dimensional spaces requires combining a deep feature extractor with a powerful classifier. We present a new classification pipeline for multimedia data analysis based on a convolutional neural network (CNN) and a modified residual network, which can be integrated with other feedforward network architectures and trained end to end. The proposed residual network produces attention-aware features. By exploiting the advantages of the residual network, the unified deep CNN model achieves promising performance in classifying high-dimensional multimedia data. In every residual module, an up-down and down-up feedforward structure is implemented to unfold the feedforward and backward processes into a single process. The proposed hybrid model is evaluated on four datasets and shows promising results that outperform the previous best results. Finally, the proposed model achieves detection speeds that are much faster than those of other approaches.
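The abstract does not give the equations of the extended residual unit, so the following is only an illustrative sketch of how a residual module with an up-down branch might produce attention-aware features. It assumes a trunk branch F(x) and a soft mask M(x) built by down-sampling and then up-sampling the input, combined as x + (1 + M(x)) * F(x). That combination rule, the toy trunk, and all function names are assumptions made for illustration, not the authors' confirmed design.

```python
import math


def sigmoid(v):
    """Logistic function, squashes a value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def trunk(x, w):
    """Hypothetical trunk branch F(x): element-wise scaling followed by ReLU."""
    return [max(0.0, w * xi) for xi in x]


def soft_mask(x):
    """Hypothetical up-down mask branch M(x).

    Down: average-pool adjacent pairs. Up: repeat each pooled value twice.
    A sigmoid then maps the result to attention weights in (0, 1).
    """
    if len(x) % 2 != 0:
        raise ValueError("input length must be even for pair-wise pooling")
    pooled = [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]
    upsampled = [p for p in pooled for _ in range(2)]
    return [sigmoid(u) for u in upsampled]


def attention_residual_unit(x, w=0.5):
    """Assumed combination rule: identity skip plus mask-weighted trunk output."""
    f = trunk(x, w)
    m = soft_mask(x)
    return [xi + (1.0 + mi) * fi for xi, fi, mi in zip(x, f, m)]


if __name__ == "__main__":
    print(attention_residual_unit([0.0, 0.0, 2.0, 2.0]))
```

In this sketch the identity term keeps the input path intact, while the mask only rescales the trunk output, so a mask close to zero leaves an ordinary residual unit.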







Acknowledgements
This research is partly supported by NSFC, China (No: 61572315) and Committee of Science and Technology, Shanghai, China (No: 17JC1403000).
Cite this article
Shamsolmoali, P., Kumar Jain, D., Zareapoor, M. et al. High-dimensional multimedia classification using deep CNN and extended residual units. Multimed Tools Appl 78, 23867–23882 (2019). https://doi.org/10.1007/s11042-018-6146-7
DOI: https://doi.org/10.1007/s11042-018-6146-7