Multi-modal active learning with deep reinforcement learning for target feature extraction in multi-media image processing applications

Dhiman, Gaurav; Kumar, A. Vignesh; Nirmalan, R.; Sujitha, S.; Srihari, K.; Yuvaraj, N.; Arulprakash, P.; Raja, R. Arshath

doi:10.1007/s11042-022-12178-7


1178: Pattern Recognition for Adaptive User Interfaces
Published: 25 February 2022

Volume 82, pages 5343–5367 (2023)

Multimedia Tools and Applications


Abstract

Advances in on-demand Multimedia Streaming Applications (MAS) enable faster video transmission in response to user requests across various fields. However, these systems suffer from poor speed, flexibility and efficiency when accessing and presenting multimedia content from the archive, and data delivery is often affected by delay, packet loss and congestion. Manual annotation is therefore required for access and retrieval, but it yields poor retrieval accuracy over large databases. Automatic annotation in MAS is needed to increase retrieval accuracy in similar-image retrieval systems based on various low-level features, and it bridges the gap between high-level semantics and low-level feature representations. The effectiveness of automated image annotation depends on how accurately a model detects edges, color, texture, shape and spatial information. In this paper, we develop an automated annotation model that uses optimal feature extraction to retrieve visually similar images from online multimedia streams. The model is built on Multi-modal Active Learning (MAL), which uses a Convolutional Recurrent Neural Network (CRNN) to annotate labels automatically based on visually similar content or features such as edges, color, texture, shape and spatial information. A Deep Reinforcement Learning (DRL) algorithm then improves the performance of the retrieval engine by validating the extracted visual features. MAL-CNN is simulated over large online streaming databases and then validated by DRL on real-time online streams. Performance is evaluated in terms of retrieval accuracy, sensitivity, specificity, F-measure, geometric mean and mean absolute percentage error (MAPE).
The results show that the proposed MAL-DRL model achieves higher accuracy than conventional machine learning, reinforcement learning and deep learning automatic annotation models.
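As a minimal sketch, the evaluation metrics named in the abstract can be computed from binary relevance labels as shown below. The abstract does not give the paper's exact formulas, so standard definitions are assumed here: sensitivity as the true positive rate, specificity as the true negative rate, F-measure as the harmonic mean of precision and sensitivity, geometric mean as sqrt(sensitivity × specificity), and MAPE over paired actual and predicted values. The function names `retrieval_metrics` and `mape` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def retrieval_metrics(y_true, y_pred):
    """Standard binary metrics (assumed definitions; 1 = relevant, 0 = not relevant)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, retrieved
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, retrieved
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, missed

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    g_mean = sqrt(sensitivity * specificity)  # geometric mean of the two rates
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(retrieval_metrics(y_true, y_pred))
    print(mape([100, 200], [90, 220]))
```

For the toy labels (TP = 3, TN = 4, FP = 2, FN = 1), accuracy is 0.7, sensitivity is 0.75, specificity and F-measure are both about 0.667, and the geometric mean is about 0.707; MAPE for actual values 100 and 200 against predictions 90 and 220 is 10%. Libraries such as scikit-learn provide equivalent functions; plain Python is used here only to keep the definitions visible.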



Author information

Authors and Affiliations

Department of Computer Science, Government Bikram College of Commerce, Patiala, India
Gaurav Dhiman
Department of Computer Science and Engineering, University Centre for Research & Development, Chandigarh University, Gharuan, Mohali, Punjab, India
Gaurav Dhiman
Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
Gaurav Dhiman
Department of Computer Science & Engineering, Jai Shriram Engineering College, Tiruppur, Tamil Nadu, India
A. Vignesh Kumar
Department of Computer Science & Engineering, Kalasalingam Academy of Research and Education, Anand Nagar, Krishnankoil, Tamil Nadu, India
R. Nirmalan
Department of Computer Science & Engineering, Sri Vidya College of Engineering and Technology, Virudhunagar, Tamil Nadu, India
S. Sujitha
Department of Computer Science & Engineering, SNS College of Technology, Coimbatore, Tamil Nadu, India
K. Srihari
Training and Research, ICT Academy, Chennai, India
N. Yuvaraj
Department of Computer Science and Engineering, Rathinam Technical Campus, Eachanari, Coimbatore, 641021, India
P. Arulprakash
Research & Publications, ICT Academy, IIT Madras Research Park, Tamil Nadu, India
R. Arshath Raja


Corresponding author

Correspondence to Gaurav Dhiman.


About this article

Cite this article

Dhiman, G., Kumar, A.V., Nirmalan, R. et al. Multi-modal active learning with deep reinforcement learning for target feature extraction in multi-media image processing applications. Multimed Tools Appl 82, 5343–5367 (2023). https://doi.org/10.1007/s11042-022-12178-7


Received: 05 July 2020
Revised: 01 November 2021
Accepted: 07 January 2022
Published: 25 February 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11042-022-12178-7
