Abstract
In this paper, we propose a semantic concept-based video browsing system that mainly exploits the spatial information of both object and action concepts. In each video frame, we soft-assign every locally regional object proposal to the cells of a grid. For action concepts, we collect a dataset of roughly 100 actions. Because many actions can be predicted from a still image rather than a full video shot, we treat actions as object concepts and detect them with a deep neural network based on the YOLO detector. Moreover, instead of densely extracting every concept in a video shot, we focus on high-saliency objects and discard noisy concepts. To improve interaction, we develop a color-based sketch board for quickly filtering out irrelevant shots and an instant search panel that improves the system's recall. Finally, metadata such as a video's title and summary is integrated into the system to further boost precision and recall.
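The abstract leaves the grid resolution and the exact soft-assignment weighting unspecified, so the following is a minimal sketch of the general idea only: each object proposal's bounding box is spread over the cells of a G x G grid in proportion to how much of its area falls inside each cell. The function name, the 3x3 default, and the area-fraction weighting are assumptions, not the authors' published method.

```python
import numpy as np

def soft_assign_proposal(box, frame_w, frame_h, grid=3):
    """Soft-assign one object proposal to the cells of a grid.

    box: (x1, y1, x2, y2) in pixels. Returns a grid x grid array whose
    entry (r, c) is the fraction of the box area inside cell (r, c);
    the entries sum to 1 for a box fully inside the frame.
    """
    x1, y1, x2, y2 = box
    cell_w, cell_h = frame_w / grid, frame_h / grid
    weights = np.zeros((grid, grid))
    box_area = max((x2 - x1) * (y2 - y1), 1e-9)  # guard against zero area
    for r in range(grid):
        for c in range(grid):
            # Width and height of the box's intersection with cell (r, c).
            ix = max(0.0, min(x2, (c + 1) * cell_w) - max(x1, c * cell_w))
            iy = max(0.0, min(y2, (r + 1) * cell_h) - max(y1, r * cell_h))
            weights[r, c] = ix * iy / box_area
    return weights

# A proposal covering the top-left quarter of a 300x300 frame puts most
# of its weight on the upper-left cells of a 3x3 grid.
print(soft_assign_proposal((0, 0, 150, 150), 300, 300))
```

The saliency-based filtering is likewise only stated, not specified. One plausible realization, assuming a per-pixel saliency map in [0, 1] produced by a salient-object-detection network, keeps a detected concept only when the mean saliency inside its box clears a threshold; filter_by_saliency and the 0.5 threshold below are hypothetical.

```python
def filter_by_saliency(detections, saliency_map, thresh=0.5):
    """Drop detections whose region is not salient on average.

    detections: list of dicts with a 'box' key, (x1, y1, x2, y2) in pixels.
    saliency_map: 2-D NumPy array of per-pixel saliency scores in [0, 1].
    """
    kept = []
    for det in detections:
        x1, y1, x2, y2 = (int(v) for v in det["box"])
        region = saliency_map[y1:y2, x1:x2]
        # Keep the detection only if its box is non-empty and salient enough.
        if region.size and region.mean() >= thresh:
            kept.append(det)
    return kept
```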