SQL-Like Interpretable Interactive Video Search

Wu, Jiaxin; Nguyen, Phuong Anh; Ma, Zhixin; Ngo, Chong-Wah

doi:10.1007/978-3-030-67835-7_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12573)

Included in the following conference series:

International Conference on Multimedia Modeling

The original version of this chapter was revised: the acknowledgement section has been corrected. Additionally, the affiliations of the third and last author and the e-mail address of the last author have been corrected. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-67835-7_51

Abstract

Concept-free search, which embeds text and video signals in a joint space for retrieval, appears to be the new state of the art. However, this search paradigm suffers from two limitations. First, the search result is unpredictable and not interpretable. Second, the embedded features reside in a high-dimensional space, which hinders real-time indexing and search. In this paper, we present a new implementation of the Vireo video search system (Vireo-VSS), which employs a dual-task model to index each video segment with a low-dimensional embedding feature and a concept list for retrieval. The concept list serves as a reference for interpreting its associated embedded feature. With these changes, a SQL-like querying interface is designed such that a user can specify the search content (subject, predicate, object) and constraint (logical condition) in a semi-structured way. The system decomposes the SQL-like query into multiple sub-queries depending on the constraint specified. Each sub-query is translated into an embedding feature and a concept list for video retrieval. The search result is compiled by taking the union of, or pruning, the search lists from the sub-queries. The SQL-like interface is also extended to temporal querying by providing multiple SQL templates with which users specify the temporal evolution of a query.
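
The compilation step described in the abstract (union or pruning of the ranked lists returned by sub-queries) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the system's scoring, thresholds, or query syntax, so the function names (`union_lists`, `prune_list`, `run_query`), the "best score wins" rule, and the 0.5 threshold are hypothetical and are not taken from Vireo-VSS.

```python
from typing import Dict, List, Tuple

# A ranked list of (segment_id, score) pairs; a higher score means a better match.
Ranked = List[Tuple[str, float]]


def union_lists(lists: List[Ranked]) -> Ranked:
    """Union: keep every segment, scored by its best sub-query score (assumed rule)."""
    best: Dict[str, float] = {}
    for ranked in lists:
        for seg, score in ranked:
            best[seg] = max(score, best.get(seg, float("-inf")))
    # Sort by descending score, breaking ties by segment id for determinism.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


def prune_list(base: Ranked, unwanted: Ranked, threshold: float) -> Ranked:
    """Pruning: drop segments that the unwanted sub-query matches at or above threshold."""
    blocked = {seg for seg, score in unwanted if score >= threshold}
    return [(seg, score) for seg, score in base if seg not in blocked]


def run_query(
    sub_results: Dict[str, Ranked],
    include: List[str],
    exclude: List[str],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> Ranked:
    """Compile one result list from per-sub-query ranked lists."""
    merged = union_lists([sub_results[q] for q in include])
    for q in exclude:
        merged = prune_list(merged, sub_results[q], threshold)
    return merged


# Toy retrieval results for three sub-queries (made-up segments and scores).
sub_results = {
    "man riding bicycle": [("s1", 0.9), ("s2", 0.6)],
    "woman riding bicycle": [("s2", 0.8), ("s3", 0.7)],
    "indoor": [("s3", 0.9), ("s1", 0.2)],
}

result = run_query(
    sub_results,
    include=["man riding bicycle", "woman riding bicycle"],
    exclude=["indoor"],
)
print(result)
```

In this toy run the two "riding bicycle" lists are merged by union, and segment `s3` is then pruned because the excluded "indoor" sub-query matches it strongly. How the real system scores, thresholds, and orders segments may differ.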

Change history

21 January 2021
The original version of the book was inadvertently published with an incorrect acknowledgement in chapter 34. The acknowledgement has been corrected and reads as follows:
Acknowledgement: The research was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the National Natural Science Foundation of China (No. 61872256).
The affiliation of the third author, Zhixin Ma, was incorrect. In the contribution it read “School of Information System,” but correctly it should be “School of Computing and Information Systems”.
The affiliation of the last author, Chong-Wah Ngo, was not correct. In the book it read “Department of Computer Science, City University of Hong Kong, Hong Kong, China”. Instead, the correct affiliation is: “School of Computing and Information Systems, Singapore Management University, Singapore, Singapore”.
Additionally, his e-mail address “cscwngo@cityu.edu.hk” was also incorrect. The correct e-mail address is: “cwngo@smu.edu.sg”.
The chapter and the book have been updated with the changes.

Notes

1.
https://github.com/aaalgo/kgraph.


Acknowledgement

The research was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the National Natural Science Foundation of China (No. 61872256).

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Hong Kong, China
Jiaxin Wu & Phuong Anh Nguyen
School of Computing and Information Systems, Singapore Management University, Singapore, Singapore
Zhixin Ma & Chong-Wah Ngo

Corresponding author

Correspondence to Jiaxin Wu.

Editor information

Editors and Affiliations

Charles University, Prague, Czech Republic
Jakub Lokoč
Charles University, Prague, Czech Republic
Tomáš Skopal
Klagenfurt University, Klagenfurt, Austria
Klaus Schoeffmann
CERTH-ITI, Thessaloniki, Greece
Vasileios Mezaris
Renmin University of China, Beijing, China
Xirong Li
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Queen Mary University of London, London, UK
Ioannis Patras

About this paper

Cite this paper

Wu, J., Nguyen, P.A., Ma, Z., Ngo, C.-W. (2021). SQL-Like Interpretable Interactive Video Search. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_34

DOI: https://doi.org/10.1007/978-3-030-67835-7_34
Published: 21 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science, Computer Science (R0)
