SQL-Like Interpretable Interactive Video Search

  • Conference paper
MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12573))

  • The original version of this chapter was revised: the acknowledgement section has been corrected. Additionally, the affiliations of the third and last author and the e-mail address of the last author have been corrected. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-67835-7_51

Abstract

Concept-free search, which embeds text and video signals in a joint space for retrieval, appears to be the new state of the art. However, this search paradigm suffers from two limitations. First, the search result is unpredictable and not interpretable. Second, the embedded features lie in a high-dimensional space, which hinders real-time indexing and search. In this paper, we present a new implementation of the Vireo video search system (Vireo-VSS), which employs a dual-task model to index each video segment with a low-dimensional embedding feature and a concept list for retrieval. The concept list serves as a reference for interpreting its associated embedding feature. With these changes, a SQL-like querying interface is designed such that a user can specify the search content (subject, predicate, object) and constraint (logical condition) in a semi-structured way. The system decomposes the SQL-like query into multiple sub-queries depending on the constraint specified. Each sub-query is translated into an embedding feature and a concept list for video retrieval. The search result is compiled by taking the union of, or pruning, the search lists from the sub-queries. The SQL-like interface is also extended for temporal querying by providing multiple SQL templates with which users can specify the temporal evolution of a query.
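
The compilation step described in the abstract, where search lists from several sub-queries are combined by union or pruning, can be illustrated with a minimal sketch. This is not the Vireo-VSS implementation: the function names, the score-per-segment dictionary, the choice of keeping the maximum score on union, and the mapping of an OR-style condition to union and a NOT-style condition to pruning are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List

# Hypothetical representation: video segment id -> similarity score
# returned by one sub-query.
RankedList = Dict[str, float]


def union_lists(lists: List[RankedList]) -> RankedList:
    """Union: keep every segment, with its best score across sub-queries.

    Taking the maximum score is an assumption of this sketch.
    """
    merged: RankedList = {}
    for ranked in lists:
        for segment, score in ranked.items():
            merged[segment] = max(score, merged.get(segment, float("-inf")))
    return merged


def prune_list(base: RankedList, excluded: RankedList) -> RankedList:
    """Pruning: drop segments from `base` that also appear in `excluded`."""
    return {seg: score for seg, score in base.items() if seg not in excluded}


def rank(result: RankedList) -> List[str]:
    """Order segment ids by descending score, breaking ties by id."""
    return [seg for seg, _ in sorted(result.items(), key=lambda kv: (-kv[1], kv[0]))]


# Made-up search lists for three sub-queries.
dog = {"s1": 0.9, "s2": 0.4}
cat = {"s2": 0.7, "s3": 0.5}
indoor = {"s3": 0.8}

# "dog OR cat, but NOT indoor" under the assumed mapping.
combined = prune_list(union_lists([dog, cat]), indoor)
print(rank(combined))
```

In this toy setting a segment matched by either content sub-query survives the union, and any segment that also matches the excluded sub-query is removed before ranking.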

Change history

  • 21 January 2021

    The original version of the book was inadvertently published with an incorrect acknowledgement in chapter 34. The acknowledgement has been corrected and reads as follows:

    Acknowledgement: The research was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the National Natural Science Foundation of China (No. 61872256).

    The affiliation of the third author, Zhixin Ma, was incorrect. In the contribution it read “School of Information System,” but correctly it should be “School of Computing and Information Systems”.

    The affiliation of the last author, Chong-Wah Ngo, was not correct. In the book it read “Department of Computer Science, City University of Hong Kong, Hong Kong, China”. Instead, the correct affiliation is: “School of Computing and Information Systems, Singapore Management University, Singapore, Singapore”.

    Additionally, his e-mail address “cscwngo@cityu.edu.hk” was also incorrect. The correct e-mail address is: “cwngo@smu.edu.sg”.

    The chapter and the book have been updated with the changes.

Notes

  1. https://github.com/aaalgo/kgraph

Acknowledgement

The research was partially supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant and the National Natural Science Foundation of China (No. 61872256).

Author information

Corresponding author

Correspondence to Jiaxin Wu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, J., Nguyen, P.A., Ma, Z., Ngo, C.-W. (2021). SQL-Like Interpretable Interactive Video Search. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_34

  • DOI: https://doi.org/10.1007/978-3-030-67835-7_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67834-0

  • Online ISBN: 978-3-030-67835-7

  • eBook Packages: Computer Science, Computer Science (R0)
