Interactive Question Answering for Multimodal Lifelog Retrieval

Tran, Ly-Duyen; Zhou, Liting; Nguyen, Binh; Gurrin, Cathal

doi:10.1007/978-3-031-56435-2_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14565)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Supporting Question Answering (QA) tasks is the next step for lifelog retrieval systems, mirroring the progression of the parent field of information retrieval. In this paper, we propose a new pipeline, based on the open-domain QA pipeline, to tackle the QA task in the context of lifelogging. We incorporate this pipeline into a multimodal lifelog retrieval system, which allows users to submit questions relevant to a lifelog and then suggests possible text answers based on multimodal data. A test collection is developed to facilitate a user study, the aim of which is to evaluate the effectiveness of the proposed system compared to a conventional lifelog retrieval system. The results show that the proposed system outperforms the conventional system in terms of both effectiveness and user satisfaction. The results also suggest that the proposed system is more valuable for novice users, while both systems are equally effective for experienced users.

Acknowledgements

This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Digitally-Enhanced Reality (d-real) under Grant No. 18/CRT/6224. For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Author information

Authors and Affiliations

Dublin City University, Dublin, Ireland
Ly-Duyen Tran, Liting Zhou & Cathal Gurrin
AISIA Research Lab, Ho Chi Minh, Vietnam
Binh Nguyen
Ho Chi Minh University of Science, Vietnam National University, Hanoi, Vietnam
Binh Nguyen

Corresponding author

Correspondence to Ly-Duyen Tran.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Tran, LD., Zhou, L., Nguyen, B., Gurrin, C. (2024). Interactive Question Answering for Multimodal Lifelog Retrieval. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14565. Springer, Cham. https://doi.org/10.1007/978-3-031-56435-2_6

DOI: https://doi.org/10.1007/978-3-031-56435-2_6
Published: 20 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56434-5
Online ISBN: 978-3-031-56435-2
eBook Packages: Computer Science, Computer Science (R0)
