
Interactive Question Answering for Multimodal Lifelog Retrieval

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14565)

Abstract

Supporting Question Answering (QA) tasks is the next step for lifelog retrieval systems, mirroring the progression of their parent field, information retrieval. In this paper, we propose a new pipeline for the QA task in the context of lifelogging, based on the open-domain QA pipeline. We incorporate this pipeline into a multimodal lifelog retrieval system, which allows users to submit questions pertaining to a lifelog and then suggests possible text answers based on multimodal data. We developed a test collection to support a user study that evaluates the effectiveness of the proposed system against a conventional lifelog retrieval system. The results show that the proposed system outperforms the conventional system in terms of both effectiveness and user satisfaction. The results also suggest that the proposed system is more valuable for novice users, while both systems are equally effective for experienced users.
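The abstract says the lifelog QA pipeline is based on the open-domain QA pipeline, which in its common retriever-reader form first retrieves candidate passages for a question and then has a reader extract an answer from each. The following is a minimal, hypothetical sketch of that two-stage shape, not the authors' implementation: Okapi BM25 over short text descriptions of lifelog events stands in for the system's multimodal retrieval (which this page does not describe), and the reader is a toy stub where a real system would typically use a neural reading-comprehension model. All names (`bm25_rank`, `retrieve_then_read`, `time_reader`, the sample events) are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric tokens only (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(question: str, docs: List[str],
              k1: float = 1.5, b: float = 0.75) -> List[Tuple[int, float]]:
    """Retriever stage: score each document against the question with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(question))
    ranked = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        ranked.append((i, score))
    # Highest-scoring documents first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# A reader maps (question, retrieved passage) to a candidate answer string.
Reader = Callable[[str, str], str]


def retrieve_then_read(question: str, docs: List[str], reader: Reader,
                       top_k: int = 2) -> List[Tuple[str, float]]:
    """Run the reader over the top_k retrieved documents; return (answer, retrieval score)."""
    return [(reader(question, docs[i]), score)
            for i, score in bm25_rank(question, docs)[:top_k]]


# Hypothetical lifelog events, each flattened to "time | description".
events = [
    "08:15 | eating cereal in the kitchen at home",
    "12:40 | lunch with colleagues at the university canteen",
    "19:05 | cooking pasta in the kitchen at home",
]


def time_reader(question: str, passage: str) -> str:
    # Toy stub for "when" questions: return the event's timestamp field.
    return passage.split(" | ")[0]


result = retrieve_then_read("When did I eat cereal in the kitchen?", events, time_reader)
print(result)  # top answer is "08:15", followed by "19:05"
```

Keeping the reader as a plain callable separates the retrieval stage from the answer model, so the stub can be replaced by a trained reader without changing retrieval.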

Acknowledgements

This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Digitally-Enhanced Reality (d-real) under Grant No. 18/CRT/6224. For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Author information

Correspondence to Ly-Duyen Tran.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tran, LD., Zhou, L., Nguyen, B., Gurrin, C. (2024). Interactive Question Answering for Multimodal Lifelog Retrieval. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14565. Springer, Cham. https://doi.org/10.1007/978-3-031-56435-2_6

  • DOI: https://doi.org/10.1007/978-3-031-56435-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56434-5

  • Online ISBN: 978-3-031-56435-2

  • eBook Packages: Computer Science, Computer Science (R0)
