Abstract
In this paper, we propose a Markov-network-based graphical framework for passage retrieval in multimodal question answering (MQA) with weak supervision in the cultural heritage domain. The framework encodes the dependencies between a question’s features and the passage containing its answer, under the assumption that a latent alignment exists between a question and its candidate answer. Experiments on a challenging multimodal dataset show that the framework improves mean average precision (mAP) by 5% over a state-of-the-art method that uses the same features, namely (i) image match and (ii) word co-occurrence information between a passage and a question. We additionally construct two extended graphical frameworks that integrate one more feature, (question type)-(named entity) match, to further boost performance. One of the extended models improves mAP by a further 2%.
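For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how mean average precision is commonly computed for ranked passage lists. It is not taken from the paper: the function names and the toy relevance lists are illustrative assumptions, and the sketch assumes every relevant passage appears somewhere in each ranked list.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one question.

    relevance[i] is 1 if the passage at rank i + 1 is relevant, else 0.
    Assumption: all relevant passages appear in the ranked list, so the
    number of hits equals the total number of relevant passages.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy data: two questions, three ranked passages each.
toy_rankings = [[0, 1, 0], [1, 0, 0]]
print(mean_average_precision(toy_rankings))  # 0.75
```

In the toy data, the first question has its answer passage at rank 2 (average precision 0.5) and the second at rank 1 (average precision 1.0), giving a mean of 0.75.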
Notes
- 3. These full images are obtained by the full-image retrieval described at the beginning of Sect. 4.
- 6. 20% of the data, comprising 86 question-passage pairs.
- 7. 344 full-image-level questions and 385 partial-image-level questions.
- 8. This reason was identified by manually checking the mAP score of each ‘Who’ question across the different models.
Acknowledgments
This work is funded by the KU Leuven BOF/IF/RUN/2015. We additionally thank our anonymous reviewers for the helpful comments.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Sheng, S., Venkitasubramanian, A.N., Moens, M.F. (2018). A Markov Network Based Passage Retrieval Method for Multimodal Question Answering in the Cultural Heritage Domain. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_1
DOI: https://doi.org/10.1007/978-3-319-73603-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)