A Markov Network Based Passage Retrieval Method for Multimodal Question Answering in the Cultural Heritage Domain

Sheng, Shurong; Venkitasubramanian, Aparna Nurani; Moens, Marie-Francine

doi:10.1007/978-3-319-73603-7_1

Conference paper
First Online: 13 January 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Abstract

In this paper, we propose a Markov network based graphical framework to perform passage retrieval for multimodal question answering (MQA) with weak supervision in the cultural heritage domain. This framework encodes the dependencies between a question's feature information and the passage containing its answer, under the assumption that there is a latent alignment between a question and its candidate answer. Experiments on a challenging multimodal dataset show that this framework achieves an improvement of 5% in mean average precision (mAP) over a state-of-the-art method that uses the same features, namely (i) image match and (ii) word co-occurrence between a passage and a question. We additionally construct two extended graphical frameworks that integrate one more feature, namely (question type)-(named entity) match, into this framework in order to further boost performance. One of the extended models improves mAP by a further 2%.
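The abstract describes ranking candidate passages with a Markov network over question and passage features and evaluating the ranking with mAP. As a rough illustration only, the Python sketch below assumes a log-linear potential, in which the log-potential of a question-passage pair is a weighted sum of the three feature types the abstract names, and normalizes these scores over the candidate passages of one question. It does not model the latent alignment or the weakly supervised training described in the paper; the feature definitions, the `QTYPE_TO_NE` mapping, the stop-word list, the `Passage` fields, and the values in `WEIGHTS` are all hypothetical placeholders. The `average_precision` and `mean_average_precision` functions follow the standard definition of (mean) average precision for binary relevance judgements.

```python
import math
import re
from dataclasses import dataclass, field

# Hypothetical: expected named-entity type for a few question words.
QTYPE_TO_NE = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}

# Hypothetical, deliberately tiny stop-word list for the word co-occurrence feature.
STOPWORDS = {"a", "an", "the", "of", "in", "by", "is", "was", "this",
             "who", "what", "when", "where", "which"}

# Hypothetical weights for [image match, word co-occurrence, qtype-NE match];
# in a real model these would be learned, not hand-set.
WEIGHTS = (1.0, 2.0, 0.5)


@dataclass
class Passage:
    pid: str
    text: str
    image_match: float  # assumed precomputed image-similarity score in [0, 1]
    entity_types: set = field(default_factory=set)  # assumed NER output, e.g. {"PERSON"}


def content_words(text):
    """Lower-cased alphabetic tokens minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def features(question, passage):
    """Hypothetical feature vector f(q, p) for one question-passage pair."""
    q_words = content_words(question)
    # Word co-occurrence: fraction of question content words found in the passage.
    overlap = len(q_words & content_words(passage.text)) / max(len(q_words), 1)
    # (Question type)-(named entity) match: does the passage contain an entity
    # of the type the leading question word asks for?
    tokens = re.findall(r"[a-z]+", question.lower())
    expected = QTYPE_TO_NE.get(tokens[0]) if tokens else None
    ne_match = 1.0 if expected in passage.entity_types else 0.0
    return (passage.image_match, overlap, ne_match)


def log_potential(question, passage, weights=WEIGHTS):
    """Assumed log-linear potential: log phi(q, p) = sum_k w_k * f_k(q, p)."""
    return sum(w * f for w, f in zip(weights, features(question, passage)))


def rank_passages(question, passages, weights=WEIGHTS):
    """Return (pid, P(p | q)) pairs sorted by descending probability."""
    if not passages:
        return []
    scores = [log_potential(question, p, weights) for p in passages]
    m = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)  # normalizer over this question's candidate passages
    ranked = sorted(zip(passages, exps), key=lambda pe: pe[1], reverse=True)
    return [(p.pid, e / z) for p, e in ranked]


def average_precision(ranked_ids, relevant_ids):
    """AP for one ranked list with binary relevance judgements."""
    hits, precision_sum = 0, 0.0
    for k, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            hits += 1
            precision_sum += hits / k  # precision at each relevant rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """mAP over (ranked_ids, relevant_ids) pairs, one pair per question."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

As a toy example, given a passage "The portrait was painted by Jan van Eyck in 1434." (image score 0.9, entity types PERSON and DATE) and a passage "The museum is located in Bruges." (image score 0.2, LOCATION), the question "Who painted this portrait?" ranks the first passage on top. Because the normalizer `z` is shared by all candidates of one question, the order is the same as sorting by the raw log-potential; normalization matters only when the probabilities themselves are needed, for instance during training. For evaluation, `average_precision(["p2", "p1"], {"p1"})` returns 0.5, since the only relevant passage appears at rank 2.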

Notes

1. http://www.europeana.eu/portal/en.
2. https://www.google.com/culturalinstitute/beta/.
3. These full images are obtained by the full-image retrieval described at the beginning of Sect. 4.
4. http://cogcomp.cs.illinois.edu/Data/QA/QC/.
5. http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html.
6. 20% of the data, comprising 86 question-passage pairs.
7. 344 full-image level questions and 385 partial-image level questions.
8. This explanation was determined by manually checking the mAP score of each ‘Who’ question in the different models.

References

Andreas, J., Rohrbach, M., Darrell, T., Klein, D.: Neural module networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 39–48 (2016)
Chen, T., Van Durme, B.: Discriminative information retrieval for question answering sentence selection. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 719–725 (2017)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning, vol. 112. Springer, Berlin (2013). https://doi.org/10.1007/978-1-4614-7138-7
Jayalakshmi, S., Sheshasaayee, A.: Question classification: a review of state-of-the-art algorithms and approaches. Indian J. Sci. Technol. 8(29) (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
Lawless, S., Agosti, M., Clough, P., Conlan, O.: Exploration, navigation and retrieval of information in cultural heritage: ENRICH 2013. In: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, p. 1136 (2013)
Li, X., Roth, D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics, pp. 1–7 (2002)
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J.R., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the Association for Computational Linguistics (System Demonstrations), pp. 55–60 (2014)
Metzler, D., Croft, W.B.: A Markov random field model for term dependencies. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 472–479. ACM (2005)
Oh, J.H., Torisawa, K., Kruengkrai, C., Iida, R., Kloetzer, J.: Multi-column convolutional neural networks with causality-attention for why-question answering. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining, pp. 415–424 (2017)
Schmidt, M.: UGM: a Matlab toolbox for probabilistic undirected graphical models (2007). https://www.cs.ubc.ca/~schmidtm/Software/UGM.html
Sheng, S., Moens, M.F.: Simple baseline models for multimodal question answering in the cultural heritage domain. In: Busch, C., Sieck, J. (eds.) Kultur und Informatik: Mixed Reality, pp. 119–132. Verlag Werner Hülsbusch, Boizenburg (2017)
Sheng, S., Van Gool, L., Moens, M.F.: A dataset for multimodal question answering in the cultural heritage domain. In: Proceedings of the COLING 2016 Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH). ACL (2016)
Sun, H., Duan, N., Duan, Y., Zhou, M.: Answer extraction from passage graph for question answering. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 2169–2175 (2013)
Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)
Venkitasubramanian, A.N., Tuytelaars, T., Moens, M.F.: Entity linking across vision and language. Multimed. Tools Appl. 76, 22599–22622 (2017)
Voorhees, E.M., et al.: The TREC-8 question answering track report. In: Text REtrieval Conference, pp. 77–82 (1999)
Zheng, L., Yang, Y., Tian, Q.: SIFT meets CNN: a decade survey of instance retrieval. arXiv preprint arXiv:1608.01807 (2016)

Acknowledgments

This work is funded by KU Leuven (BOF/IF/RUN/2015). We also thank the anonymous reviewers for their helpful comments.

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, 3001, Leuven, Belgium
Shurong Sheng & Marie-Francine Moens
Department of Electrical Engineering (ESAT), KU Leuven, 3001, Leuven, Belgium
Aparna Nurani Venkitasubramanian

Corresponding author

Correspondence to Shurong Sheng.

Editor information

Editors and Affiliations

Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Klaus Schoeffmann
Chulalongkorn University, Bangkok, Thailand
Thanarat H. Chalidabhongse
City University of Hong Kong, Hong Kong, China
Chong Wah Ngo
Chulalongkorn University, Bangkok, Thailand
Supavadee Aramvith
Dublin City University, Dublin, Ireland
Noel E. O’Connor
Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Tampere University of Technology, Tampere, Finland
Moncef Gabbouj
Rutgers University, Piscataway, New Jersey, USA
Ahmed Elgammal

About this paper

Cite this paper

Sheng, S., Venkitasubramanian, A.N., Moens, MF. (2018). A Markov Network Based Passage Retrieval Method for Multimodal Question Answering in the Cultural Heritage Domain. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_1

DOI: https://doi.org/10.1007/978-3-319-73603-7_1
Published: 13 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)