
A Markov Network Based Passage Retrieval Method for Multimodal Question Answering in the Cultural Heritage Domain

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Abstract

In this paper, we propose a Markov network based graphical framework for passage retrieval in multimodal question answering (MQA) with weak supervision in the cultural heritage domain. The framework encodes the dependencies between a question’s features and the passage containing its answer, under the assumption that there is a latent alignment between a question and its candidate answer. Experiments on a challenging multimodal dataset show that this framework improves mean average precision (mAP) by 5% over a state-of-the-art method that uses the same features, namely (i) image match and (ii) word co-occurrence between a passage and a question. We additionally construct two extended graphical frameworks that integrate one more feature, the (question type)-(named entity) match, to further boost performance. One of the extended models improves mAP by a further 2%.
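To make the scoring and evaluation in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. Each of the three features enters as a log-linear factor exp(w_k f_k), so a passage's log score is a weighted feature sum, and passages are ranked by that score and evaluated with average precision and mAP. The weights, the hypothetical `QTYPE_TO_NE` table, the token-overlap definition of word co-occurrence, and the precomputed `image` match score are all assumptions, and the latent question-answer alignment of the actual model is omitted.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical mapping from coarse question types (Li & Roth style labels)
# to named-entity labels that could answer them. Illustrative only.
QTYPE_TO_NE: Dict[str, Set[str]] = {
    "HUM": {"PERSON", "ORGANIZATION"},
    "LOC": {"LOCATION"},
    "NUM": {"DATE", "NUMBER"},
}


def word_cooccurrence(question: Sequence[str], passage: Sequence[str]) -> float:
    """Fraction of distinct question tokens that also occur in the passage."""
    q = set(question)
    return len(q & set(passage)) / len(q) if q else 0.0


def qtype_ne_match(qtype: str, passage_ne: Set[str]) -> float:
    """1.0 if the passage has an entity of a type the question expects, else 0.0."""
    return 1.0 if QTYPE_TO_NE.get(qtype, set()) & passage_ne else 0.0


def log_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Log of prod_k exp(w_k * f_k), i.e. the weighted feature sum of a log-linear model."""
    return sum(w * f for w, f in zip(weights, features))


def rank_passages(question: Sequence[str], qtype: str, passages: List[dict],
                  weights: Sequence[float]) -> List[dict]:
    """Sort candidate passages by descending log score."""
    def feats(p: dict) -> List[float]:
        # Feature order: image match (assumed precomputed), word co-occurrence, qtype-NE match.
        return [p["image"], word_cooccurrence(question, p["tokens"]), qtype_ne_match(qtype, p["ne"])]
    return sorted(passages, key=lambda p: log_score(feats(p), weights), reverse=True)


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant passage.
    Assumes every relevant candidate appears somewhere in the ranked list."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Average of per-question AP values."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0


# Toy example with made-up tokens, entity labels and image scores.
question = ["who", "painted", "the", "ghent", "altarpiece"]
passages = [
    {"id": "p1", "tokens": ["the", "altarpiece", "is", "in", "ghent"],
     "ne": {"LOCATION"}, "image": 0.9, "relevant": False},
    {"id": "p2", "tokens": ["jan", "van", "eyck", "painted", "the", "ghent", "altarpiece"],
     "ne": {"PERSON"}, "image": 0.7, "relevant": True},
]
weights = [1.0, 2.0, 1.5]  # hypothetical weights; in practice they would be learned

ranking = rank_passages(question, "HUM", passages, weights)
print([p["id"] for p in ranking])  # ['p2', 'p1']
print(mean_average_precision([[p["relevant"] for p in ranking], [False, True]]))  # 0.75
```

Ranking by the weighted sum is equivalent to ranking by the product of exp-factor potentials. For a fixed question, the normalizing constant is shared by all candidate passages, so the sketch can skip computing it.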


Notes

  1. http://www.europeana.eu/portal/en.

  2. https://www.google.com/culturalinstitute/beta/.

  3. These full images are obtained by the full-image retrieval described at the beginning of Sect. 4.

  4. http://cogcomp.cs.illinois.edu/Data/QA/QC/.

  5. http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html.

  6. 20% of the data (86 question-passage pairs).

  7. 344 full-image-level questions and 385 partial-image-level questions.

  8. This reason was identified by manually checking the mAP score of each ‘Who’ question across the different models.


Acknowledgments

This work was funded by KU Leuven BOF/IF/RUN/2015. We also thank the anonymous reviewers for their helpful comments.

Author information

Corresponding author: Shurong Sheng.


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Sheng, S., Venkitasubramanian, A.N., Moens, MF. (2018). A Markov Network Based Passage Retrieval Method for Multimodal Question Answering in the Cultural Heritage Domain. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_1


  • DOI: https://doi.org/10.1007/978-3-319-73603-7_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73602-0

  • Online ISBN: 978-3-319-73603-7

  • eBook Packages: Computer Science, Computer Science (R0)
