Difficulty-Controllable Visual Question Generation

Chen, Feng; Xie, Jiayuan; Cai, Yi; Wang, Tao; Li, Qing

doi:10.1007/978-3-030-85896-4_26


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12858)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data


Abstract

Visual Question Generation (VQG) aims to generate questions from images. Existing studies focus on generating questions based solely on images and neglect the difficulty of the questions. However, to engage users, an automated question generator should produce questions whose difficulty is tailored to a user’s capabilities and experience. In this paper, we propose a Difficulty-controllable Generation Network (DGN) to address this limitation. We borrow the difficulty index from the field of education to define a difficulty variable that represents the difficulty of a question, and fuse this variable into our model to guide difficulty-controllable question generation. Experimental results demonstrate that the proposed model not only achieves significant improvements on several automatic evaluation metrics but also generates questions of controllable difficulty.
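
The abstract does not say how the difficulty variable is computed. As a rough illustration only, the sketch below uses the classical item-analysis definition of the difficulty index from educational testing: the fraction of respondents who answer an item correctly. The function names, the binary easy/hard labels, and the 0.5 threshold are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def difficulty_index(responses: Sequence[bool]) -> float:
    """Fraction of respondents who answered the item correctly.

    This is the classical item-analysis definition; a higher value
    means an easier item.
    """
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


def difficulty_label(p: float, easy_threshold: float = 0.5) -> str:
    """Map a difficulty index to a coarse label.

    The threshold is a hypothetical choice for illustration.
    """
    return "easy" if p >= easy_threshold else "hard"


# Hypothetical data: whether each of four answerers got one question right.
responses = [True, True, False, True]
p = difficulty_index(responses)
label = difficulty_label(p)
print(p, label)
```

A label of this kind could then serve as a conditioning signal for a generator; how DGN actually fuses the variable into the model is described in the full paper, not in this abstract.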

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62076100), the National Key Research and Development Program of China (Standard knowledge graph for epidemic prevention and production recovering intelligent service platform and its applications), the Fundamental Research Funds for the Central Universities, SCUT (Nos. D2201300, D2210010), the Science and Technology Programs of Guangzhou (No. 201902010046), and the Science and Technology Planning Project of Guangdong Province (No. 2020B0101100002).

Author information

Authors and Affiliations

School of Software Engineering, South China University of Technology, Guangzhou, China, and the Key Laboratory of Big Data and Intelligent Robot, Guangzhou, China
Feng Chen, Jiayuan Xie & Yi Cai
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Tao Wang
Department of Biostatistics and Health Informatics, King’s College London, London, UK
Qing Li

Corresponding author

Correspondence to Yi Cai.

Editor information

Editors and Affiliations

University of Macau, Macau, China
Leong Hou U
University of Caen Normandie, Caen, France
Marc Spaniol
Osaka University, Osaka, Japan
Yasushi Sakurai
South China University of Technology, Guangzhou, China
Junying Chen

About this paper

Cite this paper

Chen, F., Xie, J., Cai, Y., Wang, T., Li, Q. (2021). Difficulty-Controllable Visual Question Generation. In: U, L.H., Spaniol, M., Sakurai, Y., Chen, J. (eds) Web and Big Data. APWeb-WAIM 2021. Lecture Notes in Computer Science, vol 12858. Springer, Cham. https://doi.org/10.1007/978-3-030-85896-4_26

DOI: https://doi.org/10.1007/978-3-030-85896-4_26
Published: 19 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85895-7
Online ISBN: 978-3-030-85896-4
eBook Packages: Computer Science, Computer Science (R0)
