
Graph convolutional network for difficulty-controllable visual question generation


Abstract

In this article, we address difficulty-controllable visual question generation: the task of generating questions that satisfy a given difficulty level based on an image and a target answer. The existing approach essentially generates questions from templates: for easy questions, the model includes the candidate answers in the question, turning it into a choice question, while for hard questions the answer set is omitted. In fact, question difficulty should be reflected by the objects and their relationships mentioned in the question. Towards this end, we propose a graph-based model with three concrete modules: a Difficulty-controllable Graph Convolutional Network (DGCN) module, a fusion module, and a difficulty-controllable decoder. We first define a difficulty label, based on the difficulty index from the education literature, to represent the difficulty of a question. Next, the DGCN module learns image representations that capture relations between objects in an image, conditioned on the given difficulty label. Then, the fusion module jointly attends to the image representations and the answer representations to capture answer-related image features. Finally, the difficulty-controllable decoder incorporates difficulty information into the decoder initialization and into the input at each time step to control the difficulty of the generated questions. Experimental results demonstrate that our framework not only achieves significant improvements on several automatic evaluation metrics, but also generates difficulty-controllable questions.
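To make the pipeline concrete, the following is a minimal PyTorch-style sketch of the three modules as described above. It is an illustration under stated assumptions, not the authors' implementation: all class names, tensor shapes, and the two-level easy/hard label are hypothetical.

```python
# Hypothetical sketch of the three-module pipeline from the abstract.
# Module names, shapes, and the binary difficulty label are illustrative,
# not taken from the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifficultyConditionedGCN(nn.Module):
    """One GCN layer whose edge weights are modulated by a difficulty label."""
    def __init__(self, dim, num_difficulty_levels=2):
        super().__init__()
        self.diff_emb = nn.Embedding(num_difficulty_levels, dim)
        self.edge_score = nn.Linear(3 * dim, 1)  # scores (node_i, node_j, difficulty)
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes, difficulty):
        # nodes: (B, N, D) object features; difficulty: (B,) label, e.g. 0=easy, 1=hard
        B, N, D = nodes.shape
        d = self.diff_emb(difficulty)[:, None, None, :].expand(B, N, N, D)
        pairs = torch.cat([nodes[:, :, None].expand(B, N, N, D),
                           nodes[:, None, :].expand(B, N, N, D), d], dim=-1)
        adj = torch.softmax(self.edge_score(pairs).squeeze(-1), dim=-1)  # (B, N, N)
        return F.relu(self.proj(adj @ nodes))  # difficulty-aware relation features

class AnswerGuidedFusion(nn.Module):
    """Attend over image nodes using the answer encoding as the query."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, nodes, answer):
        # nodes: (B, N, D); answer: (B, D) pooled answer encoding
        q = answer[:, None, :].expand_as(nodes)
        w = torch.softmax(self.attn(torch.cat([nodes, q], dim=-1)).squeeze(-1), dim=-1)
        return (w[:, :, None] * nodes).sum(dim=1)  # (B, D) answer-related image feature

class DifficultyControllableDecoder(nn.Module):
    """LSTM decoder that sees the difficulty embedding at init and at every step."""
    def __init__(self, vocab_size, dim, num_difficulty_levels=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.diff_emb = nn.Embedding(num_difficulty_levels, dim)
        self.cell = nn.LSTMCell(2 * dim, dim)  # input = [word; difficulty]
        self.init_h = nn.Linear(2 * dim, dim)  # init from [fused image; difficulty]
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, fused, difficulty, tokens):
        d = self.diff_emb(difficulty)
        h = torch.tanh(self.init_h(torch.cat([fused, d], dim=-1)))
        c = torch.zeros_like(h)
        logits = []
        for t in range(tokens.size(1)):  # teacher forcing over gold question tokens
            x = torch.cat([self.word_emb(tokens[:, t]), d], dim=-1)
            h, c = self.cell(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, T, vocab)

# Example wiring: B=2 images, N=5 detected objects, feature dim 128.
gcn, fus = DifficultyConditionedGCN(128), AnswerGuidedFusion(128)
dec = DifficultyControllableDecoder(vocab_size=1000, dim=128)
nodes = torch.randn(2, 5, 128); ans = torch.randn(2, 128)
diff = torch.tensor([0, 1]); toks = torch.randint(0, 1000, (2, 7))
logits = dec(fus(gcn(nodes, diff), ans), diff, toks)  # (2, 7, 1000)
```

The design point the sketch tries to capture is that the difficulty signal enters at three places: the edge scores of the object graph, the decoder initialization, and the decoder input at every time step, matching the control points named in the abstract.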


Availability of supporting data

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

We gratefully acknowledge our debt to all those who helped us develop these ideas, well above the level of simplicity, into something concrete.

Funding

This work was supported by the National Natural Science Foundation of China (62076100), the Fundamental Research Funds for the Central Universities, SCUT (x2rjD2230080), the Science and Technology Planning Project of Guangdong Province (2020B0101100002), the CAAI-Huawei MindSpore Open Fund, and the CCF-Zhipu AI Large Model Fund.

Author information

Contributions

FC and JX participated equally in study design, data collection, analyses, and drafting of the manuscript. ZL, QL and TW provided statistical advice, and helped revise the manuscript. YC was the corresponding author, supervised the study, and helped revise the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yi Cai.

Ethics declarations

Ethical Approval and Consent to participate

Not applicable.

Human and Animal Ethics

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Feng Chen and Jiayuan Xie contributed equally to this work.

This article belongs to the Topical Collection: APWeb-WAIM 2021 Guest Editors: Yi Cai, Leong Hou U, Marc Spaniol and Yasushi Sakurai.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, F., Xie, J., Cai, Y. et al. Graph convolutional network for difficulty-controllable visual question generation. World Wide Web 26, 3735–3757 (2023). https://doi.org/10.1007/s11280-023-01202-x

