Abstract
The launch of the new Google News in 2018 (https://www.blog.google/products/news/new-google-news-ai-meets-human-intelligence/) introduced the Frequently Asked Questions (FAQ) feature, which structurally summarizes a news story on its full coverage page. While news summarization has been a research topic for decades, this new feature is poised to usher in a new line of news summarization techniques. There are two fundamental approaches: mining the questions from data associated with the news story, and learning the questions directly from the content of the story. To the best of our knowledge, this paper provides the first study of a learning-based approach that generates a structured summary of news articles as question-and-answer pairs capturing salient and interesting aspects of the news story. Specifically, this learning-based approach reads a news article, predicts its attention map (i.e., the important snippets in the article), and generates multiple natural language questions corresponding to each snippet. Furthermore, we describe a mining-based approach as the mechanism for generating weak supervision data to train the learning-based approach. We evaluate our approach on the existing SQuAD dataset (https://rajpurkar.github.io/SQuAD-explorer/) and on a large dataset of 91K news articles that we constructed. Our proposed system achieves an AUC of 0.734 for document attention map prediction, a BLEU-4 score of 12.46 for natural question generation, and a BLEU-4 score of 24.4 for question summarization, beating state-of-the-art baselines.
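The abstract reports an AUC of 0.734 for document attention map prediction but does not state the granularity at which it is computed. As a minimal sketch, assume each article is split into units (tokens or sentences), each unit is labeled 1 if it falls inside a gold answer snippet and 0 otherwise, and the model emits one attention score per unit. The ROC AUC is then the fraction of (positive, negative) unit pairs that the model orders correctly, with ties counted as one half. The function name `attention_auc` and the unit-level labeling scheme are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def attention_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of per-unit attention scores against binary snippet labels.

    Hypothetical helper: label 1 means the unit lies inside an answer
    snippet, label 0 means it does not. The AUC is computed as the
    probability that a random positive unit scores higher than a random
    negative unit (ties count 0.5), which equals the normalized
    Mann-Whitney U statistic.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative units")

    # Compare every positive unit with every negative unit (O(P * N) pairs).
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy article with four units; units 0 and 3 belong to answer snippets.
    print(attention_auc([0.4, 0.7, 0.2, 0.7], [1, 0, 0, 1]))  # 0.625
```

The pairwise loop is quadratic and is meant only to make the definition explicit; for long articles or corpus-level evaluation, a rank-based computation such as scikit-learn's `roc_auc_score` gives the same value more efficiently.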
Notes
- 1.
Private communication with Google’s news team: user studies showed that the FAQ improves users’ understanding of news stories, which is an important launch criterion.
- 2.
Private communication.
- 3.
- 4.
- 5.
Note that there can be multiple questions with the same answer snippet; for example, another candidate question could be: "Under which name is the Black Eagle Brewery also known?" Our learning-based approach can learn such diverse questions, provided that the training data captures the same diversity.
- 6.
- 7.
References
Angeli, G., Premkumar, M.J., Manning, C.D.: Leveraging linguistic structure for open domain information extraction. In: ACL (2015)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Chen, D., Fisch, A., Weston, J., Bordes, A.: Reading Wikipedia to answer open-domain questions. In: ACL (2017)
Du, X., Cardie, C.: Identifying where to focus in reading comprehension for neural question generation. In: EMNLP (2017)
Erkan, G., Radev, D.R.: Centroid-based summarization of multiple documents: sentence extraction, utility based evaluation, and user studies. In: NAACL-ANLP Workshop on Automatic Summarization (2000)
Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. In: JAIR (2004)
Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: EMNLP (2011)
Fader, A., Zettlemoyer, L., Etzioni, O.: Paraphrase-driven learning for open question answering. In: ACL (2013)
Feng, X., Huang, L., Tang, D., Qin, B., Ji, H., Liu, T.: A language-independent neural network for event detection. In: ACL (2016)
Gu, J., Lu, Z., Li, H., Li, V.O.: Incorporating copying mechanism in sequence-to-sequence learning. In: ACL (2016)
Höffner, K., Walter, S., Marx, E., Usbeck, R., Lehmann, J., Ngonga Ngomo, A.C.: Survey on challenges of question answering in the semantic web. Semant. Web 8(6), 895–920 (2017)
Joachims, T.: Optimizing search engines using clickthrough data. In: KDD (2002)
Kedzie, C., Diaz, F., McKeown, K.: Real-time web scale event summarization using sequential decision making. In: IJCAI (2016)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Kolomiyets, O., Moens, M.F.: A survey on question answering technology from an information retrieval perspective. Inf. Sci. 181(24), 5412–5434 (2011)
Koutra, D., Bennett, P.N., Horvitz, E.: Events and controversies: influences of a shocking news event on information seeking. In: WWW (2015)
Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: ACL (July 2004). https://www.microsoft.com/en-us/research/publication/rouge-a-package-for-automatic-evaluation-of-summaries/
Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: EMNLP (2015)
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., Joulin, A.: Advances in pre-training distributed word representations. In: LREC (2018)
Nenkova, A., McKeown, K.: A survey of text summarization techniques. In: Mining Text Data, pp. 43–76. Springer, Boston (2012). https://doi.org/10.1007/978-1-4614-3223-4_3
Nguyen, D.B., Abujabal, A., Tran, K., Theobald, M., Weikum, G.: Query-driven on-the-fly knowledge base construction. In: VLDB (2017)
Nguyen, T.H., Cho, K., Grishman, R.: Joint event extraction via recurrent neural networks. In: NAACL (2016)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: EMNLP (2016)
See, A., Liu, P., Manning, C.: Get to the point: summarization with pointer-generator networks. In: ACL (2017)
Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension. In: ICLR (2017)
Shen, C., Liu, F., Weng, F., Li, T.: A participant-based approach for event summarization using Twitter streams. In: NAACL-HLT (2013)
Upstill, T.: The new Google news: AI meets human intelligence (2018). https://www.blog.google/products/news/new-google-news-ai-meets-human-intelligence/
Walker, C., Strassel, S., Medero, J., Maeda, K.: ACE 2005 multilingual training corpus (February 2006). https://catalog.ldc.upenn.edu/ldc2006t06
Yu, A.W., et al.: QANet: Combining local convolution with global self-attention for reading comprehension. In: ICLR (2018)
Zhou, Q., Yang, N., Wei, F., Tan, C., Bao, H., Zhou, M.: Neural question generation from text: a preliminary study. In: NLPCC (2017)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X., Yu, C. (2019). Summarizing News Articles Using Question-and-Answer Pairs via Learning. In: Ghidini, C., et al. (eds) The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. https://doi.org/10.1007/978-3-030-30793-6_40
DOI: https://doi.org/10.1007/978-3-030-30793-6_40
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30792-9
Online ISBN: 978-3-030-30793-6
eBook Packages: Computer Science, Computer Science (R0)