Abstract
Information Extraction (IE) and summarization share the goal of extracting and presenting the relevant information in a document. While IE was a primary element of early abstractive summarization systems, it has been left out of more recent extractive systems. However, extracting facts and recognizing entities and events should provide useful information to those systems and help resolve semantic ambiguities that they cannot tackle. This paper explores novel approaches to exploiting cross-document IE for multi-document summarization. We propose several approaches to IE-based summarization and analyze their strengths and weaknesses. One of them, re-ranking the output of a high-performing summarization system with IE-informed metrics, leads to improvements in both manually evaluated content quality and readability.
Acknowledgements
The first author and the third author were supported by the U.S. Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053, the U.S. NSF CAREER Award under Grant IIS-0953149, and the PSC-CUNY Research Program. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Ji, H., Favre, B., Lin, WP., Gillick, D., Hakkani-Tur, D., Grishman, R. (2013). Open-Domain Multi-Document Summarization via Information Extraction: Challenges and Prospects. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_9
DOI: https://doi.org/10.1007/978-3-642-28569-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28568-4
Online ISBN: 978-3-642-28569-1
eBook Packages: Computer Science, Computer Science (R0)