DOI: 10.1145/3477495.3531802
short-paper

What Makes a Good Podcast Summary?

Published: 07 July 2022

Abstract

Abstractive summarization of podcasts is motivated by the growing popularity of podcasts and the needs of their listeners. Podcasting is a markedly different domain from news and other media that are commonly studied in the context of automatic summarization. As such, the qualities of a good podcast summary are not yet known. Using a collection of podcast summaries produced by different algorithms alongside human judgments of summary quality obtained from the TREC 2020 Podcasts Track, we study the correlations between various automatic evaluation metrics and human judgments, as well as the linguistic aspects of summaries that result in strong evaluations.
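As an illustration of what correlating an automatic metric with human judgments can look like, the following is a minimal sketch that computes a Spearman rank correlation in plain Python. The abstract does not say which correlation statistic the study uses, so the choice of Spearman is an assumption. The score lists and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical and exist only for this example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up values for illustration only (not data from the paper).
# One entry per summary: a human quality rating on a hypothetical 0-3 scale
# and the score an automatic metric assigned to the same summary.
human_scores = [3, 1, 2, 0, 2]
metric_scores = [0.41, 0.18, 0.30, 0.12, 0.25]

rho = spearman(human_scores, metric_scores)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is a common choice when human ratings are ordinal, since it depends only on the ordering of summaries rather than on the numeric spacing between rating levels.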

Cited By

  • (2024) PodReels: Human-AI Co-Creation of Video Podcast Teasers. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 958-974. DOI: 10.1145/3643834.3661591. Online publication date: 1-Jul-2024
  • (2024) Enhancing the Podcast Browsing Experience through Topic Segmentation and Visualization with Generative AI. Proceedings of the 2024 ACM International Conference on Interactive Media Experiences, 117-128. DOI: 10.1145/3639701.3656324. Online publication date: 7-Jun-2024
  • (2023) Enabling Goal-Focused Exploration of Podcasts in Interactive Recommender Systems. Proceedings of the 28th International Conference on Intelligent User Interfaces, 142-155. DOI: 10.1145/3581641.3584032. Online publication date: 27-Mar-2023

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495
© 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. abstractive text summarization
  2. evaluation
  3. podcast summarization
  4. rouge

Qualifiers

  • Short-paper

Conference

SIGIR '22
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
