Insights from CL-SciSumm 2016: the faceted scientific document summarization Shared Task

Jaidka, Kokil; Chandrasekaran, Muthu Kumar; Rustagi, Sajal; Kan, Min-Yen

doi:10.1007/s00799-017-0221-y

Published: 14 June 2017

Volume 19, pages 163–171, (2018)

International Journal on Digital Libraries

Kokil Jaidka¹,
Muthu Kumar Chandrasekaran²,
Sajal Rustagi³ &
…
Min-Yen Kan²,⁴

Abstract

We describe the participation and the official results of the 2nd Computational Linguistics Scientific Summarization Shared Task (CL-SciSumm), held as part of the BIRNDL workshop at the Joint Conference on Digital Libraries 2016 in Newark, New Jersey. CL-SciSumm is the first medium-scale Shared Task on scientific document summarization in the computational linguistics (CL) domain. Participants were provided with a training corpus of 30 topics, each comprising a reference paper (RP) and 10 or more citing papers, all of which cite the RP. For each citation, the text spans (i.e., citances) that pertain to the RP have been identified. Participants solved three sub-tasks in automatic research paper summarization using this text corpus. Fifteen teams from six countries registered for the Shared Task, of which ten teams ultimately submitted and presented their results. The annotated corpus comprised 30 target papers, currently the largest available corpus of its kind. The corpus is available for free download and use at https://github.com/WING-NUS/scisumm-corpus.
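The corpus organization described in the abstract (a topic made up of one reference paper, ten or more citing papers, and the citances found in those citing papers) can be pictured with a minimal sketch. The class names, field names, and identifiers such as `RP01` and `CP01` below are hypothetical and chosen only for illustration; they are not the file layout or annotation schema of the released corpus.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citance:
    """A text span in a citing paper that pertains to the reference paper."""
    citing_paper_id: str  # hypothetical identifier, not the corpus's naming scheme
    text: str


@dataclass
class Topic:
    """One topic: a reference paper (RP) plus the papers that cite it."""
    reference_paper_id: str
    citing_paper_ids: List[str] = field(default_factory=list)
    citances: List[Citance] = field(default_factory=list)

    def is_well_formed(self, min_citing: int = 10) -> bool:
        # The abstract states each topic has 10 or more citing papers;
        # we additionally require every citance to come from a listed citing paper.
        known = set(self.citing_paper_ids)
        return (len(known) >= min_citing
                and all(c.citing_paper_id in known for c in self.citances))


def build_toy_topic() -> Topic:
    """Build a placeholder topic with ten citing papers and one citance each."""
    citing = [f"CP{i:02d}" for i in range(1, 11)]
    citances = [Citance(cp, f"placeholder citance text from {cp}") for cp in citing]
    return Topic("RP01", citing, citances)


if __name__ == "__main__":
    topic = build_toy_topic()
    print(topic.reference_paper_id, len(topic.citing_paper_ids), topic.is_well_formed())
```

A corpus in this picture would simply be a list of 30 such topics; the actual on-disk format should be taken from the repository linked in the abstract.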

Notes

http://www.nist.gov/tac/2014.
http://www.nist.gov/tac/2014.
The text of the documents was extracted from the original PDF documents; an optical character recognition (OCR) system was applied.
http://knowtator.sourceforge.net/.
http://protege.stanford.edu/about.php.
https://github.com/WING-NUS/scisumm-corpus/tree/master/evaluation_scripts.
http://www.nist.gov/tac/2014/BiomedSumm.

Acknowledgements

The development and dissemination of the CL-SciSumm dataset and the related Shared Task have been generously supported by the Microsoft Research Asia (MSRA) Research Grant 2016. We would also like to thank Vasudeva Varma and colleagues at IIIT Hyderabad, India, and University of Hyderabad, India, for their efforts in convening and organizing our annotation workshops. We acknowledge the continued advice of Hoa Dang, Lucy Vanderwende and Anita de Waard from the pilot stage of this task. We also thank Rahul Jha and Dragomir Radev for sharing their software to prepare the XML versions of papers, and Kevin B. Cohen and colleagues for sharing their annotation schema, export scripts and the Knowtator package implementation on the Protégé software. These parties have all made indispensable contributions in realizing this Shared Task.

Author information

Authors and Affiliations

School of Arts and Sciences, University of Pennsylvania, Pennsylvania, USA
Kokil Jaidka
School of Computing, National University of Singapore, Singapore, Singapore
Muthu Kumar Chandrasekaran & Min-Yen Kan
Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
Sajal Rustagi
Smart Systems Institute, National University of Singapore, Singapore, Singapore
Min-Yen Kan

Corresponding author

Correspondence to Kokil Jaidka.

About this article

Cite this article

Jaidka, K., Chandrasekaran, M.K., Rustagi, S. et al. Insights from CL-SciSumm 2016: the faceted scientific document summarization Shared Task. Int J Digit Libr 19, 163–171 (2018). https://doi.org/10.1007/s00799-017-0221-y

Received: 14 November 2016
Revised: 31 May 2017
Accepted: 06 June 2017
Published: 14 June 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00799-017-0221-y
