DOI: 10.1145/3529372.3530948
extended-abstract

Opening scholarly documents through text analytics

Published: 20 June 2022

Abstract

Vast amounts of scholarly knowledge are buried in electronic theses and dissertations (ETDs). ETDs are valuable documents that have been developed at great cost but largely remain unknown and unused. We aim for digital libraries to open up these long documents using computerized text mining and analytics. We add value to the existing systems by providing chapter-level labels and summaries. This allows readers to easily find chapters of interest. We use ETDs to fine-tune language models like BERT and SciBERT, to help better capture the specialized vocabulary present in such documents.
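One way to picture chapter-level labeling of a long document is sketched below. This is only an illustration under stated assumptions, not the authors' method: models such as BERT accept a limited input length (512 tokens), so the sketch splits a chapter into overlapping windows, scores each window, and sums the scores into one chapter label. The names `split_into_windows`, `label_chapter`, and `toy_scorer` are hypothetical, and `toy_scorer` is a keyword counter standing in for a fine-tuned classifier.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def split_into_windows(tokens: List[str], window: int = 512,
                       stride: int = 256) -> List[List[str]]:
    """Split a token list into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [tokens]
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the chapter
        start += stride
    return windows


def label_chapter(tokens: List[str],
                  score_window: Callable[[List[str]], Dict[str, float]],
                  window: int = 512, stride: int = 256) -> str:
    """Sum per-window label scores and return the highest-scoring label."""
    totals: Dict[str, float] = defaultdict(float)
    for chunk in split_into_windows(tokens, window, stride):
        for label, score in score_window(chunk).items():
            totals[label] += score
    return max(totals, key=totals.get)


def toy_scorer(window_tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a fine-tuned model: counts keyword hits."""
    keywords = {
        "biology": {"cell", "protein"},
        "computing": {"algorithm", "network"},
    }
    return {label: float(sum(tok in kws for tok in window_tokens))
            for label, kws in keywords.items()}


if __name__ == "__main__":
    chapter = ["algorithm"] * 700 + ["cell"] * 100
    print(label_chapter(chapter, toy_scorer))
```

In practice the scorer would be replaced by a model's per-label probabilities, and the aggregation (sum, mean, or majority vote) is a design choice this sketch does not settle.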

References

[1] Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3615–3620.
[2] Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. arXiv:2004.05150 [cs] (Dec. 2020). http://arxiv.org/abs/2004.05150
[3] Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian. 2018. A discourse-aware attention model for abstractive summarization of long documents. arXiv preprint arXiv:1804.05685 (2018).
[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186.
[5] Palakh Mignonne Jude. 2020. Increasing Accessibility of Electronic Theses and Dissertations (ETDs) Through Chapter-level Classification. MS thesis, Computer Science, Virginia Tech (June 2020). http://hdl.handle.net/10919/99294

    Published In

    JCDL '22: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
    June 2022
    392 pages
    ISBN:9781450393454
    DOI:10.1145/3529372
    General Chairs: Akiko Aizawa, Thomas Mandl, Zeljko Carevic
    Program Chairs: Annika Hinze, Philipp Mayr, Philipp Schaer
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Sponsors

    In-Cooperation

    • IEEE Technical Committee on Digital Libraries (TC DL)

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. attention
    2. classification
    3. language modeling
    4. natural language processing
    5. summarization
    6. transformers

    Qualifiers

    • Extended-abstract

    Acceptance Rates

    JCDL '22 Paper Acceptance Rate 35 of 132 submissions, 27%;
    Overall Acceptance Rate 415 of 1,482 submissions, 28%

    Bibliometrics

    • Total Citations: 0
    • Total Downloads: 111
    • Downloads (Last 12 months): 29
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 27 Feb 2025
