DOI: 10.1145/3626772.3657896
short-paper

Searching for Physical Documents in Archival Repositories

Published: 11 July 2024

Abstract

Early retrieval systems were used to search physical media (e.g., paper) using manually created metadata. Modern ranked retrieval techniques are far more capable, but they require that content be either born digital or digitized. For physical content, searching metadata remains the state of the art. This paper seeks to change that, using a textual-edge graph neural network to learn relations between items from available metadata and from any content that has been digitized. Results show that substantial improvement over the best prior method can be achieved.


    Published In

    SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2024
    3164 pages
    ISBN: 9798400704314
    DOI: 10.1145/3626772
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. archives
    2. graph neural network
    3. physical information access

    Qualifiers

    • Short-paper

    Funding Sources

    • JSPS KAKENHI

    Conference

    SIGIR 2024
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
