Toward Automatically Improving Metadata Quality of Electronic Theses and Dissertations at Scale | IEEE Conference Publication | IEEE Xplore

Abstract:

Metadata is crucial for the accessibility, interoperability, and long-term usability of digital objects such as Electronic Theses and Dissertations (ETDs). In large-scale academic repositories, poor metadata quality can significantly impede the discovery and use of resources. This study addresses persistent problems of incomplete and inconsistent metadata in ETDs collected from U.S. university libraries. Machine learning-based error detection and correction models can help, but applying them directly may introduce new errors because the models are imperfect. We propose an ETD metadata improvement system (ETDMIS) that mitigates this risk by integrating metadata validation with a version control mechanism. Applied to a dataset of 100,000 U.S. ETDs, the system substantially improved metadata quality, and it processed the entire dataset efficiently, demonstrating its scalability. The original and enhanced metadata for the 100,000 ETDs are publicly available at https://github.com/lamps-lab/ETDMiner/tree/master/Meta100K.
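The abstract does not detail how ETDMIS combines validation with version control. The sketch below is only an illustration of the general idea under stated assumptions. A model-proposed correction is accepted only if it passes a field-level validation rule, and each accepted change is preceded by a snapshot so it can be rolled back. The names `ETDRecord`, `propose`, `rollback`, and `VALIDATORS` are hypothetical, and the validation rules are illustrative examples. They are not the rules or API used by ETDMIS.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical, illustrative validation rules (not the ETDMIS rules).
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "year": lambda v: bool(re.fullmatch(r"(19|20)\d{2}", v)),
    "degree": lambda v: v in {"PhD", "MS", "MA", "EdD"},
}


@dataclass
class ETDRecord:
    etd_id: str
    fields: Dict[str, str]
    # Snapshots of earlier field states; the last element is the most recent.
    history: List[Dict[str, str]] = field(default_factory=list)

    def propose(self, name: str, value: str) -> bool:
        """Apply a model-proposed correction only if it passes validation."""
        check = VALIDATORS.get(name)
        if check is None or not check(value):
            return False  # reject: no rule for this field, or the value fails it
        self.history.append(dict(self.fields))  # snapshot before changing
        self.fields[name] = value
        return True

    def rollback(self) -> None:
        """Restore the most recent snapshot, if one exists."""
        if self.history:
            self.fields = self.history.pop()


if __name__ == "__main__":
    rec = ETDRecord("etd-0001", {"year": "20l9", "degree": "Ph.D."})
    print(rec.propose("year", "2019"))        # accepted: passes the year rule
    print(rec.propose("degree", "Doctorate"))  # rejected: not in the allowed set
    print(rec.fields)
    rec.rollback()                             # undo the accepted year fix
    print(rec.fields)
```

Gating each change on validation keeps an imperfect model from overwriting a field with an implausible value. The snapshot history lets a later reviewer revert an accepted change that turns out to be wrong.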
Date of Conference: 15-18 December 2024
Date Added to IEEE Xplore: 16 January 2025
Conference Location: Washington, DC, USA