Paper

Authors: Muhammad Suryani 1,2; Steffen Hahne 1; Christian Beth 1; Klaus Wallmann 2 and Matthias Renz 1

Affiliations: 1 Institute of Informatik, Christian-Albrechts-Universität zu Kiel, Kiel, Germany ; 2 GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany

Keyword(s): Information Extraction, Data Acquisition, Research Data Management, Scientific Publication, Marine Science.

Abstract: Researchers encapsulate their findings in publications, generally available as PDFs, which are designed primarily for platform-independent viewing and printing and do not support editing or automatic data extraction. These documents are a rich source of information in any domain, but that information is spread across text, tables and figures. Manually extracting information from these components is prohibitively tedious, which necessitates an automatic approach. Such an approach could provide valuable data to the research community while also helping to manage the increasing number of publications. Previous approaches focused on extracting individual components from scientific publications, i.e. metadata, text or tables, but did not target these data components collectively. This paper proposes a Data Acquisition Framework (DAF), to our knowledge the most comprehensive framework of its kind. The DAF extracts enhanced metadata, segmented text, and the captions and content of tables and figures. Through rigorous evaluation on two distinct datasets from the marine science and chemical domains, we showcase the superior performance of the DAF compared to the baseline PDFDataExtractor. We also provide an illustrative example to underscore DAF’s adaptability in the realm of research data management.

CC BY-NC-ND 4.0

Paper citation in several formats:
Suryani, M., Hahne, S., Beth, C., Wallmann, K. and Renz, M. (2023). DAF: Data Acquisition Framework to Support Information Extraction from Scientific Publications. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR; ISBN 978-989-758-671-2; ISSN 2184-3228, SciTePress, pages 468-476. DOI: 10.5220/0012260300003598

@conference{kdir23,
author={Muhammad Suryani and Steffen Hahne and Christian Beth and Klaus Wallmann and Matthias Renz},
title={DAF: Data Acquisition Framework to Support Information Extraction from Scientific Publications},
booktitle={Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR},
year={2023},
pages={468-476},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012260300003598},
isbn={978-989-758-671-2},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR
TI - DAF: Data Acquisition Framework to Support Information Extraction from Scientific Publications
SN - 978-989-758-671-2
IS - 2184-3228
AU - Suryani, M.
AU - Hahne, S.
AU - Beth, C.
AU - Wallmann, K.
AU - Renz, M.
PY - 2023
SP - 468
EP - 476
DO - 10.5220/0012260300003598
PB - SciTePress