DAF: Data Acquisition Framework to Support Information Extraction from Scientific Publications

Muhammad Suryani; Steffen Hahne; Christian Beth; Klaus Wallmann; Matthias Renz

Topics: Context Discovery; Information Extraction; Natural Language Processing; Pre-Processing and Post-Processing for Data Mining; Software Frameworks and Applications

In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, pages 468-476, 2023, Rome, Italy

Authors: Muhammad Suryani ¹,² ; Steffen Hahne ¹ ; Christian Beth ¹ ; Klaus Wallmann ² and Matthias Renz ¹

Affiliations: ¹ Institute of Informatik, Christian-Albrechts-Universität zu Kiel, Kiel, Germany ; ² GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany

Keyword(s): Information Extraction, Data Acquisition, Research Data Management, Scientific Publication, Marine Science.

Abstract: Researchers encapsulate their findings in publications, generally available as PDFs, a format designed primarily for platform-independent viewing and printing that does not support editing or automatic data extraction. These documents are a rich source of information in any domain, but that information is spread across text, tables and figures. Extracting it manually from these components is prohibitively tedious, which necessitates an automatic approach. Such an approach could provide valuable data to the research community while also helping to manage the increasing number of publications. Previous approaches focused on extracting individual components from scientific publications, i.e. metadata, text or tables, but did not target these components collectively. This paper proposes a Data Acquisition Framework (DAF), to our knowledge the most comprehensive framework of this kind. The DAF extracts enhanced metadata, segmented text, and the captions and content of tables and figures. Through rigorous evaluation on two distinct datasets from the marine science and chemical domains, we showcase the superior performance of the DAF compared to the baseline PDFDataExtractor. We also provide an illustrative example to underscore DAF's adaptability in the realm of research data management.
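As a rough illustration of the component types the abstract lists (metadata, segmented text, and captions plus content of tables and figures), the output for a single PDF could be held in a record like the one below. This is a minimal sketch, not DAF's actual data model: the class names `CaptionedItem` and `ExtractedPublication`, their fields, the `summary` helper, and the placeholder table caption are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionedItem:
    """A table or figure: its caption plus any extracted content (hypothetical)."""
    caption: str
    content: str = ""


@dataclass
class ExtractedPublication:
    """Hypothetical container for the components named in the abstract."""
    metadata: dict[str, str] = field(default_factory=dict)       # e.g. title, DOI
    text_segments: dict[str, str] = field(default_factory=dict)  # section name -> text
    tables: list[CaptionedItem] = field(default_factory=list)
    figures: list[CaptionedItem] = field(default_factory=list)

    def summary(self) -> dict[str, int]:
        """Count how many items of each component type were captured."""
        return {
            "metadata_fields": len(self.metadata),
            "text_segments": len(self.text_segments),
            "tables": len(self.tables),
            "figures": len(self.figures),
        }


# Populate a record with the metadata shown on this page.
record = ExtractedPublication(
    metadata={
        "title": "DAF: Data Acquisition Framework to Support Information "
                 "Extraction from Scientific Publications",
        "doi": "10.5220/0012260300003598",
    },
)
# Placeholder entry, not taken from the paper.
record.tables.append(CaptionedItem(caption="Table 1: placeholder caption"))
```

Keeping tables and figures as caption/content pairs reflects the abstract's wording that both are extracted for each; how the framework itself represents them is not stated here.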

CC BY-NC-ND 4.0

Paper citation in several formats:

Suryani, M., Hahne, S., Beth, C., Wallmann, K. and Renz, M. (2023). DAF: Data Acquisition Framework to Support Information Extraction from Scientific Publications. In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR; ISBN 978-989-758-671-2; ISSN 2184-3228, SciTePress, pages 468-476. DOI: 10.5220/0012260300003598

@conference{kdir23,
author={Muhammad Suryani and Steffen Hahne and Christian Beth and Klaus Wallmann and Matthias Renz},
title={DAF: Data Acquisition Framework to Support Information Extraction from Scientific Publications},
booktitle={Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR},
year={2023},
pages={468-476},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012260300003598},
isbn={978-989-758-671-2},
issn={2184-3228},
}

TY - CONF
JO - Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR
TI - DAF: Data Acquisition Framework to Support Information Extraction from Scientific Publications
SN - 978-989-758-671-2
IS - 2184-3228
AU - Suryani, M.
AU - Hahne, S.
AU - Beth, C.
AU - Wallmann, K.
AU - Renz, M.
PY - 2023
SP - 468
EP - 476
DO - 10.5220/0012260300003598
PB - SciTePress