Abstract
We address the task of compiling structured documentation of research processes in the form of knowledge graphs by automatically extracting information from publications and associating it with information from other sources. This challenge has not previously been addressed at the level described here. We have developed a process and a system that leverages existing information from DBpedia, retrieves articles from repositories, extracts and interrelates various kinds of named and non-named entities by exploiting article metadata, the structure of the text, and syntactic, lexical and semantic constraints, and populates a knowledge base in the form of RDF triples. An ontology designed to represent scholarly practices drives the whole process. Rule-based and machine learning-based methods that account for the nature of scientific texts and a wide variety of writing styles have been developed for the task. Evaluation on datasets from three disciplines, Digital Humanities, Bioinformatics, and Medicine, shows very promising performance.
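As an illustration of the final step, populating a knowledge base in the form of RDF triples, the following is a minimal sketch and not the system described in the paper. The namespace `http://example.org/kg/`, the class names `Activity` and `Method`, the property `employs`, and the helper names are all hypothetical and chosen only for the example; the output is serialized as N-Triples using the standard library alone.

```python
from typing import List, NamedTuple

# Hypothetical namespace for the example; not taken from the paper.
EX = "http://example.org/kg/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Triple(NamedTuple):
    subject: str           # IRI of the subject
    predicate: str         # IRI of the predicate
    obj: str               # IRI, or literal text when literal is True
    literal: bool = False


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples requires inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    obj = f'"{escape_literal(t.obj)}"' if t.literal else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


def describe_activity(activity_id: str, label: str, method_id: str) -> List[Triple]:
    """Build triples for an extracted activity and the method it employs.

    The classes and the 'employs' property are illustrative assumptions.
    """
    activity = EX + activity_id
    method = EX + method_id
    return [
        Triple(activity, RDF_TYPE, EX + "Activity"),
        Triple(activity, RDFS_LABEL, label, literal=True),
        Triple(activity, EX + "employs", method),
        Triple(method, RDF_TYPE, EX + "Method"),
    ]


if __name__ == "__main__":
    for triple in describe_activity("act1", "sentence segmentation", "meth1"):
        print(to_ntriples(triple))
```

In practice an RDF library and the actual ontology vocabulary would replace the hand-written serializer and the placeholder names used here.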
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Constantopoulos, P., Pertsas, V. (2020). From Publications to Knowledge Graphs. In: Flouris, G., Laurent, D., Plexousakis, D., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personalization. ISIP 2019. Communications in Computer and Information Science, vol 1197. Springer, Cham. https://doi.org/10.1007/978-3-030-44900-1_2
DOI: https://doi.org/10.1007/978-3-030-44900-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44899-8
Online ISBN: 978-3-030-44900-1
eBook Packages: Computer Science (R0)