From Publications to Knowledge Graphs

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1197)

Abstract

We address the task of compiling structured documentation of research processes in the form of knowledge graphs by automatically extracting information from publications and associating it with information from other sources. This challenge has not previously been addressed at the level described here. We have developed a process and a system that leverages existing information from DBpedia, retrieves articles from repositories, extracts and interrelates various kinds of named and non-named entities by exploiting article metadata, the structure of the text, and syntactic, lexical and semantic constraints, and populates a knowledge base in the form of RDF triples. An ontology designed to represent scholarly practices drives the whole process. Rule-based and machine learning-based methods that account for the nature of scientific texts and a wide variety of writing styles have been developed for the task. Evaluation on datasets from three disciplines, Digital Humanities, Bioinformatics, and Medicine, shows very promising performance.
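The final step named in the abstract, populating a knowledge base in the form of RDF triples, can be illustrated with a minimal sketch. It is not the authors' system: the namespace `http://example.org/so/`, the class names `Activity` and `Method`, the `employs` property, and the sample text spans are all hypothetical stand-ins chosen for illustration, and the entities are assumed to have been extracted already. The sketch only shows how typed entities and a relation between them might be serialized as N-Triples using the Python standard library.

```python
from dataclasses import dataclass

# Hypothetical namespace; the ontology IRIs used by the paper are not given here.
EX = "http://example.org/so/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Entity:
    local_id: str  # identifier minted by a (hypothetical) extractor
    cls: str       # ontology class name, e.g. "Activity" or "Method"
    label: str     # text span the entity was extracted from


def iri(local_name: str) -> str:
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(text: str) -> str:
    """Serialize a plain string literal, escaping backslash, quote, newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def entity_triples(e: Entity) -> list[str]:
    """Emit one rdf:type triple and one rdfs:label triple for an entity."""
    return [
        f"{iri(e.local_id)} <{RDF_TYPE}> {iri(e.cls)} .",
        f"{iri(e.local_id)} <{RDFS_LABEL}> {literal(e.label)} .",
    ]


def relation_triple(subject: Entity, predicate: str, obj: Entity) -> str:
    """Emit a single triple linking two entities."""
    return f"{iri(subject.local_id)} {iri(predicate)} {iri(obj.local_id)} ."


def to_ntriples(entities, relations) -> str:
    """Serialize entities and (subject, predicate, object) relations."""
    lines = []
    for e in entities:
        lines.extend(entity_triples(e))
    for s, p, o in relations:
        lines.append(relation_triple(s, p, o))
    return "\n".join(lines)


# Illustrative input only: these spans are made up, not taken from the paper.
activity = Entity("act1", "Activity", "we annotated the corpus")
method = Entity("meth1", "Method", "conditional random fields")
graph = to_ntriples([activity, method], [(activity, "employs", method)])
print(graph)
```

Writing the triples as plain N-Triples lines keeps the sketch free of dependencies; a real pipeline would more likely use an RDF library and a triple store, which the abstract does not specify.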

Notes

  1. http://biocreative.sourceforge.net/index.html

  2. www.spacy.io

  3. https://radimrehurek.com/gensim/

  4. https://nlp.stanford.edu/projects/glove/

  5. https://nlp.stanford.edu/software/CRF-NER.html

Author information

Corresponding author

Correspondence to Panos Constantopoulos.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Constantopoulos, P., Pertsas, V. (2020). From Publications to Knowledge Graphs. In: Flouris, G., Laurent, D., Plexousakis, D., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personalization. ISIP 2019. Communications in Computer and Information Science, vol 1197. Springer, Cham. https://doi.org/10.1007/978-3-030-44900-1_2

  • DOI: https://doi.org/10.1007/978-3-030-44900-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44899-8

  • Online ISBN: 978-3-030-44900-1

  • eBook Packages: Computer Science, Computer Science (R0)
