Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles

  • Conference paper
Towards Open and Trustworthy Digital Societies (ICADL 2021)

Abstract

We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) an article’s title notes salient aspects of its contribution; and (ii) the pattern regularities that capture these salient terms can be expressed as a set of rules. Only lexico-syntactic patterns that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type were selected. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
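To make the idea of a positional lexico-syntactic rule concrete, the following is a minimal sketch in Python. The pattern used here ("<solution> for <research problem> using <method>") and the names `FOR_USING` and `extract_entities` are illustrative assumptions; they are not taken from the paper, whose actual rule set is not given in this abstract.

```python
import re
from typing import Dict

# Hypothetical rule (an assumption, not one of the paper's rules):
#   "<solution> for <research problem> [using <method>]"
# The connector words "for" and "using" positionally signal the entity types.
FOR_USING = re.compile(
    r"^(?P<solution>.+?)\s+for\s+(?P<research_problem>.+?)"
    r"(?:\s+using\s+(?P<method>.+))?$",
    re.IGNORECASE,
)


def extract_entities(title: str) -> Dict[str, str]:
    """Return the typed phrases matched by the toy rule, or an empty dict."""
    match = FOR_USING.match(title.strip())
    if match is None:
        return {}
    # Drop optional groups that did not participate in the match.
    return {name: text for name, text in match.groupdict().items() if text is not None}


if __name__ == "__main__":
    print(extract_entities("A Neural Parser for Semantic Role Labeling using Attention"))
    print(extract_entities("Attention Is All You Need"))
```

A single regular expression like this covers only one title shape; a fuller rule-based system would presumably apply many such patterns in a fixed order, which this sketch does not attempt.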

Supported by TIB Leibniz Information Centre for Science and Technology and the EU H2020 ERC project ScienceGRAPH (GA ID: 819536).

Notes

  1. The ORKG platform can be accessed online: https://orkg.org/.

Author information

Corresponding author

Correspondence to Jennifer D’Souza.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

D’Souza, J., Auer, S. (2021). Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_31

  • DOI: https://doi.org/10.1007/978-3-030-91669-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91668-8

  • Online ISBN: 978-3-030-91669-5

  • eBook Packages: Computer Science, Computer Science (R0)
