Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles

D’Souza, Jennifer; Auer, Sören

doi:10.1007/978-3-030-91669-5_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13133)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

We describe a rule-based approach for the automatic acquisition of salient scientific entities from Computational Linguistics (CL) scholarly article titles. Two observations motivated the approach: (i) titles tend to note the salient aspects of an article’s contribution; and (ii) the salient terms follow pattern regularities that can be expressed as a set of rules. Only lexico-syntactic patterns that were easily recognizable, occurred frequently, and positionally indicated a scientific entity type were selected. The rules were developed on a collection of 50,237 CL titles covering all articles in the ACL Anthology. In total, 19,799 research problems, 18,111 solutions, 20,033 resources, 1,059 languages, 6,878 tools, and 21,687 methods were extracted at an average precision of 75%.
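To make the idea of positional lexico-syntactic rules concrete, the following is a minimal Python sketch, not the authors' rule set. The two connector rules ("for" and "using"), the entity types assigned to each side, the function name extract_entities, and the example titles are all assumptions chosen for illustration; the abstract does not list the actual patterns.

```python
import re

# Hypothetical rules, NOT the paper's actual rule set: each maps a connector
# word in a title to the entity types assumed for the text on either side.
RULES = [
    (re.compile(r"^(?P<left>.+?)\s+for\s+(?P<right>.+)$", re.IGNORECASE),
     ("solution", "research_problem")),
    (re.compile(r"^(?P<left>.+?)\s+using\s+(?P<right>.+)$", re.IGNORECASE),
     ("research_problem", "method")),
]


def extract_entities(title):
    """Return (entity_type, phrase) pairs from the first rule that matches."""
    for pattern, (left_type, right_type) in RULES:
        match = pattern.match(title.strip())
        if match:
            return [(left_type, match.group("left")),
                    (right_type, match.group("right"))]
    return []  # no rule applies to this title


# Made-up example titles, used only to exercise the sketch.
print(extract_entities("A Neural Parser for Dependency Parsing"))
print(extract_entities("Sentiment Analysis using Recurrent Networks"))
```

In this sketch the position of a phrase relative to the connector decides its type, which is one plausible reading of "positionally indicated a scientific entity type". A real rule set would need many more patterns and an ordering among them.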

Supported by TIB Leibniz Information Centre for Science and Technology and the EU H2020 ERC project ScienceGRaph (GA ID: 819536).

Notes

1. The ORKG platform can be accessed online: https://orkg.org/.

Author information

Authors and Affiliations

TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Jennifer D’Souza & Sören Auer
L3S Research Center at Leibniz University of Hannover, Hannover, Germany
Sören Auer

Corresponding author

Correspondence to Jennifer D’Souza.

Editor information

Editors and Affiliations

National Taiwan Normal University, Taipei, Taiwan
Hao-Ren Ke
Nanyang Technological University, Singapore, Singapore
Chei Sian Lee
Kyoto University, Kyoto, Japan
Kazunari Sugiyama

About this paper

Cite this paper

D’Souza, J., Auer, S. (2021). Pattern-Based Acquisition of Scientific Entities from Scholarly Article Titles. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_31

DOI: https://doi.org/10.1007/978-3-030-91669-5_31
Published: 30 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5
eBook Packages: Computer Science, Computer Science (R0)
