Easy Semantification of Bioassays

Anteghini, Marco; D’Souza, Jennifer; dos Santos, Vitor A. P. Martins; Auer, Sören

doi:10.1007/978-3-031-08421-8_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13196)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence

Abstract

Biological data and knowledge bases increasingly rely on Semantic Web technologies and knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts two framings of automated semantification, labeling and clustering, which sit at opposite ends of the method-complexity spectrum. Modeling the problem according to the characteristics of our data, we find that the clustering solution significantly outperforms a state-of-the-art deep neural network labeling approach. This novel contribution rests on two findings: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatic semantification of biological assays achieves a high F1 score of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task and offers a strong benchmark model.
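
To make the clustering framing concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which text features, clustering algorithm or label-transfer rule are used, so TF-IDF vectors, k-means with cosine similarity, majority-vote label transfer and micro-averaged F1 are all assumptions chosen for illustration. The assay descriptions and semantic labels are invented toy data.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one sparse TF-IDF dict per whitespace-tokenised document."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenised for term in set(toks))
    n = len(docs)
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenised
    ]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=10):
    # Assumption: deterministic initialisation from the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment

def cluster_labels(assignment, gold, k):
    """Assumed transfer rule: keep labels carried by a strict majority of a cluster."""
    result = []
    for c in range(k):
        members = [g for g, a in zip(gold, assignment) if a == c]
        counts = Counter(label for g in members for label in g)
        result.append({l for l, n in counts.items() if 2 * n > len(members)})
    return result

def micro_f1(predicted, gold):
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical assay descriptions and semantic labels (toy data).
assays = [
    "luciferase reporter assay in HEK293 cells",
    "kinase inhibition assay measured by fluorescence",
    "luciferase reporter assay in HeLa cells",
    "kinase inhibition assay measured by luminescence",
]
gold = [
    {"cell-based format", "reporter gene"},
    {"biochemical format", "enzyme activity"},
    {"cell-based format", "reporter gene"},
    {"biochemical format", "enzyme activity", "luminescence"},
]

assignment = kmeans(tfidf(assays), k=2)
per_cluster = cluster_labels(assignment, gold, k=2)
predicted = [per_cluster[a] for a in assignment]
f1 = micro_f1(predicted, gold)
print(assignment, round(f1, 3))
```

In this sketch every assay in a cluster receives the same label set, so a label carried by only one member of a cluster is missed. That is the trade-off of the clustering framing: it is simple and follows the redundancy in the data, at the cost of rare, assay-specific statements.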

Supported by TIB Leibniz Information Centre for Science and Technology, the EU H2020 ERC project ScienceGRaph (GA ID: 819536) and the ITN PERICO (GA ID: 812968).

Author information

Authors and Affiliations

Lifeglimmer GmbH, Markelstr. 38, 12163, Berlin, Germany
Marco Anteghini & Vitor A. P. Martins dos Santos
Wageningen University and Research, Laboratory of Systems and Synthetic Biology, Stippeneng 4, 6708 WE, Wageningen, The Netherlands
Marco Anteghini & Vitor A. P. Martins dos Santos
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Jennifer D’Souza & Sören Auer

Corresponding author

Correspondence to Marco Anteghini.

Editor information

Editors and Affiliations

Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Stefania Bandini
Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Francesca Gasparini
Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genova, Italy
Viviana Mascardi
Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Matteo Palmonari
Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
Giuseppe Vizzari

About this paper

Cite this paper

Anteghini, M., D’Souza, J., dos Santos, V.A.P.M., Auer, S. (2022). Easy Semantification of Bioassays. In: Bandini, S., Gasparini, F., Mascardi, V., Palmonari, M., Vizzari, G. (eds) AIxIA 2021 – Advances in Artificial Intelligence. AIxIA 2021. Lecture Notes in Computer Science, vol 13196. Springer, Cham. https://doi.org/10.1007/978-3-031-08421-8_14

DOI: https://doi.org/10.1007/978-3-031-08421-8_14
Published: 19 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08420-1
Online ISBN: 978-3-031-08421-8
eBook Packages: Computer Science, Computer Science (R0)
