Easy Semantification of Bioassays

  • Conference paper
AIxIA 2021 – Advances in Artificial Intelligence (AIxIA 2021)

Abstract

Biological data and knowledge bases increasingly rely on Semantic Web technologies and knowledge graphs for data integration, retrieval and federated queries. We propose a solution for automatically semantifying biological assays. Our solution contrasts two formulations of automated semantification, labeling and clustering, which sit at opposite ends of the method-complexity spectrum. Modeling the problem after the characteristics of its data, we find that the clustering solution significantly outperforms a state-of-the-art deep neural network labeling approach. The contribution is twofold: 1) a learning objective closely modeled after the data outperforms an alternative approach with sophisticated semantic modeling; 2) automatic semantification of biological assays achieves a high F1 score of nearly 83%, which to our knowledge is the first reported standardized evaluation of the task and offers a strong benchmark model.

Supported by TIB Leibniz Information Centre for Science and Technology, the EU H2020 ERC project ScienceGRaph (GA ID: 819536) and the ITN PERICO (GA ID: 812968).
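
The clustering formulation named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy assay descriptions, the annotation strings, the helper names (`fit_tfidf`, `vectorise`, `kmeans`, `semantify`), the farthest-point initialisation and the rule that a cluster inherits the union of its members' annotations are all assumptions made for illustration. It only shows the general idea of grouping assay texts by TF-IDF similarity with k-means and letting a new assay take the annotations of its nearest cluster.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf


def vectorise(doc, vocab, idf):
    """Map one document to an L2-normalised TF-IDF vector; unseen tokens are ignored."""
    tf = Counter(doc.lower().split())
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, n_iter=20):
    """Lloyd's k-means with a deterministic farthest-point initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy assay descriptions and hypothetical annotations (not taken from the paper's data).
train_texts = [
    "luciferase reporter gene assay cells",
    "luciferase reporter assay gene expression cells",
    "kinase enzyme inhibition assay protein",
    "enzyme inhibition assay kinase activity",
]
train_annotations = [
    {"format: cell-based", "detection: luminescence"},
    {"format: cell-based", "detection: luminescence"},
    {"format: biochemical", "target: kinase"},
    {"format: biochemical", "target: kinase"},
]

vocab, idf = fit_tfidf(train_texts)
points = [vectorise(t, vocab, idf) for t in train_texts]
labels, centroids = kmeans(points, k=2)

# Assumption: each cluster inherits the union of its members' annotations.
cluster_annotations = {}
for lab, ann in zip(labels, train_annotations):
    cluster_annotations.setdefault(lab, set()).update(ann)


def semantify(text):
    """Annotate a new assay with the annotation set of its nearest cluster."""
    return cluster_annotations[nearest(vectorise(text, vocab, idf), centroids)]


predicted = semantify("luciferase reporter cells in culture")
print(sorted(predicted))
```

In practice a library implementation such as the scikit-learn `KMeans` class linked in the notes would replace the hand-written loop, and the number of clusters would have to be chosen empirically rather than fixed at two.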

Notes

  1. https://www.orkg.org/

  2. https://bioportal.bioontology.org/

  3. https://github.com/MarcoAnteghini/Easy-Semantification-of-Bioassays-SM

  4. https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Author information

Corresponding author

Correspondence to Marco Anteghini.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Anteghini, M., D’Souza, J., dos Santos, V.A.P.M., Auer, S. (2022). Easy Semantification of Bioassays. In: Bandini, S., Gasparini, F., Mascardi, V., Palmonari, M., Vizzari, G. (eds) AIxIA 2021 – Advances in Artificial Intelligence. AIxIA 2021. Lecture Notes in Computer Science, vol 13196. Springer, Cham. https://doi.org/10.1007/978-3-031-08421-8_14

  • DOI: https://doi.org/10.1007/978-3-031-08421-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08420-1

  • Online ISBN: 978-3-031-08421-8

  • eBook Packages: Computer Science, Computer Science (R0)
