Published July 28, 2023 | Version OAEI Bio-ML 2023
Dataset Open

Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

  • 1. University of Oxford
  • 2. City, University of London
  • 3. Samsung Research UK

Description

 

This version is used in the Bio-ML track of OAEI 2023; a few significant changes have been made compared to the OAEI 2022 version.

 

Overview

The purpose of these datasets is to support equivalence and subsumption ontology matching.

There are five ontology pairs, extracted from Mondo and UMLS:

| Source | Task | Category | #SrcCls | #TgtCls | #Ref (equiv) | #Ref (subs) |
|---|---|---|---|---|---|---|
| Mondo | OMIM-ORDO | Disease | 9,648 (+6) | 9,275 (+437) | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 (+8,927) | 8,465 (+17) | 4,686 | 3,339 |
| UMLS | SNOMED-FMA | Body | 34,418 (+10,236) | 88,955 (+24,229) | 7,256 | 5,506 |
| UMLS | SNOMED-NCIT | Pharm | 29,500 (+13,455) | 22,136 (+6,886) | 5,803 | 4,225 |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 (+11,700) | 20,247 (+6,291) | 3,804 | 213 |

The "+" numbers reflect the changes due to locality module enrichment.

The main track is available at "bio-ml", where each pair has its own task folder containing the source and target ontologies, the reference equivalence mappings (in "refs_equiv"), and the reference subsumption mappings (in "refs_subs").

The special sub-track is available at "bio-llm", where each pair is associated with a task folder, containing the source and target ontologies, and the test candidate mappings. 
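The folder layout described above can be inspected with a short script once an archive has been extracted. The following is a minimal sketch, not an official loader: it assumes the archive unpacks into a directory with one sub-folder per task, and that "refs_equiv" and "refs_subs" appear as the start of an entry name inside each task folder. The function names `summarize_task_folders` and `missing_reference_sets` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path


def summarize_task_folders(track_root):
    """Map each task folder under track_root to the sorted names of its entries."""
    root = Path(track_root)
    summary = {}
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[task_dir.name] = sorted(child.name for child in task_dir.iterdir())
    return summary


def missing_reference_sets(summary, expected=("refs_equiv", "refs_subs")):
    """Return, per task, the expected entry-name prefixes that were not found.

    Assumption: reference mappings are stored in entries whose names start
    with "refs_equiv" or "refs_subs" (file or folder, either is accepted).
    """
    missing = {}
    for task, entries in summary.items():
        absent = [
            prefix
            for prefix in expected
            if not any(entry.startswith(prefix) for entry in entries)
        ]
        if absent:
            missing[task] = absent
    return missing
```

Calling `missing_reference_sets(summarize_task_folders("bio-ml"))` on an extracted main-track directory would then list any task folder that lacks one of the two reference sets; for "bio-llm", only `summarize_task_folders` applies, since that sub-track ships test candidate mappings instead.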

 

Citation

Bio-ML

```
@inproceedings{he2022machine,
  title={Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching},
  author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},
  booktitle={International Semantic Web Conference},
  pages={575--591},
  year={2022},
  organization={Springer}
}
```

Bio-LLM

```
@article{he2023exploring,
  title={Exploring large language models for ontology alignment},
  author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian},
  journal={arXiv preprint arXiv:2309.07172},
  year={2023}
}
```

 

Changelog

Several significant changes have been made; they are described in the Bio-ML documentation.

Files

bio-llm.zip

Files (154.1 MB)

| Checksum | Size |
|---|---|
| md5:82a8a79d9a1d18e7b3469db8d9ed6f6a | 13.0 MB |
| md5:2bd40b9b44aab44fb3f048a89e974113 | 40.5 MB |
| md5:5f205767c98f1f782b2896c7c67bfc78 | 100.5 MB |
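
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the local file path is whatever name the archive was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed checksum such as 'md5:82a8...'."""
    # Drop an optional 'md5:' prefix before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("bio-llm.zip", "md5:...")` with the checksum listed for that archive returns `True` when the file is intact; a `False` result suggests the file should be downloaded again.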