Abstract
Model repositories such as BioModels Database provide computational models of biological systems to the scientific community. These models carry rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as the Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable means of characterizing sets of models. Such characteristics can then help to classify models, to identify additional features for model retrieval tasks, or to compare sets of models. In this paper, we present four methods for annotation-based feature extraction from model sets. We applied all four methods to four different sets of SBML models taken from BioModels Database. To characterize each set, we analyzed and extracted concepts from three ontologies frequently used to annotate SBML models: Gene Ontology, ChEBI, and SBO. We find that three of the four tested methods are suitable for determining characteristic features of model sets. The selected features vary with the underlying model set and are specific to it. We show that the identified features map to concepts that sit higher in the ontology hierarchy than the concepts used for model annotation. Our analysis also reveals that the information content of ontology concepts does not correlate with their usage in model annotation.
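To make the idea of annotation-based characterization concrete, the following minimal sketch propagates each model's annotations up a concept hierarchy, measures what fraction of a model set each concept covers, and derives a Resnik-style information content from that fraction. It is an illustration only and not one of the four methods of the paper, which the abstract does not specify. The toy hierarchy, the concept names, the example model set, and the choice to estimate concept probability as the fraction of models covered are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy (child -> parents); not real ontology data.
PARENTS = {
    "glycolysis": ["carbohydrate metabolism"],
    "gluconeogenesis": ["carbohydrate metabolism"],
    "carbohydrate metabolism": ["metabolic process"],
    "cell cycle": ["cellular process"],
    "metabolic process": ["biological process"],
    "cellular process": ["biological process"],
    "biological process": [],
}


def ancestors_and_self(concept, parents=PARENTS):
    """Return the concept together with all of its ancestors."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def concept_coverage(model_set, parents=PARENTS):
    """Fraction of models annotated with each concept or one of its descendants."""
    counts = Counter()
    for annotations in model_set:
        covered = set()
        for annotation in annotations:
            covered |= ancestors_and_self(annotation, parents)
        counts.update(covered)  # each model counts at most once per concept
    return {concept: k / len(model_set) for concept, k in counts.items()}


def information_content(coverage):
    """Resnik-style information content, using coverage as the probability estimate."""
    return {concept: -math.log(p) for concept, p in coverage.items()}


# Hypothetical model set: each model is represented by its list of annotations.
models = [["glycolysis"], ["gluconeogenesis"], ["glycolysis", "cell cycle"]]
coverage = concept_coverage(models)
ic = information_content(coverage)

# Candidate features: concepts that cover every model in the set.
shared = sorted(c for c, p in coverage.items() if p == 1.0)
```

In this toy set, no model is annotated directly with "carbohydrate metabolism", yet that concept covers every model, whereas the concepts actually used for annotation each cover only part of the set. This mirrors the observation above that characteristic features tend to sit higher in the hierarchy than the annotated concepts. Concepts covering the whole set have an information content of zero under this estimate, so a practical selection would need to trade coverage against specificity.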
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Alm, R., Waltemath, D., Wolkenhauer, O., Henkel, R. (2014). Annotation-Based Feature Extraction from Sets of SBML Models. In: Galhardas, H., Rahm, E. (eds) Data Integration in the Life Sciences. DILS 2014. Lecture Notes in Computer Science, vol 8574. Springer, Cham. https://doi.org/10.1007/978-3-319-08590-6_8
DOI: https://doi.org/10.1007/978-3-319-08590-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08589-0
Online ISBN: 978-3-319-08590-6
eBook Packages: Computer Science (R0)