Abstract
Addressing biological questions requires comprehensive evaluation of multiple types of annotations describing current biological knowledge. Such annotations are increasingly available, but their fast evolution, heterogeneity and dispersion across many different sources hamper their effective use. Leveraging an innovative, flexible data schema and automatic software procedures that support the integration of data sources evolving in number, data content and structure, while assuring quality and provenance tracking of the integrated data, we created a multi-organism Genomic and Proteomic Knowledge Base (GPKB) and keep it easily updated. From several well-known databases it imports and integrates a large amount of gene and protein data, external references and annotations, expressed through multiple biomedical terminologies. To easily query such integrated data, we developed intuitive web interfaces and services for programmatic access to the GPKB; they are publicly available at http://www.bioinformatics.deib.polimi.it/GPKB/ and http://www.bioinformatics.deib.polimi.it/GPKB-REST/, respectively. The GPKB is a valuable resource used in several projects by many users; the developed interfaces enhance its relevance to the community by allowing the seamless composition of queries, even complex ones, on all data integrated in the GPKB, which can help unveil new biomedical knowledge.
Acknowledgements
The authors would like to thank the students who collaborated on developing the GPKB and its web and service interfaces and on making them publicly available, particularly Maria Carucci, Vincenzo Di Girolamo, Stefano Gennaro, and Marta Morfina.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Canakoglu, A., Ceri, S., Masseroli, M. (2016). Biomolecular Annotation Integration and Querying to Help Unveiling New Biomedical Knowledge. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_69
DOI: https://doi.org/10.1007/978-3-319-31744-1_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31743-4
Online ISBN: 978-3-319-31744-1
eBook Packages: Computer Science, Computer Science (R0)