Abstract
The growth of the Life Science Semantic Web is illustrated by the increasing number of resources available in the Linked Open Data Cloud. Our SWIT tool supports the generation of semantic repositories, and it has been successfully applied in the field of orthology resources, helping to achieve objectives of the Quest for Orthologs consortium. However, our experience with SWIT reveals that, although the computational complexity of its algorithm is linear in the size of the dataset, generating the datasets takes longer than desired.
The goal of this work is to apply High Performance Computing (HPC) techniques to speed up the generation of semantic datasets with SWIT. For this purpose, the SWIT kernel was reimplemented, its algorithm was adapted to facilitate parallelization, and parallelization techniques were then designed and implemented.
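A minimal sketch of file-level data parallelism, one common way to parallelize a transformation whose input is split into many independent files (as InParanoid's per-species-pair files are). It is not SWIT's actual kernel: the two-column tab-separated input, the `transform_file` and `transform_all` helpers, the `BASE` namespace, the `hasOrtholog` predicate and the gene IDs are all hypothetical, and each "file" is modeled as an in-memory list of lines to keep the example self-contained.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Iterable, List, Tuple, Type

# A triple is (subject, predicate, object); a plain tuple stands in for RDF.
Triple = Tuple[str, str, str]

# Hypothetical namespace and predicate, for illustration only.
BASE = "http://example.org/orthology/"
PREDICATE = BASE + "hasOrtholog"


def transform_file(lines: List[str]) -> List[Triple]:
    """Turn one hypothetical 'geneA<TAB>geneB' file into triples.

    Each line is handled in constant time, so the cost is linear in the
    number of lines; blank lines are skipped.
    """
    triples: List[Triple] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_a, gene_b = line.split("\t")  # assumes exactly two columns
        triples.append((BASE + gene_a, PREDICATE, BASE + gene_b))
    return triples


def transform_all(
    files: Iterable[List[str]],
    workers: int = 4,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> List[Triple]:
    """Transform independent files concurrently and concatenate the results.

    No file depends on another, so each can go to a separate worker without
    locking. Executor.map yields results in input order, so the output is
    the same as that of a sequential run.
    """
    with executor_cls(max_workers=workers) as pool:
        chunks = pool.map(transform_file, files)
        return [triple for chunk in chunks for triple in chunk]


if __name__ == "__main__":
    demo_files = [
        ["HSA_1\tMMU_1\n", "HSA_2\tMMU_2\n"],  # placeholder species pair 1
        ["DME_1\tCEL_1\n"],                    # placeholder species pair 2
    ]
    for triple in transform_all(demo_files, workers=2):
        print(triple)
```

Processes are the default here because CPython's global interpreter lock prevents threads from executing Python bytecode in parallel; a `ThreadPoolExecutor` can be passed as `executor_cls` for lightweight testing. In a real pipeline each worker would more likely receive a file path and write its own output file, which avoids sending large triple lists back to the parent process.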
An experimental analysis of the speedup of the transformation process was performed using the orthology database InParanoid, which provides many files of orthology relations between pairs of species. The results show speedups of up to 7000x.
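For reference, a speedup figure is conventionally the ratio of the execution time of a baseline to that of the improved version. Writing T_orig for the time of the original SWIT transformation and T_new for that of the reimplemented, parallel version (notation introduced here, assuming the original SWIT is the baseline), a speedup of 7000x means the new version needs about 1/7000 of the original time on the same input; hypothetically, a 7000 s run would drop to about 1 s.

```latex
S = \frac{T_{\text{orig}}}{T_{\text{new}}}, \qquad
S = 7000 \;\Longrightarrow\; T_{\text{new}} = \frac{T_{\text{orig}}}{7000}
```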
The performance of SWIT has been greatly improved, which will increase its usefulness for creating large semantic datasets. The results also show that HPC techniques should play an important role in increasing the performance of semantic tools.
Acknowledgements
This work has been partially funded by the Spanish Ministry of Economy, Industry and Competitiveness, the European Regional Development Fund (ERDF) Programme, and the Fundación Séneca through grants TIN2014-53749-C2-2-R and 19371/PI/14.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Bernabé-Díaz, J.A., Legaz-García, M.d.C., García, J.M., Fernández-Breis, J.T. (2018). Application of High Performance Computing Techniques to the Semantic Data Transformation. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds) Trends and Advances in Information Systems and Technologies. WorldCIST'18 2018. Advances in Intelligent Systems and Computing, vol 745. Springer, Cham. https://doi.org/10.1007/978-3-319-77703-0_69
DOI: https://doi.org/10.1007/978-3-319-77703-0_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77702-3
Online ISBN: 978-3-319-77703-0
eBook Packages: Engineering, Engineering (R0)