Application of High Performance Computing Techniques to the Semantic Data Transformation

  • Conference paper
Trends and Advances in Information Systems and Technologies (WorldCIST'18 2018)

Abstract

The growth of the Life Science Semantic Web is illustrated by the increasing number of resources available in the Linked Open Data Cloud. Our SWIT tool supports the generation of semantic repositories, and it has been successfully applied in the field of orthology resources, helping to achieve the objectives of the Quest for Orthologs consortium. However, our experience with SWIT reveals that, although the computational complexity of the algorithm is linear in the size of the dataset, the time required to generate the datasets is longer than desired.

The goal of this work is to apply High Performance Computing (HPC) techniques to speed up the generation of semantic datasets using SWIT. To this end, the SWIT kernel was reimplemented and its algorithm was adapted to facilitate parallelization; the parallelization techniques were then designed and implemented.

An experimental analysis of the speedup of the transformation process has been performed using the orthologs database InParanoid, which provides many files of orthology relations between pairs of species. The results show that we have been able to obtain speedups of up to 7000x.

The performance of SWIT has been greatly improved, which will increase its usefulness for creating large semantic datasets. The results also show that HPC techniques should play an important role in improving the performance of semantic tools.

Notes

  1. http://www.questfororthologs.org

  2. http://xmlsoft.org/

  3. https://software.intel.com/en-us/articles/what-is-code-modernization

  4. https://Neobernad@bitbucket.org/Neobernad/swit-test.git

  5. http://inparanoid.sbc.su.se/download/8.0_current/Orthologs_OrthoXML/

  6. http://purl.bioontology.org/ontology/ORTH

  7. http://sele.inf.um.es/swit/ortho/mappingsOrthoXML.xml

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy, Industry and Competitiveness, the European Regional Development Fund (ERDF) Programme, and the Fundación Séneca through grants TIN2014-53749-C2-2-R and 19371/PI/14.

Author information

Corresponding author

Correspondence to José Antonio Bernabé-Díaz.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Bernabé-Díaz, J.A., Legaz-García, M.d.C., García, J.M., Fernández-Breis, J.T. (2018). Application of High Performance Computing Techniques to the Semantic Data Transformation. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds) Trends and Advances in Information Systems and Technologies. WorldCIST'18 2018. Advances in Intelligent Systems and Computing, vol 745. Springer, Cham. https://doi.org/10.1007/978-3-319-77703-0_69

  • DOI: https://doi.org/10.1007/978-3-319-77703-0_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77702-3

  • Online ISBN: 978-3-319-77703-0

  • eBook Packages: Engineering, Engineering (R0)
