Abstract

This paper investigates the relationship between bioinformatics data processing and its underlying computing architecture in the context of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC exchanges sequence data daily among its three member organizations in the USA, the UK, and Japan. We studied how this sequence database, stored in MySQL, can best exploit the higher transfer bandwidth of a grid-based storage architecture. Within the UK Government project “Grid-oriented Storage (GOS)” and the EC project “EuroAsiaGrid,” our lab developed GOS, which incorporates parallel streaming techniques to meet the needs of WAN/Grid-based virtual organizations. A real-world test shows that the INSDC sequence database backup operation, mysqldump, runs six times faster over the pipelined GOS architecture than over classic infrastructures on the link between Cambridge and Tokyo. A genomic sequence search against one million records via the GOS architecture achieved a 67.3% performance improvement.
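The abstract attributes GOS's speed-up on the Cambridge-Tokyo link to parallel streaming. As a hedged illustration of that general idea only (not the GOS implementation, whose internals are not described here), the sketch below stripes a byte stream such as mysqldump output into sequence-numbered chunks across N streams, reassembles them in order, and computes the classic window-limited TCP throughput bound. The function names, the 4 KiB chunk size, the 64 KiB window and the 250 ms round-trip time are all assumptions chosen for illustration, and the "network" is simulated by threads that simply return their input.

```python
"""Hypothetical sketch of parallel streaming: stripe one byte stream
(e.g. a database dump) across N streams, then reassemble it in order.
Not the GOS implementation; all names and figures are illustrative."""
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def stripe(data: bytes, n_streams: int, chunk_size: int) -> list[list[tuple[int, bytes]]]:
    """Cut data into sequence-numbered chunks, assigned round-robin to streams."""
    if n_streams < 1 or chunk_size < 1:
        raise ValueError("n_streams and chunk_size must be positive")
    streams: list[list[tuple[int, bytes]]] = [[] for _ in range(n_streams)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        streams[seq % n_streams].append((seq, data[offset:offset + chunk_size]))
    return streams


def send_stream(chunks: list[tuple[int, bytes]]) -> list[tuple[int, bytes]]:
    """Hypothetical stand-in for one network stream: returns what it was given."""
    return list(chunks)


def reassemble(received: list[list[tuple[int, bytes]]]) -> bytes:
    """Merge chunks from all streams back into original order by sequence number."""
    all_chunks = [chunk for stream in received for chunk in stream]
    all_chunks.sort(key=lambda chunk: chunk[0])
    return b"".join(payload for _, payload in all_chunks)


def window_limited_throughput(window_bytes: int, rtt_s: float, n_streams: int) -> float:
    """Upper bound in bytes/s when each TCP stream is capped at window / RTT."""
    return n_streams * window_bytes / rtt_s


if __name__ == "__main__":
    dump = b"INSERT INTO seq VALUES ('ACGTACGT');\n" * 1000  # toy dump payload
    parts = stripe(dump, n_streams=4, chunk_size=4096)
    # Send every stream concurrently (simulated), then rebuild the original bytes.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        received = list(pool.map(send_stream, parts))
    assert reassemble(received) == dump
    # Illustrative figures only: 64 KiB window, 250 ms round-trip time.
    for n in (1, 4, 8):
        mbps = window_limited_throughput(64 * 1024, 0.25, n) * 8 / 1e6
        print(f"{n} stream(s): <= {mbps:.1f} Mbit/s")
```

The bound shows why parallel streams help on long, high-latency links: a single window-limited stream cannot fill the path, whereas N streams raise the ceiling roughly N-fold until link capacity or packet loss becomes the limiting factor, which this toy model ignores.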

Author information

Correspondence to Frank Zhigang Wang.

About this article

Cite this article

Wang, F.Z., Helian, N., Wu, S. et al. Grid-Based Storage Architecture for Accelerating Bioinformatics Computing. J VLSI Sign Process Syst Sign Im 48, 311–324 (2007). https://doi.org/10.1007/s11265-007-0066-5

  • DOI: https://doi.org/10.1007/s11265-007-0066-5
