Abstract
This paper examines the relationship between bioinformatics data processing and its underlying computing architecture in the context of the International Nucleotide Sequence Database Collaboration (INSDC). INSDC exchanges sequence data daily among its three member organizations in the USA, the UK and Japan. We studied how this sequence database, held in MySQL, can best take advantage of the increased transfer bandwidth of a grid-based storage architecture. Grid-oriented Storage (GOS) was developed in our lab within the UK Government project “Grid-oriented Storage (GOS)” and the EC project “EuroAsiaGrid”; it incorporates a parallel streaming technique to meet the needs of WAN/Grid-based virtual organizations. A real-world test over the link between Cambridge and Tokyo shows that the INSDC sequence database backup operation, mysqldump, run over the pipelined GOS architecture outperforms the same operation over classic infrastructures by a factor of six. For a genomic sequence search against one million records over the underlying GOS architecture, a performance improvement of 67.3% was achieved.
Cite this article
Wang, F.Z., Helian, N., Wu, S. et al. Grid-Based Storage Architecture for Accelerating Bioinformatics Computing. J VLSI Sign Process Syst Sign Im 48, 311–324 (2007). https://doi.org/10.1007/s11265-007-0066-5
DOI: https://doi.org/10.1007/s11265-007-0066-5