Abstract
The International Nucleotide Sequence Database Collaboration (INSDC) exchanges sequence data daily among its three member organizations in the USA, UK and Japan. This paper studies how this sequence database, held in MySQL, can best take advantage of the increased transfer bandwidth of a Grid-optimized data communication protocol. Within the context of the UK Government project Grid-oriented Storage (GOS) and the EC project EuroAsiaGrid, we have developed the GOS File System (GOS-FS) in our lab. It melds distributed file system technology with high-performance data transfer techniques to meet the needs of WAN/Grid-based virtual organizations. In a real-world test over the link between Cambridge and Tokyo, the INSDC sequence database backup operation, mysqldump, performed six times better over the GOS-FS protocol than over the classic NFS protocol. Moreover, the multi-streamed GOS-FS protocol remains fully compatible with existing IP infrastructures.
About this article
Cite this article
Wang, F.Z., Wu, S., Helian, N. et al. Grid-based Data Access to Nucleotide Sequence Database. New Gener. Comput. 25, 409–424 (2007). https://doi.org/10.1007/s00354-007-0026-4
DOI: https://doi.org/10.1007/s00354-007-0026-4