
Grid-based Data Access to Nucleotide Sequence Database

New Generation Computing

Abstract

The International Nucleotide Sequence Database Collaboration (INSDC) exchanges sequence data daily among its three member organizations in the USA, the UK and Japan. This paper studies how the INSDC sequence database, held in MySQL, can best exploit the higher transfer bandwidth of a Grid-optimized data communication protocol. Within the UK Government project Grid-oriented Storage (GOS) and the EC project EuroAsiaGrid, our lab has developed the GOS File System (GOS-FS), which combines distributed file system technology with high-performance data transfer techniques to meet the needs of WAN/Grid-based virtual organizations. In a real-world test over the link between Cambridge and Tokyo, the INSDC sequence database backup operation, mysqldump, run over the GOS-FS protocol outperformed the same operation over the classic NFS protocol by a factor of 6. The multi-streamed GOS-FS protocol also remains fully compatible with existing IP infrastructure.
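
As general background on why a multi-streamed protocol can help on a long-distance link, the sketch below applies the standard window/RTT bound: a single TCP stream can deliver at most one window of data per round trip, so several parallel streams raise the aggregate ceiling until the link capacity is reached. This is not the GOS-FS implementation, and the abstract does not give the window size, round-trip time, stream count or link capacity; every number and function name below is a hypothetical value chosen only for illustration.

```python
def stream_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP stream: one full window per round trip."""
    return window_bytes * 8 / rtt_s


def aggregate_throughput_bps(n_streams: int, window_bytes: int,
                             rtt_s: float, link_capacity_bps: float) -> float:
    """Upper bound for n parallel streams, capped by the link capacity."""
    per_stream = stream_throughput_bps(window_bytes, rtt_s)
    return min(n_streams * per_stream, link_capacity_bps)


# Hypothetical parameters (assumptions, not measurements from the paper).
WINDOW_BYTES = 64 * 1024      # 64 KiB window without window scaling
RTT_S = 0.25                  # assumed intercontinental round-trip time
LINK_CAPACITY_BPS = 100e6     # assumed 100 Mbit/s bottleneck link

single = stream_throughput_bps(WINDOW_BYTES, RTT_S)
parallel = aggregate_throughput_bps(8, WINDOW_BYTES, RTT_S, LINK_CAPACITY_BPS)

print(f"single stream bound: {single / 1e6:.2f} Mbit/s")
print(f"8 parallel streams:  {parallel / 1e6:.2f} Mbit/s")
```

Under these assumed values the single-stream bound sits far below the link capacity, which is the situation in which adding streams pays off; once the summed bound reaches the link capacity, further streams give no additional benefit.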

Author information

Corresponding author

Correspondence to Frank Zhigang Wang.

About this article

Cite this article

Wang, F.Z., Wu, S., Helian, N. et al. Grid-based Data Access to Nucleotide Sequence Database. New Gener. Comput. 25, 409–424 (2007). https://doi.org/10.1007/s00354-007-0026-4

  • DOI: https://doi.org/10.1007/s00354-007-0026-4
