DOI: 10.1145/1851476.1851587
research-article

ROARS: a scalable repository for data intensive scientific computing

Published: 21 June 2010

Abstract

As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides both large, robust, scalable storage and efficient rich metadata queries for scientific applications. In this paper, we demonstrate that ROARS is capable of importing and exporting large quantities of data, migrating data to new storage nodes, providing robust fault tolerance, and generating materialized views based on metadata queries. Our experimental results demonstrate that ROARS' aggregate throughput scales with the number of concurrent clients while providing fault-tolerant data access. ROARS is currently being used to store 5.1TB of data in our local biometrics repository.

Cited By

  • (2017) Exploring the design space of metadata-focused file management systems. Proceedings of the Australasian Computer Science Week Multiconference, pp. 1-10. DOI: 10.1145/3014812.3014833. Online publication date: 30-Jan-2017.
  • (2012) Scripting distributed scientific workflows using Weaver. Concurrency and Computation: Practice & Experience 24(15), pp. 1685-1707. DOI: 10.1002/cpe.1871. Online publication date: 1-Oct-2012.
  • (2010) Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop. Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science, pp. 488-495. DOI: 10.1109/CloudCom.2010.74. Online publication date: 30-Nov-2010.
  • (2010) Seismic data server application service for SEEGRID seismology virtual organization. Earth Science Informatics 3(4), pp. 219-228. DOI: 10.1007/s12145-010-0067-y. Online publication date: 5-Sep-2010.

Published In

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN: 9781605589428
DOI: 10.1145/1851476
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

HPDC '10

Acceptance Rates

Overall Acceptance Rate: 166 of 966 submissions, 17%
