DOI: 10.1145/2538542.2538565

Asynchronous object storage with QoS for scientific and commercial big data

Published: 17 November 2013

Abstract

This paper presents our design for an asynchronous object storage system intended for use in scientific and commercial big data workloads. Use cases from the target workload domains motivate the key abstractions in the application programming interface (API). We present the architecture of the Scalable Object Store (SOS), a prototype object storage system that supports the API's facilities. The SOS serves as a vehicle for future research into scalable and resilient big data object storage. We also briefly review our research into building efficient storage servers capable of honoring quality of service (QoS) contracts relevant to big data use cases.
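The abstract describes an asynchronous object storage API with per-request QoS contracts but does not show the interface itself. The minimal sketch below illustrates what such an API might look like; all names here (`ObjectStore`, `QoS`, `put`, `get`) are illustrative assumptions, not the actual SOS interface, and the in-memory store stands in for a remote storage service.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class QoS:
    """A toy QoS contract: a minimum bandwidth target in MB/s (assumed)."""
    min_bandwidth_mbps: int = 0


class ObjectStore:
    """In-memory stand-in for a remote object storage service."""

    def __init__(self):
        self._objects = {}

    async def put(self, key: str, data: bytes, qos: QoS = QoS()) -> None:
        # A real server would admit or reject the request based on the
        # QoS contract; here we simply record the object asynchronously.
        await asyncio.sleep(0)  # yield, simulating a non-blocking write
        self._objects[key] = (data, qos)

    async def get(self, key: str) -> bytes:
        await asyncio.sleep(0)  # yield, simulating a non-blocking read
        return self._objects[key][0]


async def main() -> bytes:
    store = ObjectStore()
    # Issue several writes concurrently: the asynchronous API lets the
    # application overlap I/O instead of blocking on each request.
    await asyncio.gather(
        store.put("ckpt/0", b"checkpoint-data", QoS(min_bandwidth_mbps=100)),
        store.put("log/0", b"log-data"),
    )
    return await store.get("ckpt/0")


result = asyncio.run(main())
```

The key design point the paper's title suggests is that `put` and `get` return immediately awaitable handles, so a scientific application can overlap checkpoint output with computation while the QoS hint lets the server prioritize among tenants.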


Published In

cover image ACM Conferences
PDSW '13: Proceedings of the 8th Parallel Data Storage Workshop
November 2013
55 pages
ISBN:9781450325059
DOI:10.1145/2538542
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. HPC storage
  2. cloud storage
  3. object storage
  4. storage QoS

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

PDSW '13 Paper Acceptance Rate 8 of 16 submissions, 50%;
Overall Acceptance Rate 17 of 41 submissions, 41%
