Abstract
Storage pooling is a virtualization technique used in data centers to build upgradeable storage pools and to cope with the explosive growth of information. In this technique, a randomized data distribution strategy (DDS) preserves load balance when new devices are added to the pool by reallocating data. However, when fault-tolerant schemes are applied to a storage pool, the system produces r redundant objects from a common data source and the DDS must allocate them on different devices, which increases the complexity of the reallocation operations performed during upgrade procedures. This paper presents RS-Pooling, an adaptive DDS for fault-tolerant, large-scale storage systems. RS-Pooling builds storage pools by grouping devices into disjoint sub-pools and ensures the effectiveness of fault-tolerant schemes by allocating the redundant objects from a common data source to different sub-pools. In RS-Pooling, the first redundant object is allocated at random, whereas the remaining ones are allocated by following a cyclic list of sub-pools. This procedure minimizes the number of reallocation operations and fosters load balancing. We performed an emulation-based evaluation of RS-Pooling and of RUSHp, a traditional DDS for storage pooling. The evaluation reveals that RS-Pooling improves the time efficiency of lookup operations compared with RUSHp. It also shows that, in upgrade procedures and regardless of the initial configuration, RS-Pooling requires significantly fewer reallocation operations than RUSHp to balance the load of fault-tolerant storage pools.
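The placement rule summarized in the abstract (first redundant object at random, remaining ones along a cyclic list of sub-pools) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `place_objects`, the use of a SHA-256 hash of the source identifier as the pseudo-random choice, and the representation of sub-pools as integer indices are all assumptions made for illustration. Device selection inside each sub-pool and reallocation during upgrades are not modeled.

```python
import hashlib


def place_objects(source_id: str, r: int, num_subpools: int) -> list[int]:
    """Return one sub-pool index per redundant object (illustrative only).

    Assumption: the "random" first choice is derived from a hash of the
    data source identifier, so that lookups are repeatable. The abstract
    does not say how the random choice is actually made.
    """
    if r < 1:
        raise ValueError("r must be at least 1")
    if r > num_subpools:
        # With fewer sub-pools than redundant objects, two objects from the
        # same source would share a sub-pool, defeating the fault tolerance.
        raise ValueError("need at least r sub-pools")

    # Pseudo-random sub-pool for the first redundant object.
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    first = int.from_bytes(digest[:8], "big") % num_subpools

    # Remaining objects follow the cyclic list of sub-pools from there.
    return [(first + i) % num_subpools for i in range(r)]


if __name__ == "__main__":
    # Three redundant objects spread over five hypothetical sub-pools.
    print(place_objects("file-42", r=3, num_subpools=5))
```

Because the r indices are consecutive positions on a cycle of length at least r, they are pairwise distinct, which is the property the fault-tolerant scheme relies on.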










Acknowledgments
This work was funded by a scholarship from CONACYT and UAM (Mexico). The authors express their gratitude to the anonymous referees, whose invaluable comments and suggestions helped to improve the quality of this paper.
Additional information
R. Marcelín-Jiménez also collaborates with the research team at the “Centro de Investigación en Tecnologías de la Información y Comunicación: INFOTEC”.
Cite this article
Quezada-Naquid, M., Marcelín-Jiménez, R., Gonzalez-Compeán, J.L. et al. RS-Pooling: an adaptive data distribution strategy for fault-tolerant and large-scale storage systems. J Supercomput 72, 417–437 (2016). https://doi.org/10.1007/s11227-015-1569-7
DOI: https://doi.org/10.1007/s11227-015-1569-7