Elsevier

Computers in Industry

Volume 100, September 2018, Pages 21-30
WebGlusterFS: A web-based administration tool for GlusterFS with resource assignment for various storage demands

https://doi.org/10.1016/j.compind.2018.04.001

Highlights

  • A web-based administration tool is designed and implemented to ease the management of GlusterFS.

  • A framework is set up to automatically assign resources according to various workload demands.

  • Two algorithms are integrated into the tool to allocate resources and build volumes at the lowest cost for two types of workloads.

Abstract

Managing storage in cloud and big-data environments involves complex tasks: deciding how to assign workloads to storage backends, and adjusting those assignments dynamically and promptly as storage demands change. This article presents WebGlusterFS, an administration tool for GlusterFS that eases management and helps assign storage resources. WebGlusterFS is a web-based tool designed to replace the command-line console manager of GlusterFS, and it provides an interface through which an auto-assignment module can build volumes from heterogeneous backend devices. A simple demo module is also implemented to show how various storage demands are fulfilled by building volumes from properly matched storage resources at minimum cost. The characteristics of the underlying storage resources are obtained by benchmarking and used to make the assignment decision. WebGlusterFS sets up a base framework for a workload-aware storage platform for large-scale computing environments.

Introduction

With the rapid development of information technology, enormous volumes of data, so-called “big data”, are being generated by many enterprises all the time. Scientific applications, weather forecasting, research, hospitals, gene information processing and military services are a few of the major contributors. Providing efficient, easy-to-use storage solutions has become one of the main issues for these types of computation. A promising solution to this issue is the use of distributed file systems (DFSs) or cloud storage.

There are a number of open-source solutions such as GlusterFS, Ceph [3], HDFS [4] and Swift [5], and vendor-specific solutions such as EMC ViPR [6], NetApp ONTAP [7] and IBM Virtual Storage [8]. All of these DFSs meet the basic requirements of cloud storage. The most typical feature of cloud storage is storage as a service (SaaS) [2], in which storage is provided to users as a service. The system therefore needs to provide administrators and users with a convenient environment for operation and maintenance. Besides ease of management, there is a need to configure and reconfigure the storage resources automatically and easily, so that the storage service meets the demands of ongoing applications.

Storage administration and data management are challenging and expensive tasks [16], particularly in cloud and big-data environments where resources are shared among multiple workloads and accessed with various patterns [9,10]. One major and complex portion of these tasks involves deciding how to assign workloads to storage backends and adjusting those assignments dynamically and promptly as demands change in cloud environments [11]. The assignment decision can be made from a detailed description of the workload characteristics and the performance factors of the underlying storage resources. These are all beyond the scope of existing DFSs. We therefore need an enhancement that combines workload classification, backend characterization and resource assignment with the existing DFSs to handle the variety of storage demands from big-data applications.

In this article, a web-based management tool for GlusterFS is designed and implemented to help manage the storage resources and to make it possible to base resource decisions on workload requirements and backend characterization. GlusterFS is a scalable open-source parallel file system that offers a global namespace and a distributed front end, and it is capable of scaling to hundreds of petabytes. It also offers significant cost advantages. The advantages that make it well suited to cloud storage are as follows [1]: 1) Highly scalable. Cloud storage can be expanded according to the demands of applications. 2) Highly reliable and available. Cloud storage automatically backs up data against software and hardware failures and is capable of disaster recovery. 3) Resource controllable. It is capable of controlling the access permissions of resources. 4) High utilization. Cloud storage can consolidate all the storage resources and provide a unified access interface to users. 5) Cost effective. The use of cloud storage can dramatically reduce the cost of running a data center for enterprises, reduce the need for removable storage devices, and lower costs for individual users and businesses. GlusterFS has a large user base both in HPC computing farms and in several cloud computing facilities. It supports access to storage both as a POSIX file system and via a REST gateway for object storage. It is therefore worthwhile to simplify its management and to enable workload awareness.

The rest of the paper is organized as follows. Section 2 presents the background of the key challenges and the system architecture of GlusterFS. Section 3 details the design and implementation of the web-based management tool that eases the management of GlusterFS, including a scheme that provides an interface for creating volumes based on workload classification and performance parameters. Section 4 details two algorithms that assign resources at the lowest cost. Section 5 shows the prototype of the system and the experimental results. We conclude in Section 6.

Section snippets

The heterogeneous distributed storage systems

The heterogeneity of distributed storage systems lies in both hardware and software.

The nodes of a distributed storage system are likely to become heterogeneous, that is, to have different specifications [19]. The first case is system expansion. In general, the specifications of the new storage nodes added during an expansion may differ from those of the storage nodes in the existing system, because computer technologies may evolve during the long time from the day of system installation to the day of system

A web-based management tool for GlusterFS

GlusterFS is presently unable to build a volume automatically from an input workload demand. This capability can be added either by modifying the source code of GlusterFS or by an upper-layer software that works through the administration interface of GlusterFS. We choose the latter to implement storage resource assignment, since it introduces no bugs into GlusterFS and is easy to design and debug. As the command-line management tool is not friendly or convenient, we designed one
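The upper-layer approach can be illustrated with a minimal Python sketch, assuming the tool ultimately issues standard `gluster volume create` commands. The function name and its parameters are hypothetical and are not taken from WebGlusterFS; the sketch only assembles the argument list and executes nothing.

```python
from typing import List, Optional


def build_volume_create_cmd(
    name: str,
    bricks: List[str],
    replica: Optional[int] = None,
    transport: str = "tcp",
) -> List[str]:
    """Assemble the argv for `gluster volume create` (hypothetical helper).

    Bricks use the usual `server:/path` form. Nothing is executed here.
    """
    if not bricks:
        raise ValueError("at least one brick is required")
    if replica is not None:
        if replica < 1 or len(bricks) % replica != 0:
            raise ValueError("brick count must be a multiple of the replica count")

    cmd = ["gluster", "volume", "create", name]
    if replica is not None:
        cmd += ["replica", str(replica)]
    cmd += ["transport", transport]
    cmd += bricks
    return cmd


if __name__ == "__main__":
    # Example: a two-way replicated volume on two hypothetical nodes.
    print(" ".join(build_volume_create_cmd(
        "vol0", ["node1:/data/brick1", "node2:/data/brick1"], replica=2)))
```

Keeping command construction separate from execution lets an upper layer validate and log a request before anything reaches the cluster.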

Resource assignment module

Given the variety of storage demands from big-data applications, storage resource allocation cannot be handled by a single algorithm with a single set of performance parameters. The lack of an ideal model to describe the whole distributed storage system makes it difficult to design a universal allocation algorithm. A more feasible method is to deal with each type of workload individually, using a model of appropriate accuracy. Two assignment modules are provided in this section to show how this
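The two algorithms themselves are not included in this excerpt. As a rough illustration of cost-driven assignment only, the following Python sketch uses a simple greedy heuristic: it keeps the bricks whose benchmarked throughput meets the demand, then picks the cheapest per gigabyte until the requested capacity is covered. The `Brick` fields, the cost model and the function name are all assumptions, and a greedy choice is not guaranteed to be cost-optimal.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Brick:
    name: str
    capacity_gb: float
    throughput_mbps: float  # assumed to come from benchmarking
    cost: float             # assumed cost of using the whole brick


def assign_bricks(bricks: List[Brick], need_gb: float, min_mbps: float) -> List[Brick]:
    """Greedy heuristic (illustrative, not the paper's algorithm)."""
    # Keep only bricks that satisfy the performance requirement.
    eligible = [
        b for b in bricks
        if b.capacity_gb > 0 and b.throughput_mbps >= min_mbps
    ]
    # Cheapest cost per gigabyte first.
    eligible.sort(key=lambda b: b.cost / b.capacity_gb)

    chosen: List[Brick] = []
    total = 0.0
    for brick in eligible:
        if total >= need_gb:
            break
        chosen.append(brick)
        total += brick.capacity_gb

    if total < need_gb:
        raise ValueError("demand cannot be met by the eligible bricks")
    return chosen


if __name__ == "__main__":
    pool = [
        Brick("hdd1", 1000, 120, 40),
        Brick("hdd2", 1000, 110, 45),
        Brick("ssd1", 500, 450, 100),
    ]
    print([b.name for b in assign_bricks(pool, need_gb=1500, min_mbps=100)])
```

A capacity-oriented demand lands on the cheaper disks, while a throughput-oriented demand is restricted to the faster device, which mirrors the idea of treating each workload type with its own model.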

The basic management function

WebGlusterFS enables the administrator to manage a GlusterFS cluster through GUI-style web pages, as Fig. 9 shows. There are six modules in the web shell: GlusterFS cluster management, volume management, user management, performance profiling, disaster recovery and raw server initialization.

The right frame of Fig. 9 covers storage resource exploration. The IP range is set with the text inputs at the top. The reachable nodes are listed in the rightmost column. The live GlusterFS nodes are

Conclusions

In this paper, WebGlusterFS is presented as a replacement for the console management tool of GlusterFS. By encapsulating the commands in HTTP requests, WebGlusterFS performs all the management operations. A management operation is first captured in the PHP frontend with its GUI and then relayed by the PHP backend to the server side via the SSH protocol. By detecting and registering the underlying storage resources, this framework makes it possible to assign the proper resources to the storage demands
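The relay path described above (request captured by the frontend, forwarded by the backend over SSH) can be sketched as follows. The original tool is written in PHP; this Python version is only an illustration, and the request format, the operation names and the `root` default user are assumptions. The three GlusterFS commands it maps to (`gluster peer status`, `gluster volume info`, `gluster volume start`) are standard CLI commands.

```python
import shlex
from typing import Callable, Dict, List

# Whitelist of operations the backend is willing to relay (hypothetical names).
ALLOWED_OPS: Dict[str, Callable[[Dict[str, str]], List[str]]] = {
    "peer_status": lambda p: ["gluster", "peer", "status"],
    "volume_info": lambda p: ["gluster", "volume", "info", p["volume"]],
    "volume_start": lambda p: ["gluster", "volume", "start", p["volume"]],
}


def request_to_ssh_argv(request: Dict[str, str], host: str, user: str = "root") -> List[str]:
    """Translate a decoded HTTP request into an ssh argv; nothing is executed."""
    op = request.get("op")
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {op!r}")
    remote_cmd = ALLOWED_OPS[op](request)
    # shlex.join quotes each argument so the remote shell sees it intact.
    return ["ssh", f"{user}@{host}", shlex.join(remote_cmd)]


if __name__ == "__main__":
    print(request_to_ssh_argv({"op": "volume_info", "volume": "vol0"}, "10.0.0.5"))
```

Mapping requests through a fixed whitelist, instead of forwarding arbitrary strings, keeps user input out of the remote shell command.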

Acknowledgments

The research was jointly supported by projects granted by the Shenzhen Science Technology Foundation: JCYJ20170302153920897/JCYJ20150930105133185/JCYJ20150324140036842, Guangdong Pre-national Project 2014GKXM054, and the Guangdong Natural Science Foundation: 2017B030314073/2016A030313036.

Qiuming Luo is an associate professor in the College of Computer Science and Software Engineering at Shenzhen University. His research interests include high-performance computing and OS design. Luo received a PhD in computer architecture from Huazhong University of Science and Technology. He is a member of the China Computing Federation (CCF).

References (29)

  • P. Zhou et al.

    ECStor: a flexible enterprise-oriented cloud storage system based on GlusterFS

    2016 International Conference on Advanced Cloud and Big Data (CBD), IEEE

    (2016)
  • Wikipedia, Storage as a service [EB/OL]. 2017-4-28,...
  • Ceph,...
  • Hadoop Distributed File System, [EB/OL]. 2017-4-28,...
  • Openstack Swift, [EB/OL]. 2017-4-28,...
  • EMC ViPR, [EB/OL]. 2017-4-28,...
  • NetApp ONTAP, [EB/OL]. 2017-4-28,...
  • IBM Virtual Storage, [EB/OL]. 2017-4-28, http://www-03....
  • Rui Mao et al.

    Overcoming the challenge of variety: big data abstraction, the next evolution of data management for AAL communication systems

    IEEE Commun. Mag.

    (2015)
  • Rui Mao et al.

    Pivot selection for metric-space indexing

    Int. J. Mach. Learn. Cybern.

    (2016)
  • B. Tremblay et al.

    A workload aware storage platform for large scale computing environments: challenges and proposed directions

    Proceedings of the ACM 7th Workshop on Scientific Cloud Computing ACM

    (2016)
  • Z. Thusoo et al.

    Data warehousing and analytics infrastructure at facebook

    Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data

    (2010)
  • R. Zhang et al.

    IO Tetris: deep storage consolidation for the cloud via fine-grained workload analysis

    2011 IEEE International Conference on Cloud Computing (CLOUD)

    (2011)
  • Z. Yang et al.

    Dynamic SSD resource allocation in virtualized storage systems with heterogeneous VMs

    35th IEEE International Performance Computing and Communications Conference (IPCCC) IEEE

    (2016)
Cuiping Zhu is a graduating student in the College of Computer Science and Software Engineering at Shenzhen University. She is going on to further study at a university in Hong Kong.

Gang Liu is a faculty member in the College of Computer Science and Software Engineering at Shenzhen University. His research interests include high-performance computing and network-on-chip (NoC). Liu received a PhD in computer architecture from University of Science and Technology of China. He is a member of the China Computing Federation (CCF).

Rui Mao is a professor in the College of Computer Science and Software Engineering at Shenzhen University. His research interests include high-performance computing. Mao received a PhD in computer architecture from University of Science and Technology of China. He is a chairman of CCF YOCSEF Shenzhen.
