An efficient load-balancing mechanism for heterogeneous range-queriable cloud storage
Introduction
Cloud storage is becoming increasingly important for cloud applications, as it can seamlessly and efficiently handle huge amounts of data [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. To achieve the high scalability required by cloud applications, most cloud storage systems employ non-relational data structures, such as key–value and BigTable-like table-based structures, to simplify the relations among data items. Such data structures enable cloud service providers to build cloud storage systems by managing a large number of connected commodity machines. In such systems, it is very important to distribute data items to storage nodes dynamically according to their capacities. A skewed data distribution degrades performance and can even cause system failure. For example, the total execution time of a MapReduce task depends mostly on the execution time of the node with the heaviest workload, even if most of the other nodes carry only a light workload.
Owing to their excellent load-balancing characteristics, DHT-based key–value stores have become one of the most popular cloud storage technologies. A DHT-based key–value store employs a consistent hashing function to transform both the keys of data items and the addresses of storage devices into hash values without semantic meaning, and then assigns a data item to a storage device if their hash values are “close”. Although a DHT-based key–value store can balance the workload among nodes simply, it cannot support range queries efficiently because consistent hashing destroys data locality. Specifically, in DHT-based key–value stores, data items with similar semantic meaning are typically far away from each other physically (i.e., stored on different physical machines), which makes it difficult to execute the range queries that social network and IoT services rely on. A range query is an important functionality because it allows flexible query conditions without specifying the exact keys of the required data. Consider a key–value store of environment-monitoring sensor information in which the key is the location and the value is the sensor description. If a user wants to know the sensor information in a certain region, she can query the system with a fuzzy condition such as “sensors within from northern latitude, north latitude”.
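The loss of locality under hashing can be illustrated with a minimal sketch. It is not taken from the paper: the key format `sensor-NNNN`, the node count of 16, and the simple modulo placement (a stand-in for a real consistent-hashing ring) are all assumptions made for illustration.

```python
import hashlib


def hash_placement(key: str, num_nodes: int) -> int:
    """Map a key to a node index by hashing.

    Assumption: plain modulo placement stands in for consistent hashing;
    both scatter semantically adjacent keys across nodes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def nodes_holding_range(keys, lo, hi, num_nodes):
    """Return the set of nodes that hold at least one key in [lo, hi]."""
    return {hash_placement(k, num_nodes) for k in keys if lo <= k <= hi}


# Hypothetical data set: 1000 sensors with lexicographically ordered keys.
keys = [f"sensor-{i:04d}" for i in range(1000)]
touched = nodes_holding_range(keys, "sensor-0100", "sensor-0199", num_nodes=16)
print(f"nodes holding keys of one contiguous range: {len(touched)} of 16")
```

Even though the 100 matching keys are contiguous in key order, they end up spread over many nodes, so a range query cannot be answered by visiting one node and its neighbors.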
To support range queries, the order of keys must be preserved, which makes load balancing a very challenging task. Fig. 1 shows a simple illustration of range-queriable cloud storage. Each storage node has a responsible range, and the responsible ranges of different nodes do not overlap. A data item is stored in a specific storage node if and only if its key falls into the responsible range of that node. Because of skewed data distribution and skewed data insertion and deletion, the responsible ranges of storage nodes have to be adjusted to maintain load balance. Some existing range-queriable cloud storage systems employ a central management node to manage the metadata of physical nodes and data items and to make load-balancing decisions [3]. Other systems employ decentralized load-balancing methods. In these systems, the storage nodes form a range-queriable P2P network to achieve self-organization [2], [4]. The storage nodes exchange load information and make load-balancing decisions independently. For both centralized and decentralized load-balancing methods, most existing research employs a combination of NBRADJUST and REORDER as the basic mechanism.
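The notion of non-overlapping responsible ranges can be sketched as follows. This is an illustrative sketch rather than the paper's data structure: the class name `RangePartition`, the integer keys, and the representation of each range by its inclusive lower bound are assumptions.

```python
from bisect import bisect_right


class RangePartition:
    """Order-preserving placement: node i is responsible for keys in
    [boundaries[i], boundaries[i + 1]); the last node's range is open-ended."""

    def __init__(self, boundaries):
        # Assumption: boundaries are sorted inclusive lower bounds, one per node.
        self.boundaries = list(boundaries)

    def node_for(self, key):
        """Return the index of the single node whose range contains key."""
        idx = bisect_right(self.boundaries, key) - 1
        if idx < 0:
            raise KeyError(f"key {key!r} is below the smallest responsible range")
        return idx

    def nodes_for_range(self, lo, hi):
        """A range query only touches a contiguous run of neighboring nodes."""
        return list(range(self.node_for(lo), self.node_for(hi) + 1))


partition = RangePartition([0, 100, 200, 300])
print(partition.node_for(150))              # node responsible for key 150
print(partition.nodes_for_range(150, 250))  # contiguous nodes covering the query
```

Adjusting a responsible range then amounts to moving one entry of `boundaries`, which is why load balancing in such a system is tied to the key order.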
Fig. 2(a) shows an example of NBRADJUST. With NBRADJUST, an overloaded node can move a portion of its load to a neighbor. Because NBRADJUST can only be carried out between neighboring nodes, the time needed to balance the load with NBRADJUST alone grows with the number of nodes in the system. For performance reasons, REORDER is rarely employed as the sole load-balancing mechanism; it is instead used as a complement to NBRADJUST. With REORDER, an underloaded node first executes an NBRADJUST operation to move its load to a neighbor (regardless of whether the neighbor is overloaded or underloaded), then leaves the system and rejoins as a neighbor of an overloaded node. After the node rejoins, the overloaded node can move a portion of its load to this new, empty neighbor with an NBRADJUST operation (Fig. 2(b)). This method suffers mainly from data movement overhead. Specifically, to share the load of an overloaded node, an underloaded node first has to move all of its own load to a neighbor. This data movement is “unnecessary” overhead, and the time it consumes is correspondingly time overhead. Even worse, after the underloaded node moves its load to a neighbor, that neighbor may itself become overloaded, so the load-balancing process has to be carried out iteratively and may take a long time to converge.
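The two operations and the overhead of REORDER can be made concrete with a small sketch. It models the system as a list of per-node loads in key order; the function names, the choice of which neighbor absorbs the underloaded node's data, and the rule that the overloaded node sheds half of its load are assumptions for illustration, not details from the paper.

```python
def nbr_adjust(loads, src, dst, amount):
    """NBRADJUST: move `amount` of load from node src to an adjacent node dst."""
    if abs(src - dst) != 1:
        raise ValueError("NBRADJUST only works between neighboring nodes")
    if not 0 <= amount <= loads[src]:
        raise ValueError("cannot move more load than the source holds")
    loads[src] -= amount
    loads[dst] += amount
    return amount


def reorder(loads, under, over):
    """REORDER: the underloaded node empties itself, leaves, and rejoins
    next to the overloaded node.

    Returns (new_loads, overhead_moved, useful_moved).
    """
    loads = list(loads)
    # Step 1: hand all load to a neighbor (the "unnecessary" movement).
    neighbor = under - 1 if under > 0 else under + 1
    overhead = nbr_adjust(loads, under, neighbor, loads[under])
    # Step 2: leave the system; indices to the right shift down by one.
    del loads[under]
    if under < over:
        over -= 1
    # Step 3: rejoin as the right-hand neighbor of the overloaded node.
    loads.insert(over + 1, 0)
    # Step 4: the overloaded node sheds load (assumption: half of it).
    useful = nbr_adjust(loads, over, over + 1, loads[over] // 2)
    return loads, overhead, useful


new_loads, overhead, useful = reorder([10, 90, 20, 10], under=3, over=1)
print(new_loads, overhead, useful)  # [10, 45, 45, 30] 10 45
```

In this example 10 units are moved only to free the underloaded node, and the neighbor that absorbed them ends up with a heavier load than before, which is the effect that can trigger further rounds of balancing.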
In this work, we introduce a novel decentralized, virtual node-based load-balancing method that avoids NBRADJUST and REORDER. Virtual node-based load balancing is widely used in DHT systems; however, the fundamental difference between range-queriable systems and DHT systems makes our method quite different from existing work. In our method, we divide each physical node into multiple virtual nodes of identical capacity, and construct the range-queriable P2P network at the virtual node level rather than the physical node level. We keep some of the virtual nodes in each physical node filled with data items and leave the others empty. By maintaining the load of each non-empty virtual node within a certain range, the load of a physical node can be approximated by the ratio of its non-empty virtual nodes to its total number of virtual nodes. With this approximation, a physical node is considered overloaded if it contains too many non-empty virtual nodes and only a few empty virtual nodes. To balance the load between an overloaded physical node and an underloaded physical node, we simply move the load from a non-empty virtual node in the overloaded physical node to an empty virtual node in the underloaded physical node. This mechanism incurs much less data movement overhead than NBRADJUST- and REORDER-based methods, because data moves only from overloaded nodes to underloaded nodes, and no new overloaded node is produced as it can be with NBRADJUST- and REORDER-based methods. As a result, the proposed mechanism also converges quickly.
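The approximation and the load transfer described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the class `PhysicalNode`, the function `swap_virtual_node`, and the choice of the first non-empty and first empty virtual node are hypothetical, and the overlay network that would also hand over the responsible range is omitted.

```python
from dataclasses import dataclass


@dataclass
class PhysicalNode:
    name: str
    vnode_loads: list  # load of each virtual node; 0 means the virtual node is empty

    def utilization(self) -> float:
        """Approximate load: non-empty virtual nodes / total virtual nodes."""
        non_empty = sum(1 for load in self.vnode_loads if load > 0)
        return non_empty / len(self.vnode_loads)


def swap_virtual_node(over: PhysicalNode, under: PhysicalNode) -> int:
    """Move the load of one non-empty virtual node in `over` to an empty
    virtual node in `under`; return the amount of data moved."""
    src = next((i for i, l in enumerate(over.vnode_loads) if l > 0), None)
    dst = next((i for i, l in enumerate(under.vnode_loads) if l == 0), None)
    if src is None or dst is None:
        raise ValueError("need a non-empty source and an empty destination")
    moved = over.vnode_loads[src]
    under.vnode_loads[dst] = moved
    over.vnode_loads[src] = 0
    return moved


# Heterogeneous capacities: node "b" has twice as many virtual nodes as "a".
a = PhysicalNode("a", [5, 6, 5, 4])
b = PhysicalNode("b", [5, 0, 0, 0, 0, 0, 0, 0])
moved = swap_virtual_node(a, b)
print(moved, a.utilization(), b.utilization())  # 5 0.75 0.25
```

Only the load of a single virtual node crosses the network, and the receiving virtual node was empty, so the transfer cannot push the receiver's existing virtual nodes over their limit. Heterogeneity is expressed simply by giving larger machines more virtual nodes.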
To realize this mechanism, we address several challenging problems in this work: constructing an overlay network of virtual nodes that is self-organizing while preserving the order of data items; maintaining the load of non-empty virtual nodes within a certain range locally and dynamically; letting physical nodes estimate the average load of the system in real time; and discovering underloaded physical nodes that can share the workload of overloaded nodes. Our method is completely decentralized and therefore scales well for cloud storage systems. Moreover, because our method is based on virtual nodes, it applies inherently to heterogeneous environments. Although the objective of this work is to distribute data items among storage devices according to their capacities, the proposed method can also be extended to distribute other kinds of workload, such as query access workload. The proposed method is applicable as long as the system has a single kind of bottleneck.
The rest of the paper is organized as follows. In Section 2, we briefly introduce related research on load balancing for range-queriable systems. In Section 3, we present our method in detail, including the load-balancing framework, the key technologies that realize the framework, and some theoretical analysis. In Section 4, we present extensive simulation results to show the effect of our method. Some implementation issues are discussed in Section 5, and in Section 6, we conclude the work and discuss our future plan.
Section snippets
Related work
In this section, we give a brief introduction to load-balancing technologies for range-queriable systems. Unlike in DHT systems, load balancing in range-queriable systems is very challenging because the order of data items has to be preserved. One of the earliest studies was conducted by Aspnes et al. [15]. In their work, the authors described a load-balancing mechanism for skip graphs and similar distributed data structures. The mechanism is based on a global threshold,
Effective load balancing by virtual node swap
In this section, we present our method in detail, including the overlay-based cloud storage system architecture, the load-balancing framework, the enabling technologies, and the theoretical analysis.
Performance evaluation
To evaluate the performance of our method, we developed a simulator in Java. In addition to our method, we also implemented the method in [19] for comparison. In the rest of the paper, we refer to the method in [19] as “LoReC” (LOad REbalancing for distributed file systems in Clouds) for convenience. We chose LoReC for comparison because it is one of the best methods that work in both homogeneous and heterogeneous environments. LoReC employed the combination of the NBRADJUST and REORDER
Discussions
In this section, we discuss some important issues in implementation.
The first issue is how to decide the capacity of virtual nodes. On the one hand, using smaller virtual nodes allows load balancing to be carried out in fine granularity because the proposed load-balancing method is an approximate method that measures the physical node capacity and workload in the unit of virtual node. Consider a storage device of GB, if the virtual node capacity is set to GB, then a difference in less
Conclusion and future plan
Range-queriable cloud storage is becoming increasingly important. However, most existing load-balancing methods incur too much overhead, which limits the use of range-queriable technologies. In this work, we present a virtual node-based, decentralized load-balancing method for range-queriable cloud storage systems. In our method, we partition physical nodes into multiple virtual nodes, and organize the virtual nodes with a range-queriable P2P network. Load balancing is
Acknowledgments
This work has partially been supported by the JSPS KAKENHI Grant Number JP16K16053, the Grant-in-Aid for Young Scientists (B). We would also like to thank the reviewers for their detailed comments and suggestions to improve the work.
Xun Shao received his B.E. degree from Civil Aviation University of China, in 2005, and M.E. degree from Beijing Jiaotong University, China, in 2008. He received his Ph.D. degree in Information Science from Osaka University, Japan, in 2013. Currently, he is a researcher with the National Institute of Information and Communications Technology (NICT), Japan. His research interests include distributed systems and networking. He is a member of IEICE.
References (24)
- et al., Achieving one billion key-value requests per second on a single server, IEEE Micro Mag. (2016)
- et al., On the interplay of internet of things and cloud computing: A systematic mapping study, Comput. Commun. (2016)
- et al., Big data-backed video distribution in the telecom cloud, Comput. Commun. (2016)
- et al., A model to compare cloud and non-cloud storage of big data, Future Gener. Comput. Syst. (2016)
- et al., Key based data analytics across data centers considering bi-level resource provision in cloud computing, Future Gener. Comput. Syst. (2016)
- et al., Building a network-aware and load-balanced peer-to-peer system for range queries, Comput. Netw. (2012)
- et al., Range queries on structured overlay networks, Comput. Commun. (2008)
- X. Shao, M. Jibiki, Y. Teranishi, N. Nishinaga, Effective load balancing mechanism for heterogeneous range queriable...
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W....
- Hadoop Distributed File System. URL...
- An IoT-oriented data storage framework in cloud computing platform, IEEE Trans. Ind. Inf.
Masahiro Jibiki received the Ph.D. degree in system management from University of Tsukuba, Japan, in 2003. Since he joined NEC Corporation in 1992, he has been working as a researcher in the Central Research Laboratories. From 2006 to 2009, he was also a visiting professor at the University of Wakayama, Japan. Currently he holds the post of expert researcher at the National Institute of Information and Communications Technology (NICT), Japan. His research interests include networking, distributed systems, and software science. He is a member of IEICE.
Yuuichi Teranishi received his M.E. and Ph.D. degrees from Osaka University, Japan, in 1995 and 2004, respectively. From 1995 to 2004, he was engaged at Nippon Telegraph and Telephone Corporation (NTT). From 2005 to 2007, he was a lecturer at the Cybermedia Center, Osaka University. From 2007 to 2011, he was an associate professor at the Graduate School of Information Science and Technology, Osaka University. Since August 2011, he has been a research manager and project manager at the National Institute of Information and Communications Technology (NICT), Japan. His research interests include technologies for distributed network systems and applications. He is a member of IPSJ and IEEE.
Nozomu Nishinaga received his B.S. and M.S. in Electronics Engineering and his Ph.D. in Information Engineering from Nagoya University, Japan, in 1994, 1996, and 1998, respectively. From November 1998 to March 1999, he was a research assistant at the Information Media Education Center, Nagoya University. Since 1999, he has been a researcher with the National Institute of Information and Communications Technology (NICT), Japan. Since April 2011, he has been director of the New Generation Network Laboratory, Network Research Headquarters. His current research interests include Internet architecture and wireless communications. He is a member of IEICE.