Elsevier

Future Generation Computer Systems

Volume 87, October 2018, Pages 470-487

VM instance selection for deadline constraint job on agent-based interconnected cloud

https://doi.org/10.1016/j.future.2018.04.017

Highlights

  • Agent framework is used to create and manage the interconnected cloud.

  • Using Rough Set Theory to predict job’s execution time.

  • Using deadline and execution time to select and adjust the computing resource.

Abstract

In recent years, users have been buying computing resources, such as VM instances, from cloud resource providers. However, every cloud has its own pricing models, solutions, and interfaces. Without a unified interface, it is hard to manage resources across different clouds. Worse, it is even harder to select proper VM instances among multiple cloud resource providers when resources are insufficient. This paper proposes an interconnected cloud that selects VM instances based on a job's deadline constraint. To make the system function, five agents are created for different purposes: a System Monitoring Agent, a Job Dispatching Agent, an Instance Group Managing Agent, an Instance Group Administrating Agent, and a Job Executing Agent. This paper also presents two decision algorithms, the Job Dispatching Algorithm and the Instance Group Invocation Algorithm, based on Rough Set Theory, the jobs' deadline constraints, and the computing resources of the VM instances. The proposed system was deployed and evaluated on real machines. The results show that not only are the VM instances on different machines interconnected, but the algorithms also successfully help manage the interconnected cloud's computing resources according to the submitted jobs with deadline constraints.

Introduction

The technology of cloud computing [1] has become more mature in the past few years. Cloud computing provides simple, flexible, and quick ways for consumers to obtain required computing resources. Based on deployment, there are three kinds of cloud models: the private cloud, the public cloud, and the interconnected cloud.

In general, the private cloud owner exploits virtualization technology, such as Xen [2], Hyper-V [3], or VMWare [4], to efficiently manage and utilize the resources. However, the private cloud owner needs a large budget and considerable effort to run the platform. When the existing computing resources are not enough to serve extra workload on demand [5], owners are likely to buy additional computing resources from public cloud providers, such as Amazon Elastic Compute Cloud [6], GoGrid [7], or HiCloud [8]. The computing resources are usually charged under a pay-as-you-go model. The interconnected cloud [9], or hybrid cloud [1], is necessary to federate multiple cloud environments [10]. The interconnected cloud inherits the advantages of both the private cloud and the public cloud. An additional benefit is that it helps reduce hardware cost and improves dynamic resource provisioning [[11], [12]]. In short, the interconnected cloud can host computing resources and share the private cloud's load at the same time. For example, if a user submits a deadline-constrained job to a private cloud that cannot complete it before the deadline, the job can be dispatched to and executed in the public cloud to meet the deadline constraint.

To deal with the problem that different cloud providers offer different kinds of Virtual Machine (VM) instance types, pricing models, and management interfaces, a Cloud Broker can be used. A Cloud Broker provides a unified abstraction layer for users and developers to manage all the computing resources under one interface. In addition, the Cloud Broker also manages the VM instances on different cloud providers and collects useful information, such as the specification of each VM instance, the pricing model, availability, etc., from every cloud provider [1]. OpenStack [13] is one of the most well-known solutions for deploying a federated cloud. Some academic studies and projects, such as Aneka [[14], [15], [16], [17], [18], [19], [20], [21]], the Reservoir project [[10], [22]], and others [[23], [24], [25], [26], [27], [28]], also address this issue.

Most interconnected cloud systems only provide hypervisors that connect the VM instances. Connecting the VM instances only makes the brawn (the computing capability) more powerful. Without a brain (smart management), it is hard to achieve elasticity [24]. To give the brawn a powerful brain, agent technology [[5], [29], [30]] can be exploited. With the agents' knowledge, the amount of computing resources in the interconnected cloud can be adjusted dynamically according to the status of the environment.

Another issue is that brokers can only manage the instances on multiple clouds. All the decisions, such as adding VM instances to the cloud, are made by the user. Without proper knowledge, it is not easy to determine which VM instance should be activated. When a job has a deadline constraint, choosing a suitable VM instance becomes a complicated task [[10], [25]]. If the deadline-constrained job is assigned to a VM instance that does not have enough computing resources, the constraint will be violated. However, if it is assigned to a VM instance with more computing resources than needed, the cost of renting the VM instance increases.
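The trade-off above can be sketched as a cost-aware selection: pick the cheapest VM instance type whose estimated runtime still meets the deadline. This is a minimal illustration, not the paper's actual algorithm; the catalog, relative-speed factors, and hourly costs below are invented for the example.

```python
def select_instance(est_runtime_s, deadline_s, catalog):
    """Return the cheapest instance type whose scaled runtime meets the deadline.

    `catalog` maps an instance type to (relative_speed, hourly_cost); both
    values are hypothetical, since real providers publish their own figures.
    """
    feasible = [(cost, name)
                for name, (speed, cost) in catalog.items()
                if est_runtime_s / speed <= deadline_s]
    # No feasible type means the deadline cannot be met on this catalog.
    return min(feasible)[1] if feasible else None

# Illustrative catalog: type -> (relative speed, $/hour)
catalog = {"small": (1.0, 0.05), "medium": (2.0, 0.10), "large": (4.0, 0.40)}
```

With this catalog, a 100-second job with a 60-second deadline is served by the "medium" type: "small" would miss the deadline, and "large" meets it but costs more.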

This paper proposes an agent-based interconnected cloud model with a system that can react to events and to the interconnected cloud's environment. An overview of the model is shown in Fig. 1, which consists of three layers: the Federation Layer, the Cloud Infrastructure Layer, and the Data Center Layer. The bottom layer of the model is the Data Center Layer, where all the source data and the results are stored. All of the VM instances used in the interconnected cloud can access the data centers. In this paper, we assume that there are no file-concurrency problems. That is, each job can only read data from the data centers, and the results are written to different files. Tasks cannot update or delete any existing file in the data center. Another assumption is that the source data's transfer time is not considered. These assumptions help the paper focus on the proposed model.

The Cloud Infrastructure Layer is the second layer of the proposed model, to which all the private and public clouds belong. Each cloud consists of multiple VM instances with different capabilities (e.g., different numbers of processor cores, RAM sizes, and network speeds). Users can control the VM instances through the VM Instance Hypervisor provided by the resource provider.

The last layer is the Federation Layer, located at the top of the interconnected cloud model. A centralized server is set up to manage the computing resources in the interconnected cloud. A knowledge base is also installed on the server to help the agents make decisions. The server also provides an interface that allows administrators to monitor the interconnected cloud and submit jobs to it.

When a user submits a job to the interconnected cloud via the server in the Federation Layer, the agents redirect the job to one of the VM instances for execution. The job may have a deadline; the user can assign one when submitting the job. If the agents in the Federation Layer notice that some jobs are about to violate their deadlines, the agents will start more VM instances. On the other hand, if there are too many idle VM instances, some of them will be shut down. Using the knowledge base in the Federation Layer, the agents are capable of determining which VM instances to start or stop.
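The start/stop behavior described above can be sketched as a single monitoring step. The job fields, the one-instance-per-at-risk-job policy, and the idle threshold are simplifying assumptions for illustration, not the paper's actual Instance Group Invocation Algorithm.

```python
def scaling_decision(jobs, now_s, idle_instances, max_idle):
    """One monitoring step: scale out when any job's predicted finish time
    would pass its deadline; scale in when too many instances sit idle.

    Each job is a dict with hypothetical keys `remaining_s` (estimated
    remaining runtime in seconds) and `deadline_s` (absolute deadline).
    """
    at_risk = [j for j in jobs if now_s + j["remaining_s"] > j["deadline_s"]]
    if at_risk:
        # Start one extra instance per at-risk job (a naive policy).
        return ("scale_out", len(at_risk))
    if idle_instances > max_idle:
        # Stop only the surplus idle instances.
        return ("scale_in", idle_instances - max_idle)
    return ("hold", 0)
```

A real monitoring agent would run such a step periodically and feed the result to whatever component actually starts or stops instances.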

The contribution of the paper can be stated as follows.

  1. An agent framework is used to create and manage the interconnected cloud.

  2. The technique of Rough Set Theory [[23], [31], [32], [33], [34], [35]] is used to estimate the execution time of a job according to its condition attributes and execution environment.

  3. Jobs with deadline constraints are supported. If a job is detected to be about to violate its deadline constraint, additional VM instances will be started to scale out the computing resources of the interconnected cloud.

Most of the related works share the same assumption: the algorithm knows the jobs' execution times when the jobs are submitted to the system. This is not suitable for a real cloud environment. Jobs of different sizes may need different execution times. Even though the average execution time of past tasks may be a reasonable estimate, it is better if more parameters are taken into consideration. In our proposed work, the data size of a job is defined by the user and is treated as an attribute when the execution time is estimated. When no similar historical job can be found, the average execution time of the historical jobs is used as the estimate.
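The estimation-with-fallback idea above can be sketched as an exact match on condition attributes, a simplified stand-in for the Rough Set indiscernibility classes the paper uses, with the global average as the default. The attribute names and runtimes below are illustrative assumptions.

```python
from collections import defaultdict

def estimate_runtime(job_attrs, history):
    """Estimate a job's execution time from historical records.

    Historical jobs sharing identical condition attributes form one
    indiscernibility class; the estimate is that class's mean runtime,
    falling back to the mean over all historical jobs when no similar
    job exists.
    """
    classes = defaultdict(list)
    for rec in history:
        key = tuple(sorted((k, v) for k, v in rec.items() if k != "runtime_s"))
        classes[key].append(rec["runtime_s"])
    pool = classes.get(tuple(sorted(job_attrs.items())))
    if not pool:  # no similar historical job: use the global average
        pool = [r["runtime_s"] for r in history]
    return sum(pool) / len(pool)
```

For example, a new small two-core job would be estimated from previous small two-core runs, while a job with attribute values never seen before falls back to the average over all history.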

The remaining sections of the paper are organized as follows. In Section 2, related works on the interconnected cloud and deadline-aware resource provisioning models are presented. Section 3 describes the design of the agents in this work, as well as the interactions between them. Section 4 shows how a job's execution time is approximated using Rough Set Theory and describes the main algorithms. Section 5 describes the implementation of the system. The evaluation of the proposed interconnected cloud system is shown in Section 6. The final section concludes the paper.


Interconnected cloud model

The interconnected cloud model allows users to federate multiple clouds and control them under one interface. Toosi et al. mention six motivations for enabling cloud interoperability [9]. The interconnected cloud allows users to use multiple cloud resources; therefore, the availability and scalability of the cloud are better compared with those of a single cloud. Users can have more options at a reasonable cost. The other benefit of enabling the cloud interoperability is to

Agent-based Federated Broker

The proposed interconnected cloud consists of three layers: the Federated Layer, the Cloud Infrastructure Layer, and the Data Center Layer. Fig. 1 shows the overview of the system architecture; Fig. 2 presents detailed parts of each layer. The federated layer is the heart of the interconnected cloud system. In the federated layer, there are three agents: the System Monitoring Agent (SMA), the Job Dispatching Agent (JDA), and the Instance Group Managing Agent (IMA). Section 3.3 depicts the

Execution time approximation

Rough Set Theory is a method that helps find similar objects among observed objects. In this section, the method of approximating a job's execution time is described. The procedure for applying Rough Set Theory is shown in Section 4.1.

In this work, an attribute database is used to make the Rough Set Theory have more precise predictions. Instead of using a list of predefined attributes to describe the job, this work allows users to add more useful attributes to describe the

Implementation

Fig. 9 shows the web interface of the system. Not only can the users obtain the usage information of the system, but they are also able to submit a new job to the system. The interface is mainly divided into two columns. The left column shows each IG’s usage information and each job’s status. The right column shows the form that allows the users to submit a job via the web interface.

The usage information, in the left column, displays three major kinds of information of the system: the usage

Evaluation and results

In this section, evaluations of the interconnected cloud system are conducted. In Section 6.1, the jobs used to evaluate the system are described. Section 6.2 depicts the evaluation environment. The details of the evaluations and their results are shown in Section 6.3.

Conclusions and future works

A single cloud infrastructure may face issues such as a lack of scalability and availability. The interconnected cloud provides a key to solving these issues. Users can use multiple clouds based on their strategies. When the system encounters a resource shortage, one can buy resources from other clouds and transfer some workload to them. Resource shortages are common if the system allows executing multi-constraint jobs. The deadline-constrained job is one of the jobs that are easily

Acknowledgment

This work was supported by the Ministry of Science and Technology of Taiwan, The Republic of China under Grant No. 105-2221-E-305-010-. This work was also supported by the National Taipei University under Grant No. 106-NTPU_A-H&E-143-001 and 107-NTPU_A-H&E-143-001. First, we would like to express our deep appreciation to all anonymous reviewers for their kind comments. In addition, we also would like to express our appreciation to all efforts of open source software development, such as JADE

Chih-Tien Fan received the M.Sc. degree in computer science and information engineering from National Taipei University, New Taipei City, Taiwan, in 2010. He is a Ph.D. student. His research interests are in ubiquitous and intelligent computing, cloud computing, and agent technology.

References (46)

  • C.T. Fan, Y.S. Chang, W.J. Wang, S.M. Yuan, Execution time prediction using rough set theory in hybrid cloud, in: The...
  • AWS ∣ Amazon Elastic Compute Cloud (EC2) - Scalable Cloud Hosting, Available:...
  • Run Big Data on SSD, Raw Disk, High RAM, or Dedicated Servers ∣ GoGrid, Available:...
  • Chunghwa Telecom hicloud Services, Available:...
  • Toosi, A.N., et al.

    Interconnected cloud computing environment: Challenges, taxonomy, and survey

    ACM Comput. Surv.

    (2014)
  • Rochwerger, B., et al.

    The reservoir model and architecture for open federated cloud computing

    IBM J. Res. Dev.

    (2009)
  • Wang, W.J., et al.

    Adaptive scheduling for parallel tasks with QoS satisfaction for hybrid cloud environment

    J. Supercomput.

    (2013)
  • Chang, Y.S., et al.

    Cost evaluation on building and operating cloud platform

    Int. J. Grid High Perform. Comput.

    (2013)
  • Home » OpenStack Open Source Cloud Computing Software, Available:...
  • Chu, X., et al.

    Aneka: Next-generation enterprise grid platform for e-science and e-business applications

  • Vecchiola, C., et al.
  • Sukumar, K., et al.

    The structure of the new IT Frontier: Aneka platform for elastic cloud computing applications

  • Wei, Y., et al.


Yue-Shan Chang received his Ph.D. degree from the Department of Computer and Information Science at National Chiao Tung University in 2001. He joined the Department of Electronic Engineering at Ming Hsing University of Science and Technology in August 1992. In August 2004, he joined the Department of Computer Science and Information Engineering, National Taipei University, Taipei County, Taiwan. In August 2010, he became a Professor, and he served as the Chairman of the department from 2014 to 2017. His research interests are cloud computing, intelligent computing, big data, and the Internet of Things.

    Shyan-Ming Yuan received his BSEE degree from National Taiwan University in 1981, his M.S. degree in Computer Science from University of Maryland, Baltimore County in 1985, and his Ph.D. degree in Computer Science from the University of Maryland College Park in 1989. Dr. Yuan joined the Electronics Research and Service Organization, Industrial Technology Research Institute as a Research Member in October 1989. Since September 1990, he has been an Associate Professor at the Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan. He became the Professor in June 1995. His current research interests include Distributed Objects, Internet Technologies, and Software System Integration. Dr. Yuan’s ORCiD ID is https://orcid.org/0000-0002-3621-9528.
