Elsevier

Applied Soft Computing

Volume 70, September 2018, Pages 12-21

Optimal resource allocation using reinforcement learning for IoT content-centric services

https://doi.org/10.1016/j.asoc.2018.03.056

Highlights

  • Proposes a novel dynamic-programming approach that uses RL techniques for resource allocation in IoT.

  • Combines QoE with RL to create pre-stored cost mapping tables for optimal resource allocation.

  • Implements a content-centric network to enhance the fulfillment of resource allocation.

Abstract

The exponential growth of networking technologies has led to a dramatically larger scope of the connected computing environment. The Internet-of-Things (IoT) is considered an alternative for obtaining high performance through enhanced capabilities in system control, resource allocation, data exchange, and flexible adoption. However, current IoT is encountering a resource-allocation bottleneck due to mismatched networking service quality and complicated service-offering environments. This paper concentrates on the issue of resource allocation in IoT and utilizes the satisfaction level of Quality of Experience (QoE) to achieve intelligent content-centric services. This work proposes a novel approach that utilizes the mechanism of Reinforcement Learning (RL) to obtain highly accurate QoE in resource allocation. Two RL-based algorithms are proposed, one for creating cost mapping tables and one for producing optimal resource allocations. Our experimental evaluations have assessed the efficiency of implementing the proposed approach.

Introduction

The dramatically increasing adoption of network-based techniques is leading to a widely connected environment, in which numerous connected devices form an Internet-of-Things (IoT) system. Establishing a scalable and networkable system is becoming a mainstream goal in modern industries due to its multiple benefits, such as exchanging data, sharing infrastructure, integrating systems, and distributing workloads [1], [2], [3], [4]. For instance, at a higher level of IoT, a Cyber-Physical System (CPS) is an efficient approach for tying together the Internet, humans, and functional objects. A functional object can be either hardware or software that is expected to provision certain offerings or achieve some purpose. The advantage of interconnecting numerous devices is observable; however, developing a competent resource allocation approach is a challenging issue.

Despite many prior studies addressing resource allocation issues [5], the challenges still exist in contemporary IoT systems. The vital problem is creating an adaptable mechanism for resource allocation. An intelligent agent is one alternative for allocating various resources in IoT. The resource allocation mechanism needs to consider at least two aspects: the first is to guarantee high performance; the other is to ensure that the resource allocation strategy can be created within an acceptable time period [6]. The performance can be any examinable objective in an IoT scenario, such as energy cost, response time, transmission rate, or security level [7], [8], [9], [10]. Nevertheless, these two aspects are generally incompatible with one another [11]. An algorithm demanding a short execution time can only create sub-optimal solutions [12]; an algorithm that creates optimal solutions generally needs a longer execution time to produce resource allocation strategies [13], [14]. Thus, finding an approach that departs from prior research directions and effectively avoids this contradiction is a demanding job.

Moreover, another challenging problem is creating a proper pre-stored cost mapping table in intelligent agents. Most optimization algorithms rely heavily on such a table as input for resource allocation. In practice, most deployed tables are fixed, so tasks can hardly be forwarded according to real-time service contents [15], [16]. The performance of the optimization may not reach the expected level, since a fixed mapping table cannot effectively represent the status of the input task. Additionally, it is unworkable to obtain accurate cost information for each available computation node based only on the task type or data package headers. Therefore, maintaining an effective pre-stored cost mapping table is a challenging and necessary mission for intelligent agents (Fig. 1).

As one of the approaches in machine learning, Reinforcement Learning (RL) is a choice for system designers to obtain optimal controls. In the context of IoT, two characteristics concern performance enhancement [17]. First, many factors may impact the eventual performance due to the great quantity of connected devices; a few typical factors include the energy supply [18], heterogeneous networks [19], hardware conditions, periodic infrastructure availability, usage frequency, jamming/spoofing attacks [20], and network traffic congestion. The other characteristic is that edge devices can measure performance against pre-defined criteria. These two characteristics imply that utilizing feedback from the user side can avoid the interference and complexity of creating cost mapping tables directly. Therefore, we design an approach for the intelligent agent to learn, from the perspective of states, the cost of each task type at each computation node.

Furthermore, this paper extends our prior study [21]. Our approach, called the Smart Content-Centric Services for Internet-of-Things (SCCS-IoT) model, applies an addressable and routable content architecture for communications. The term Smart in our model refers to intelligent operations that can output optimal solutions using lower-level or affordable computing resources. The proposed approach has two focuses. The first is the resource allocation issue in IoT, for which we propose a novel approach that uses an RL mechanism to construct the resource allocation strategy; implementing an RL mechanism is intended to avoid the contradiction described above and make resource allocation operations smart. The other focus is that we use Quality of Experience (QoE) to construct the value function used in RL. Our approach combines QoE with RL to obtain a dynamic cost mapping table. Each task type's cost at a computation node is considered a state, and the value of the state is determined by the rewards returned from the user side.
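The state-update mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the table layout, the learning rate, and the incremental update rule are assumptions.

```python
# Hypothetical cost mapping table: table[task_type][node] holds the learned
# state value of serving a given task type at a given computation node.
ALPHA = 0.1  # learning rate (an assumed tuning parameter)

def update_cost_table(table, task_type, node, qoe_reward):
    """Move the stored state value toward the QoE feedback from the user side."""
    old = table[task_type][node]
    table[task_type][node] = old + ALPHA * (qoe_reward - old)
    return table[task_type][node]

# Usage: two task types, two nodes, all values initialized to zero.
table = {t: {n: 0.0 for n in ("node_a", "node_b")} for t in ("video", "sensor")}
update_cost_table(table, "video", "node_a", qoe_reward=0.8)
```

With repeated feedback, each state value converges toward the average QoE reported for that (task type, node) pair, which is what makes the mapping table dynamic rather than fixed.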

The significance of this work is perceivable given the high demand for adaptable resource allocation in IoT. From the technical perspective, the main contributions of this paper include:

  • 1

    This paper proposes a novel approach that uses RL techniques for resource allocation in IoT. The proposed approach can solve a core issue restricting previous solutions: the contradiction between performance and strategy generation time. The main algorithm used in our model is based on dynamic programming.

  • 2

    This work combines QoE with RL to create pre-stored cost mapping tables for optimal resource allocation. The values in the table are determined by feedback from the user side through QoE measurements. The value function in RL uses the level of QoE as the reward for updating states.

  • 3

    This paper emphasizes the implementation of a content-centric network to enhance the fulfillment of resource allocation. The value function used in RL considers both energy cost and response time, and is expandable and extendable to other applications.
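As a minimal sketch of the third point, a value function blending energy cost and response time might look as follows. The weights, the linear combination, and the budget-based mapping to a QoE level are illustrative assumptions, not the paper's formulas.

```python
def blended_cost(energy_cost, response_time, w_energy=0.5, w_time=0.5):
    """Weighted combination of the two objectives; lower is better.
    The weights are hypothetical tuning parameters."""
    return w_energy * energy_cost + w_time * response_time

def qoe_level(energy_cost, response_time, budget):
    """Map the blended cost to a QoE-style level in [0, 1]:
    full satisfaction at zero cost, none at or beyond the assumed budget."""
    cost = blended_cost(energy_cost, response_time)
    return max(0.0, 1.0 - cost / budget)
```

Swapping in other objectives (e.g. transmission rate or a security score) only requires changing the terms of `blended_cost`, which is the sense in which such a value function is extendable.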

The rest of this paper is organized as follows. Section 2 provides a brief survey concentrating on two basic aspects, namely RL and QoE. Section 3 presents the proposed model as well as the key concepts used in it. Section 4 demonstrates the core algorithms that support our model. Section 5 provides a motivational example showing the major steps and mechanisms of our model. Section 6 describes the experiment configuration and illustrates a series of experimental results. Finally, Section 7 concludes this work.


Related work

We review and summarize relevant previous studies concerning RL mechanisms and QoE implementations in IoT in this section.

The core component of an RL approach is an agent with a value function. Formulating a proper value function for the agent is the primary task for a specific machine learning problem. In the domain of networks, RL is an alternative for reaching optimal controls. For example, Liu et al. [22] have developed an RL-based tracking control approach that only requires two parameters

Preliminary

We provide a brief synthesis of preliminary studies related to our work in this section. First, our model is based on the premise that information transfers are controllable in IoT. This goal can be achieved by various techniques, such as utilizing Data Chunks (DCs) in resource allocations [37]. A DC is a term describing a piece of information carried by the Stream Control Transmission Protocol (SCTP), which embraces a header carrying data parameters for control purposes. For instance,
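To make the control information in a DC header concrete, the sketch below models the standard fields of an SCTP DATA chunk (RFC 4960). How an agent would use these fields for allocation is a hypothetical illustration, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class DataChunk:
    """Fields of an SCTP DATA chunk (RFC 4960) that an intelligent agent
    could inspect for control purposes before allocating the payload."""
    tsn: int                  # transmission sequence number
    stream_id: int            # which stream the chunk belongs to
    stream_seq: int           # ordering within the stream
    payload_protocol_id: int  # upper-layer protocol tag (a task-type hint)
    payload: bytes = b""

# Usage: a chunk whose payload-protocol identifier hints at the task type.
chunk = DataChunk(tsn=1, stream_id=3, stream_seq=0, payload_protocol_id=42)
```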

Algorithms

We propose two crucial algorithms to support two vital operations in the SCCS-IoT model. The first is the Reinforcement Learning-based Mapping Table (RLMT) algorithm, which updates and maintains the cost mapping table and is described in Section 4.1. The other is the Reinforcement Learning-based Resource Allocation (RLRA) algorithm, which is responsible for resource allocations. We present the RLRA algorithm in Section 4.2.
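The full pseudocode of the two algorithms is not shown in this snippet. As a rough sketch of the allocation side, an RLRA-style allocator could read the learned cost mapping table as follows; the table layout and the greedy selection rule are assumptions for illustration.

```python
def allocate(cost_table, task_type):
    """Pick the computation node with the lowest learned cost for a task type.
    cost_table[task_type] maps node name -> learned cost (lower is better)."""
    costs = cost_table[task_type]
    return min(costs, key=costs.get)

# Usage: for "video" tasks, node_b currently has the lower learned cost.
cost_table = {"video": {"node_a": 0.9, "node_b": 0.4}}
```

Because the table is continually updated from QoE feedback (the RLMT side), the same greedy lookup yields different allocations as the learned costs drift, without re-running a heavyweight optimizer per request.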

Motivational example

A simple example is presented in this section to provide a basic picture of the implementation of our approach. The example simulates an implementation scenario in IoT in which a set of slightly changing inputs needs to be allocated to heterogeneous computation nodes. Its motivation is to maximize the utilization of computing resources without being impacted by slight changes in the inputs.

In line with the designs in Section 3, there are two crucial components in this

Experiment configuration

We presented our experimental evaluations in this section. The evaluation mainly focused on the training time needed for reinforcement learning to obtain optimal resource allocations. We configured the number of input tasks to grow incrementally. The purpose of this configuration was to examine the impact caused by the input setting. The findings derived from this examination would guide the input configuration by ensuring the number of input tasks at one service request within

Conclusions

This paper proposed a novel approach for resource allocation in CPS, based on a design of the reinforcement learning mechanism. Our approach applied the method of QoE to examine the service level, which enabled dynamic resource allocation and avoided a fixed input task table. The value function used in the proposed reinforcement learning considered the level of QoE as the reward parameter, and the output of our approach was an optimal solution.


References (40)

  • K. Gai et al.

    In-memory big data analytics under space constraints using dynamic programming

    Future Gener. Comput. Syst.

    (2018)
  • K. Gai et al.

    Blend arithmetic operations on tensor-based fully homomorphic encryption over real numbers

    IEEE Trans. Ind. Inf.

    (2018)
  • K. Gai et al.

    Resource management in sustainable cyber-physical systems using heterogeneous cloud computing

    IEEE Trans. Sustain. Comput.

    (2017)
  • K. Gai et al.

    Privacy-preserving data encryption strategy for big data in mobile cloud computing

    IEEE Trans. Big Data

    (2017)
  • D. Gavalas et al.

    A survey on algorithmic approaches for solving tourist trip design problems

    J. Heurist.

    (2014)
  • S. Venugopalan et al.

    ILP formulations for optimal task scheduling with communication delays on parallel systems

    IEEE Trans. Parallel Distrib. Syst.

    (2015)
  • C. Papagianni et al.

    On the optimal allocation of virtual resources in cloud computing networks

    IEEE Trans. Comput.

    (2013)
  • R. Sutton et al.

    Reinforcement Learning: An Introduction

    (1998)
  • K. Gai et al.

    Intrusion detection techniques for mobile cloud computing in heterogeneous 5G

    Secur. Commun. Netw.

    (2015)
  • K. Gai et al.

    Spoofing-jamming attack strategy using optimal power distributions in wireless smart grid networks

    IEEE Trans. Smart Grid

    (2017)

    Keke Gai received the B.Eng. degree from the Nanjing University of Science and Technology, Nanjing, China, the M.E.T. degree from the University of British Columbia, Vancouver, BC, Canada, the M.B.A. and M.S. degrees from Lawrence Technological University, Southfield, MI, USA, and the Ph.D. degree in computer science from the Department of Computer Science, Pace University, New York, NY, USA. He is currently an Associate Professor with the School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China. He has authored or co-authored over 80 peer-reviewed journal or conference papers, including over 30 journal papers (among them ACM/IEEE TRANSACTIONS) and over 50 conference papers. His current research interests include cloud computing, cyber security, combinatorial optimization, and edge computing. Dr. Gai was a recipient of three IEEE Best Paper Awards (IEEE SSC’16, IEEE CSCloud’15, and IEEE BigDataSecurity’15) and two IEEE Best Student Paper Awards (SmartCloud’16 and HPCC’16) at IEEE conferences in recent years. His paper about cloud computing has been ranked among the “Most Downloaded Articles” of the Journal of Network and Computer Applications. He is involved in a number of professional/academic associations, including the ACM and IEEE.

    Meikang Qiu received the BE and ME degrees from Shanghai Jiao Tong University and the Ph.D. degree in Computer Science from the University of Texas at Dallas. Currently, he is a Professor at Huibei Engineering University and an Associate Professor of Computer Science at Pace University. He is also an Adjunct Professor at Columbia University. He is an IEEE Senior Member and ACM Senior Member, and the Chair of the IEEE Smart Computing Technical Committee. His research interests include cyber security, cloud computing, smart computing, intelligent data, embedded systems, etc. Many of his results have been reported to the research community through high-quality journal and conference papers. He has published 14 books, 400 peer-reviewed journal and conference papers (including 180+ journal articles, 220+ conference papers, and 60+ IEEE/ACM Transactions papers), and 3 patents. He won the ACM Transactions on Design Automation of Electronic Systems (TODAES) 2011 Best Paper Award. His paper about cloud computing, published in the Journal of Parallel and Distributed Computing (JPDC, Elsevier), was ranked #1 among the Top 25 Hottest Papers of JPDC in 2012. His papers published in IEEE Transactions on Computers and the Journal of Computer and System Sciences (Elsevier) were recognized as Highly Cited Papers in 2016 and 2017. He has won another 10+ conference Best Paper Awards in recent years. Currently he is an associate editor of 10+ international journals, including IEEE Transactions on Computers and IEEE Transactions on Cloud Computing. He has served as General Chair/Program Chair of a dozen IEEE/ACM international conferences, such as IEEE HPCC, IEEE TrustCom, IEEE CSCloud, and IEEE BigDataSecurity. He has given 100+ talks all over the world, including at Oxford, Princeton, Stanford, and Yale. He won the Navy Summer Faculty Award in 2012 and the Air Force Summer Faculty Award in 2009. His research is supported by US government agencies such as the NSF, Air Force, and Navy, and by companies such as GE, Nokia, TCL, and Cavium.

    This work is supported by Beijing Institute of Technology Research Fund Program for Young Scholars (Dr. Keke Gai).
