Elsevier

Computer Networks

Volume 179, 9 October 2020, 107255
SmartFCT: Improving power-efficiency for data center networks with deep reinforcement learning

https://doi.org/10.1016/j.comnet.2020.107255

Abstract

Reducing the power consumption of Data Center Networks (DCNs) and guaranteeing the Flow Completion Time (FCT) of applications in DCNs are two major concerns for data center operators. However, existing works cannot achieve both goals together because of two issues: (1) dynamic traffic patterns in DCNs are hard to model accurately; (2) computing an optimal flow scheduling scheme is computationally expensive.

In this paper, we propose SmartFCT, which couples Deep Reinforcement Learning (DRL) with Software-Defined Networking (SDN) to improve the power efficiency of DCNs and guarantee FCT. SmartFCT dynamically collects the traffic distribution from switches to train its DRL model. The well-trained DRL agent of SmartFCT can quickly analyze complicated traffic characteristics using neural networks and adaptively generate an action that schedules flows and configures margins for different links. Following the generated action, flows are consolidated onto a few active links and switches to save power, and fine-grained margin configuration for active links prevents unexpected flow bursts from causing FCT violations. Simulation results show that SmartFCT can guarantee FCT and reduce power consumption by up to 12.2% compared with state-of-the-art solutions.

Introduction

The high power consumption of data centers has become a major concern for data center operators [1]. Recent studies show that the power consumption of data centers in the US is estimated to reach 139 billion kWh in 2020 [2]. In a data center, the Data Center Network (DCN), which consists of switches and links, accounts for around 10% to 20% of the total energy consumption [2], [3], [4]. Since the traffic in DCNs fluctuates [5], some recent studies propose power-efficient DCNs, which employ Software-Defined Networking (SDN) to reduce the power consumption of the DCN through traffic consolidation [6]. In a power-efficient DCN, traffic flows are consolidated into a minimum-power subnet, which consists of a small set of switches and links that can accommodate the traffic, and the unused switches and links can be put into sleep mode or turned off to save power [3], [4], [7].
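To make the idea of traffic consolidation concrete, the following minimal sketch packs flows onto as few links as possible with a first-fit decreasing heuristic. It is only an illustration under simplifying assumptions (parallel links of equal capacity, known flow rates); the function name `consolidate`, the flow names, and the rates are hypothetical and are not taken from any of the cited schemes.

```python
from typing import Dict, List


def consolidate(flows_mbps: Dict[str, int], capacity_mbps: int,
                num_links: int) -> List[List[str]]:
    """Pack flows onto as few equal-capacity links as possible.

    First-fit decreasing heuristic: an illustrative stand-in, not the
    scheduling method of any scheme cited in the text.
    """
    links: List[List[str]] = []   # flows placed on each active link
    free: List[int] = []          # residual capacity of each active link
    # Place the largest flows first; ties keep insertion order.
    for name, rate in sorted(flows_mbps.items(), key=lambda kv: -kv[1]):
        for i, residual in enumerate(free):
            if rate <= residual:
                links[i].append(name)
                free[i] -= rate
                break
        else:
            # No active link can host the flow, so wake up another one.
            if rate > capacity_mbps or len(links) == num_links:
                raise ValueError(f"cannot place flow {name!r}")
            links.append([name])
            free.append(capacity_mbps - rate)
    return links


# Hypothetical demand: four flows over four parallel 1 Gbps links.
demo = consolidate({"f1": 600, "f2": 300, "f3": 300, "f4": 200}, 1000, 4)
print(demo, f"-> {len(demo)} of 4 links stay active")
```

With this demand, two links carry all four flows, so the other two links could be put into sleep mode. Note that the packed links run close to full utilization, which leaves little headroom for bursts.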

However, existing traffic scheduling schemes have two limitations. First, a simple flow scheduling scheme may degrade Quality of Service (QoS). Data centers host many delay-sensitive applications (e.g., web search) with stringent performance requirements. Such applications require their flows to complete before a given deadline to guarantee QoS, which is referred to as a Flow Completion Time (FCT) constraint [8]. In a power-efficient DCN, simply deactivating some network links and devices may increase the FCT of some flows, since the remaining active links must carry more flows and are therefore more likely to experience congestion [9], [10]. However, some existing works do not consider the FCT constraint in their traffic consolidation schemes [3], [4].
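The tension between consolidation and FCT can be illustrated with a deliberately idealized estimate. The sketch below assumes that the flows on a link share its capacity equally and ignores queuing delay, propagation delay, and transport dynamics; the function names, the flow size, and the 20 ms deadline are hypothetical values chosen for illustration.

```python
def estimated_fct_s(flow_bits: float, capacity_bps: float,
                    flows_on_link: int) -> float:
    """Idealized FCT in seconds when the link is shared equally."""
    if flows_on_link < 1:
        raise ValueError("at least one flow must be on the link")
    fair_share_bps = capacity_bps / flows_on_link
    return flow_bits / fair_share_bps


def meets_deadline(flow_bits: float, capacity_bps: float,
                   flows_on_link: int, deadline_s: float) -> bool:
    """True if the idealized FCT does not exceed the deadline."""
    return estimated_fct_s(flow_bits, capacity_bps, flows_on_link) <= deadline_s


# Hypothetical 1 MB flow on a 1 Gbps link with a 20 ms deadline.
FLOW_BITS = 8_000_000
for n in (1, 2, 4):
    fct_ms = 1000 * estimated_fct_s(FLOW_BITS, 1e9, n)
    ok = meets_deadline(FLOW_BITS, 1e9, n, 0.020)
    print(f"{n} flow(s) on link: FCT = {fct_ms:.0f} ms, deadline met: {ok}")
```

Under these assumptions the flow finishes in 8 ms when it has the link to itself and in 16 ms when it shares the link with one other flow, but it misses the deadline (32 ms) once four flows are consolidated onto the same link.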

Second, some existing flow scheduling schemes do not fully consider the traffic diversity in DCNs, so they cannot save power optimally. The traffic distribution in DCNs exhibits temporal fluctuation (i.e., the workload on a given link differs over time) and spatial imbalance (i.e., different links carry different workloads at the same time) [5], [11]. However, exploiting these two features to obtain an optimized consolidation solution is computationally expensive. Existing works usually follow fixed high-level patterns and do not generalize well to dynamic traffic variation [12].
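The two traffic features can be quantified in many ways. As one possible illustration, the sketch below takes a matrix of link loads indexed by time slot and link, and computes a per-link coefficient of variation over time (temporal fluctuation) and a per-slot ratio of maximum to mean load (spatial imbalance). These particular metrics and the sample loads are assumptions made for this example, not the definitions used in the cited measurements.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def temporal_fluctuation(loads: Sequence[Sequence[float]]) -> List[float]:
    """Coefficient of variation of each link's load across time slots."""
    result = []
    for series in zip(*loads):          # one series per link
        avg = mean(series)
        result.append(pstdev(series) / avg if avg > 0 else 0.0)
    return result


def spatial_imbalance(loads: Sequence[Sequence[float]]) -> List[float]:
    """Max-to-mean load ratio across links, for each time slot."""
    result = []
    for row in loads:                   # one row per time slot
        avg = mean(row)
        result.append(max(row) / avg if avg > 0 else 1.0)
    return result


# Hypothetical loads in Mbps: rows are time slots, columns are links.
loads = [
    [10, 10],
    [30, 10],
]
print("temporal:", temporal_fluctuation(loads))
print("spatial: ", spatial_imbalance(loads))
```

In this toy input, the first link fluctuates over time while the second is constant, and the second time slot is unbalanced across links while the first is not.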

Jointly reducing the power consumption and guaranteeing the FCT of applications in DCNs is difficult. However, emerging machine learning technology has brought a new opportunity to solve complicated design problems in networks. For example, Knowledge-Defined Networking (KDN) [13] proposes a framework that combines Artificial Intelligence (AI) with SDN. KDN implements a knowledge plane above the control plane of SDN using machine learning technologies. With the global network view collected by the SDN controller, the knowledge plane can conduct AI-assisted data analytics such as Deep Learning (DL) and Reinforcement Learning (RL) and then generate fast and automatic control decisions. In particular, Deep Reinforcement Learning (DRL) has shown great potential in solving networking problems for two reasons. First, by combining deep neural networks with RL, DRL can directly map complicated input data to a control action for the network. Second, training DRL algorithms requires no labeled data set, which makes DRL applicable to many networking problems.
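Because a DRL agent learns from a scalar reward rather than from labeled examples, the trade-off between power and FCT has to be encoded in that reward. The sketch below shows one possible shape for such a signal. It is an assumption made purely for illustration: the text above does not specify the reward used by SmartFCT, and the function name, its parameters, and the default `penalty_weight` are hypothetical.

```python
def reward(active_power_w: float, full_power_w: float,
           missed_deadlines: int, total_flows: int,
           penalty_weight: float = 2.0) -> float:
    """Hypothetical reward: power saved minus a penalty for FCT violations.

    active_power_w   power drawn by the consolidated network
    full_power_w     power drawn when every link and switch is on
    missed_deadlines number of flows that violated their FCT constraint
    total_flows      number of flows observed in the interval
    """
    if full_power_w <= 0 or total_flows <= 0:
        raise ValueError("full_power_w and total_flows must be positive")
    power_saving = 1.0 - active_power_w / full_power_w
    violation_ratio = missed_deadlines / total_flows
    return power_saving - penalty_weight * violation_ratio


# Two hypothetical outcomes of a consolidation action.
moderate = reward(600.0, 1000.0, missed_deadlines=0, total_flows=100)
aggressive = reward(500.0, 1000.0, missed_deadlines=10, total_flows=100)
print(f"moderate: {moderate:.2f}, aggressive: {aggressive:.2f}")
```

With these made-up numbers, the more aggressive consolidation saves more power but scores lower because of its deadline violations, so an agent maximizing this signal would be steered toward the moderate action. The weight on violations is a design choice that would need tuning in any real system.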

In this paper, we propose SmartFCT, a dynamic flow scheduling scheme that guarantees the FCT and improves the power efficiency of DCNs. Based on SDN, SmartFCT extracts the temporal and spatial distribution characteristics of the traffic and then leverages DRL to generate dynamic traffic consolidation policies that reduce the power consumption of DCNs. At the same time, SmartFCT leaves margins on active links and devices so that bursty flows can still meet their FCT deadlines. The main contributions of this paper are summarized as follows:

  1. We propose a flow scheduling scheme that considers the temporal fluctuation and spatial distribution of the traffic in DCNs to guarantee the FCT and improve the power efficiency using DRL.

  2. We customize a DRL algorithm on the SDN controller, which can automatically perform the feature analysis of the input traffic and generate a dynamic traffic scheduling policy without expert knowledge.

  3. We run simulations on NS2 to test SmartFCT under real-world DCN traces. Simulation results demonstrate that SmartFCT can guarantee the FCT constraints of traffic flows and save up to 12.2% more power than existing schemes.

The rest of the paper is organized as follows. Section II introduces the background and motivation of this paper. Section III formulates the flow consolidation problem. Section IV describes the overall architecture and working process of SmartFCT. Section V presents the details of the algorithm and interface design in SmartFCT. Section VI presents evaluation results and analysis. Section VII discusses related work. Finally, Section VIII concludes the paper.

Section snippets

Background and motivation

In this section, we introduce the background of SmartFCT and present our motivation to design SmartFCT.

Problem formulation

In a DCN, many components consume power, including switch chassis, line cards, and ports. From the perspective of power efficiency, an optimal flow consolidation scheme consolidates flows onto fewer active links, or even fewer switches, so as to reduce the power consumption of these components. In practice, the power consumption of a port varies with the traffic intensity on that port, which makes the problem dynamic. Formally, in a DCN that consists of N

Overview of SmartFCT

SmartFCT is implemented based on SDN, as shown in Fig. 3. The core function of SmartFCT is implemented with a DRL agent above the SDN controller, which communicates with the controller through the northbound interface. In SmartFCT, we consolidate the delay-sensitive flows and delay-tolerant flows into a subset of network links and devices for power saving. First, to reduce the power consumption, SmartFCT consolidates flows into fewer links and devices in use. Then, to ensure the FCT of delay

Details of SmartFCT

In this section, we introduce the implementation details of SmartFCT, which mainly contains two algorithms: the DRL algorithm used in SmartFCT and the routing path update scheme.

Simulation evaluation

In this section, we use simulations to compare SmartFCT with other existing solutions.

Power-efficiency in data center networks

To improve the power efficiency of DCNs, many schemes have been proposed that adjust the network components so that they consume power in proportion to the traffic demand [3], [4], [29]. For instance, ElasticTree [3] proposes a simple flow-level traffic consolidation scheme in a fat-tree network. In ElasticTree, the flows are consolidated based on their maximum bandwidth requirements to maximize link utilization of a subset of network links, then the unused links and devices can be turned off for

Conclusion

Traffic consolidation is widely considered an effective approach to improving the power efficiency of DCNs. To ensure the QoS of flows, the FCT constraints and the power efficiency should be considered jointly. In this paper, we propose SmartFCT, which uses DRL to dynamically and effectively generate traffic consolidation strategies for DCNs. The DRL agent can efficiently generate actions for traffic consolidation adjustment based on the input traffic distribution. The experiment verifies the

CRediT authorship contribution statement

Penghao Sun: Conceptualization, Writing - original draft, Methodology, Software. Zehua Guo: Conceptualization, Formal analysis, Writing - review & editing. Sen Liu: Visualization, Formal analysis. Julong Lan: Funding acquisition, Supervision. Junchao Wang: Software, Validation. Yuxiang Hu: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This paper is supported by the National Key Research and Development Plan under Grant Number 2017YFB0803204, the National Natural Science Fund of China under Grant Numbers 61521003 and 61872382, and the Beijing Institute of Technology Research Fund Program for Young Scholars.

Penghao Sun is currently a Ph.D. candidate at the National Digital Switching System Engineering and Technological R&D Center (NDSC), China. He received the B.S. and M.S. degrees from NDSC in 2014 and 2017, respectively. His current research interests include network architecture, edge computing, and machine learning for networking.

References (48)

  • Z. Guo et al., JET: electricity cost-aware dynamic workload management in geographically distributed datacenters, Comput. Commun. (2014)
  • G. Urdaneta et al., Wikipedia workload analysis for decentralized hosting, Comput. Netw. (2009)
  • P. Sun et al., TIDE: time-relevant deep reinforcement learning for routing optimization, Future Gener. Comput. Syst. (2019)
  • P. Delforge, America’s data centers consuming and wasting growing amounts of energy,...
  • B. Heller et al., ElasticTree: saving energy in data center networks, NSDI (2010)
  • X. Wang et al., Correlation-aware traffic consolidation for power optimization of data center networks, IEEE Trans. Parallel Distrib. Syst. (2015)
  • M. Chowdhury et al., Managing data transfers in computer clusters with Orchestra, ACM SIGCOMM Comput. Commun. Rev. (2011)
  • M.d.S. Conterato et al., Reducing energy consumption in SDN-based data center networks through flow consolidation strategies, Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (2019)
  • Z. Guo et al., Dynamic flow scheduling for power-efficient data center networks, 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS) (2016)
  • D. Zats et al., DeTail: reducing the flow completion time tail in datacenter networks, Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (2012)
  • M. Alizadeh et al., Data center TCP (DCTCP), ACM SIGCOMM Comput. Commun. Rev. (2011)
  • T. Hoff, Latency is everywhere and it costs you sales - how to crush it, High Scalability (2009)
  • A. Aghdai et al., Traffic measurement and analysis in an organic enterprise data center, 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR) (2013)
  • N. Hamedazimi et al., FireFly: a reconfigurable wireless data center fabric using free-space optics, ACM SIGCOMM Comput. Commun. Rev. (2014)
  • A. Mestres et al., Knowledge-defined networking, ACM SIGCOMM Comput. Commun. Rev. (2017)
  • T. Benson et al., Understanding data center traffic characteristics, Proceedings of the 1st ACM Workshop on Research on Enterprise Networking (2009)
  • B. Vamanan et al., Deadline-aware datacenter TCP (D2TCP), ACM SIGCOMM Comput. Commun. Rev. (2012)
  • C. Wilson et al., Better never than late: meeting deadlines in datacenter networks, ACM SIGCOMM Comput. Commun. Rev. (2011)
  • N. Dukkipati et al., Why flow-completion time is the right metric for congestion control, ACM SIGCOMM Comput. Commun. Rev. (2006)
  • K. Zheng et al., FCTcon: dynamic control of flow completion time in data center networks for power efficiency, IEEE Trans. Cloud Comput. (2019)
  • D. Li et al., EXR: greening data center network with software defined exclusive routing, IEEE Trans. Comput. (2014)
  • OpenFlow Switch Specification, Version 1.4.0,...
  • V. Mnih et al., Human-level control through deep reinforcement learning, Nature (2015)
  • T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra, Continuous control with...
Zehua Guo received a B.S. degree from Northwestern Polytechnical University, an M.S. degree from Xidian University, and a Ph.D. degree from Northwestern Polytechnical University. He is an Associate Professor at the School of Automation, Beijing Institute of Technology. He was a Research Fellow at the Department of Electrical and Computer Engineering, New York University Tandon School of Engineering, a Post-Doctoral Research Associate at the Department of Computer Science and Engineering, University of Minnesota Twin Cities, and a Visiting Associate Professor at the Pillar of Information Systems Technology and Design, Singapore University of Technology and Design. His research interests include software-defined networking, network function virtualization, data center networks, cloud computing, content delivery networks, network security, edge computing, machine learning, and Internet exchange. Dr. Guo is an Associate Editor for IEEE ACCESS and the EURASIP Journal on Wireless Communications and Networking (Springer), and an Editor for the KSII Transactions on Internet and Information Systems. He was the Session Chair for the IEEE International Conference on Communications 2018 and the Technical Program Committee Member of Computer Communications (Elsevier), ICCCN 2020, ICA3PP 2020, CSCloud 2020, SmartCloud 2020, SPDE 2020. He is a Senior Member of IEEE.

Sen Liu received the B.S. degree from Northeastern University, the M.S. degree from South China University of Technology, and the Ph.D. degree from Central South University. He is currently a Post-Doctoral Research Associate at Fudan University. He also worked as a Visiting Scholar with the Department of Computer Science and Engineering, University of Minnesota, Twin Cities, MN, USA, from 2018 to 2019. His research interests include congestion control, network traffic load balancing, and performance optimization in data center networks.

Julong Lan is currently a professor and the chief engineer at the National Digital Switching System Engineering and Technological R&D Center (NDSC), China. He is also the chief scientist of the China National Program on Key Basic Research Project (973 Program). His current research interests include 5G communication systems, next-generation computer networks, and artificial intelligence.

Junchao Wang is an assistant professor at the National Digital Switching System Engineering and Technological R&D Center.

Yuxiang Hu received his Ph.D. degree in 2011 from the National Digital Switching System Engineering and Technological Research Center of China. He is currently an associate professor, and his research interests include Internet architecture, novel switching and routing, and multimedia network technology.
