SmartFCT: Improving power-efficiency for data center networks with deep reinforcement learning
Introduction
The high power consumption of data centers has become a major concern for data center operators [1]. Recent studies estimate that the power consumption of data centers in the US will reach 139 billion kWh in 2020 [2]. In a data center, the Data Center Network (DCN), which consists of switches and links, accounts for around 10% to 20% of the total energy consumption [2], [3], [4]. Since the traffic in DCNs fluctuates [5], some recent studies propose power-efficient DCNs, which employ Software-Defined Networking (SDN) to reduce the power consumption of the DCN through traffic consolidation [6]. In a power-efficient DCN, traffic flows are consolidated into a minimum-power subnet, which consists of a small set of switches and links that can accommodate the traffic, and the unused switches and links can be put into sleep mode or turned off to save power [3], [4], [7].
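The idea of traffic consolidation described above can be illustrated with a minimal sketch. It is a simple greedy heuristic written only for illustration, not the scheme proposed in this paper or in [3], [4], [7]; the function name `consolidate`, the `max_util` cap, and the toy two-path topology are all assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]
Path = List[Link]


def consolidate(
    flows: Dict[str, float],
    candidate_paths: Dict[str, List[Path]],
    capacity: float,
    max_util: float = 1.0,
) -> Tuple[Dict[str, Path], Set[Link]]:
    """Greedily place each flow on the feasible path that wakes up the fewest new links."""
    load: Dict[Link, float] = {}       # links that carry traffic and their current load
    placement: Dict[str, Path] = {}
    # Place the largest flows first (hypothetical ordering choice).
    for name, demand in sorted(flows.items(), key=lambda kv: -kv[1]):
        best: Optional[Tuple[int, Path]] = None
        for path in candidate_paths[name]:
            # Skip paths on which any link would exceed the utilization cap.
            if any(load.get(link, 0.0) + demand > max_util * capacity for link in path):
                continue
            new_links = sum(1 for link in path if link not in load)
            if best is None or new_links < best[0]:
                best = (new_links, path)
        if best is None:
            raise ValueError(f"no feasible path for flow {name}")
        for link in best[1]:
            load[link] = load.get(link, 0.0) + demand
        placement[name] = best[1]
    # Every link absent from `load` is unused and could be put to sleep.
    return placement, set(load)


# Toy topology (assumed): two hosts connected through switch s1 or switch s2.
path_a: Path = [("h1", "s1"), ("s1", "h2")]
path_b: Path = [("h1", "s2"), ("s2", "h2")]
flows = {"f1": 0.3, "f2": 0.4}
paths = {"f1": [path_a, path_b], "f2": [path_a, path_b]}

_, active_full = consolidate(flows, paths, capacity=1.0)
_, active_capped = consolidate(flows, paths, capacity=1.0, max_util=0.5)
print(len(active_full), len(active_capped))
```

With the full link capacity available, both flows share the path through s1 and the two links through s2 stay idle. Capping utilization at 50% forces the second flow onto the other path, so all four links remain active, which shows the tension between saving power and keeping headroom on active links.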
However, existing traffic scheduling schemes have two limitations. First, a simple flow scheduling scheme may degrade Quality of Service (QoS). Data centers host many delay-sensitive applications (e.g., web search) with stringent performance requirements. Such applications require their flows to complete before a given deadline to guarantee the QoS, which can be called a Flow Completion Time (FCT) constraint [8]. In a power-efficient DCN, simply deactivating some network links and devices may increase the FCT of some flows, since the remaining active links have to carry more flows and are thus more likely to experience congestion [9], [10]. Nevertheless, some existing works do not consider the FCT constraint in their traffic consolidation schemes [3], [4].
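The effect of consolidation on the FCT can be made concrete with a back-of-the-envelope sketch. It assumes an idealized model in which concurrent flows share a link's bandwidth equally and ignores queuing and transport-protocol dynamics; the function names and all numbers are hypothetical.

```python
def fair_share_fct(flow_size_bits: int, link_capacity_bps: int, n_flows_on_link: int) -> float:
    """Completion time in seconds when n flows split the link bandwidth equally (idealized)."""
    if n_flows_on_link < 1:
        raise ValueError("at least one flow must be on the link")
    return flow_size_bits * n_flows_on_link / link_capacity_bps


def meets_deadline(fct_seconds: float, deadline_seconds: float) -> bool:
    """An FCT constraint is satisfied when the flow finishes no later than its deadline."""
    return fct_seconds <= deadline_seconds


# Assumed example: a 1 MB flow (8e6 bits) on a 1 Gbps link with a 50 ms deadline.
size_bits = 8_000_000
capacity_bps = 1_000_000_000
deadline = 0.050

fct_light = fair_share_fct(size_bits, capacity_bps, n_flows_on_link=2)
fct_heavy = fair_share_fct(size_bits, capacity_bps, n_flows_on_link=10)
print(fct_light, meets_deadline(fct_light, deadline))
print(fct_heavy, meets_deadline(fct_heavy, deadline))
```

Under these assumptions the flow finishes in about 16 ms when it shares the link with one other flow, but needs about 80 ms when ten flows are packed onto the same link, so the 50 ms deadline is missed.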
Second, some existing flow scheduling schemes do not fully consider the traffic diversity in DCNs, so they cannot save power optimally. The traffic distribution in DCNs exhibits temporal fluctuation (i.e., the workload on a given link differs over time) and spatial imbalance (i.e., different links carry different workloads at the same time) [5], [11]. However, exploiting these two features to obtain an optimized consolidation solution incurs high computational complexity. Existing works usually follow fixed high-level patterns and do not generalize well to dynamic traffic variation [12].
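As a rough illustration of these two features, the sketch below summarizes a small link-utilization matrix. The coefficient of variation is used here only as one possible summary statistic; it is an assumption made for illustration and not a metric taken from this paper, and the sample values are made up.

```python
from statistics import mean, pstdev
from typing import List


def coefficient_of_variation(values: List[float]) -> float:
    """Population standard deviation divided by the mean (0.0 for an all-zero series)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def temporal_fluctuation(load: List[List[float]]) -> List[float]:
    """Per-link variation over time; load[t][k] is the utilization of link k at time t."""
    n_links = len(load[0])
    return [coefficient_of_variation([row[k] for row in load]) for k in range(n_links)]


def spatial_imbalance(load: List[List[float]]) -> List[float]:
    """Per-time-step variation across links."""
    return [coefficient_of_variation(row) for row in load]


# Assumed sample: three time steps, two links.
load = [
    [0.2, 0.8],
    [0.5, 0.5],
    [0.8, 0.2],
]
temporal = temporal_fluctuation(load)
spatial = spatial_imbalance(load)
print(temporal)
print(spatial)
```

In this sample each link's load changes over time (nonzero temporal values), while the spatial value drops to zero only at the middle time step, when both links carry the same load.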
It is difficult to jointly reduce the power consumption and guarantee the FCT of applications in DCNs. However, emerging machine learning technology has brought a new opportunity to solve complicated design problems in networks. For example, Knowledge-Defined Networking (KDN) [13] proposes a framework that combines Artificial Intelligence (AI) with SDN. KDN implements a knowledge plane above the control plane of SDN using machine learning technologies. With the global network view collected by the SDN controller, the knowledge plane can conduct AI-assisted data analytics, such as Deep Learning (DL) and Reinforcement Learning (RL), and then generate fast and automatic control decisions. In particular, Deep Reinforcement Learning (DRL) has shown great potential in solving networking problems for two reasons. First, by combining deep neural networks with RL, DRL can directly map complicated input data to a control action for the network. Second, the training process of DRL algorithms requires no labeled data set, which makes DRL well suited to many networking problems.
In this paper, we propose SmartFCT, a dynamic flow scheduling scheme that guarantees the FCT and improves the power efficiency in DCNs. Based on SDN, SmartFCT extracts the temporal and spatial distribution characteristics of the traffic and then leverages DRL to generate dynamic traffic consolidation policies that reduce the power consumption in DCNs. At the same time, SmartFCT leaves margins on active links and devices to guarantee the FCT deadlines of burst flows and thus ensure network performance. The main contributions of this paper are summarized as follows:
1. We propose a flow scheduling scheme that considers the temporal fluctuation and spatial distribution of the traffic in DCNs to guarantee the FCT and improve the power efficiency using DRL.
2. We customize a DRL algorithm on the SDN controller, which can automatically perform the feature analysis of the input traffic and generate a dynamic traffic scheduling policy without expert knowledge.
3. We run a simulation on NS2 to test SmartFCT under real-world DCN traces. Simulation results demonstrate that SmartFCT can guarantee the FCT constraints of traffic flows and save 12.2% more power than existing schemes.
The rest of the paper is organized as follows. Section II introduces the background and motivation of this paper. Section III formulates the flow consolidation problem. Section IV describes the overall architecture and working process of SmartFCT. Section V presents the details of the algorithm and interface design in SmartFCT. Section VI presents evaluation results and analysis. Section VII discusses related work. Finally, Section VIII concludes the paper.
Section snippets
Background and motivation
In this section, we introduce the background of SmartFCT and present our motivation to design SmartFCT.
Problem formulation
In a DCN, many components consume power, including the switch chassis, line cards, and ports. From the perspective of power efficiency, an optimal flow consolidation scheme consolidates flows onto fewer active links or even fewer switches, so as to reduce the power consumption of these components. In practice, the power consumption of a port varies with the traffic intensity on that port, which makes this problem dynamic. Formally, in a DCN that consists of N
Overview of SmartFCT
SmartFCT is implemented based on SDN, as shown in Fig. 3. The core function of SmartFCT is implemented with a DRL agent above the SDN controller, which communicates with the controller through the northbound interface. In SmartFCT, we consolidate the delay-sensitive flows and delay-tolerant flows into a subset of network links and devices for power saving. First, to reduce the power consumption, SmartFCT consolidates flows into fewer links and devices in use. Then, to ensure the FCT of delay
Details of SmartFCT
In this section, we introduce the implementation details of SmartFCT, which mainly contains two algorithms: the DRL algorithm used in SmartFCT and the routing path update scheme.
Simulation evaluation
In this section, we use simulations to compare SmartFCT with other existing solutions.
Power-efficiency in data center networks
To improve the power efficiency of DCNs, many schemes have been proposed to make network components consume power in proportion to the traffic demand [3], [4], [29]. For instance, ElasticTree [3] proposes a simple flow-level traffic consolidation scheme in a fat-tree network. In ElasticTree, the flows are consolidated based on their maximum bandwidth requirements to maximize link utilization of a subset of network links, then the unused links and devices can be turned off for
Conclusion
Traffic consolidation is widely considered an effective approach to improving the power efficiency of DCNs. To ensure the QoS of flows, both the FCT constraints and the power efficiency should be jointly considered. In this paper, we propose SmartFCT, which uses DRL to dynamically and effectively generate traffic consolidation strategies for DCNs. The DRL agent can efficiently generate actions for traffic consolidation adjustment based on the input traffic distribution. The experiment verifies the
CRediT authorship contribution statement
Penghao Sun: Conceptualization, Writing - original draft, Methodology, Software. Zehua Guo: Conceptualization, Formal analysis, Writing - review & editing. Sen Liu: Visualization, Formal analysis. Julong Lan: Funding acquisition, Supervision. Junchao Wang: Software, Validation. Yuxiang Hu: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This paper is supported by the National Key Research and Development Plan under Grant Number 2017YFB0803204, the National Natural Science Fund of China under Grant Numbers 61521003 and 61872382, and the Beijing Institute of Technology Research Fund Program for Young Scholars.
References (48)
- et al. JET: electricity cost-aware dynamic workload management in geographically distributed datacenters. Comput. Commun. (2014)
- et al. Wikipedia workload analysis for decentralized hosting. Comput. Netw. (2009)
- et al. TIDE: time-relevant deep reinforcement learning for routing optimization. Future Gener. Comput. Syst. (2019)
- P. Delforge, America’s data centers consuming and wasting growing amounts of energy,...
- et al. ElasticTree: Saving energy in data center networks. NSDI (2010)
- et al. Correlation-aware traffic consolidation for power optimization of data center networks. IEEE Trans. Parallel Distrib. Syst. (2015)
- et al. Managing data transfers in computer clusters with orchestra. ACM SIGCOMM Comput. Commun. Rev. (2011)
- et al. Reducing energy consumption in SDN-based data center networks through flow consolidation strategies. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (2019)
- et al. Dynamic flow scheduling for power-efficient data center networks. 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS) (2016)
- et al. DeTail: reducing the flow completion time tail in datacenter networks. Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (2012)
- Data center TCP (DCTCP). ACM SIGCOMM Comput. Commun. Rev.
- Latency is everywhere and it costs you sales-how to crush it. High Scalability
- Traffic measurement and analysis in an organic enterprise data center. 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR)
- FireFly: a reconfigurable wireless data center fabric using free-space optics. ACM SIGCOMM Comput. Commun. Rev.
- Knowledge-defined networking. ACM SIGCOMM Comput. Commun. Rev.
- Understanding data center traffic characteristics. Proceedings of the 1st ACM Workshop on Research on Enterprise Networking
- Deadline-aware datacenter TCP (D2TCP). ACM SIGCOMM Comput. Commun. Rev.
- Better never than late: Meeting deadlines in datacenter networks. ACM SIGCOMM Comput. Commun. Rev.
- Why flow-completion time is the right metric for congestion control. ACM SIGCOMM Comput. Commun. Rev.
- FCTcon: dynamic control of flow completion time in data center networks for power efficiency. IEEE Trans. Cloud Comput.
- EXR: greening data center network with software defined exclusive routing. IEEE Trans. Comput.
- Human-level control through deep reinforcement learning. Nature
Penghao Sun is currently a Ph.D. candidate at the National Digital Switching System Engineering and Technological R&D Center (NDSC), China. He received the B.S. and M.S. degrees from NDSC in 2014 and 2017, respectively. His current research interests include network architecture, edge computing, and machine learning for networking.
Zehua Guo received a B.S. degree from Northwestern Polytechnical University, an M.S. degree from Xidian University, and a Ph.D. degree from Northwestern Polytechnical University. He is an Associate Professor at the School of Automation, Beijing Institute of Technology. He was a Research Fellow at the Department of Electrical and Computer Engineering, New York University Tandon School of Engineering, a Post-Doctoral Research Associate at the Department of Computer Science and Engineering, University of Minnesota Twin Cities, and a Visiting Associate Professor at the Pillar of Information Systems Technology and Design, Singapore University of Technology and Design. His research interests include software-defined networking, network function virtualization, data center network, cloud computing, content delivery network, network security, edge computing, machine learning, and Internet exchange. Dr. Guo is an Associate Editor for IEEE ACCESS and the EURASIP Journal on Wireless Communications and Networking (Springer), and an Editor for the KSII Transactions on Internet and Information Systems. He was the Session Chair for the IEEE International Conference on Communications 2018 and a Technical Program Committee Member of Computer Communications (Elsevier), ICCCN 2020, ICA3PP 2020, CSCloud 2020, SmartCloud 2020, and SPDE 2020. He is a Senior Member of IEEE.
Sen Liu received the B.S. degree from Northeastern University, the M.S. degree from South China University of Technology, and the Ph.D. degree from Central South University. He is currently a Post-Doctoral Research Associate at Fudan University. He also worked as a Visiting Scholar with the Department of Computer Science and Engineering, University of Minnesota, Twin Cities, MN, USA, from 2018 to 2019. His research interests include congestion control, network traffic load balancing, and performance optimization in data center networks.
Julong Lan is currently a professor and the chief engineer at the National Digital Switching System Engineering and Technological R&D Center (NDSC), China. He is also the chief scientist of the China National Program on Key Basic Research Project (973 Program). His current research interests include 5G communication systems, next-generation computer networks, and artificial intelligence.
Junchao Wang is an assistant professor at the National Digital Switching System Engineering and Technological R&D Center.
Yuxiang Hu received his Ph.D. degree in 2011 from the National Digital Switching System Engineering and Technological Research Center of China. He is currently an associate professor, and his research interests include Internet architecture, novel switching and routing, and multimedia network technology.