1 Introduction

In service computing, web service composition is the most effective technology for implementing a Service-Oriented Architecture (SOA) [10]. In recent years, a large number of enterprises have distributed and released their products through web services that can be accessed by others, which has led to rapid growth in the number of web services. A single service usually cannot meet complex user requirements, so it is necessary to combine multiple services into a service composition. Given that the number of services with the same functional attributes may be quite large, Quality of Service (QoS) has become an important factor in differentiating competing services. QoS-aware service composition has therefore become a key research direction in the service computing community [1, 6].

In practical applications, provided that user requirements are met, the criteria for evaluating whether a service composition solution has practical value are the quality, adaptability, and efficiency of the composition [11]. Web services inherently depend on the network environment, so network fluctuations lead to changes in QoS performance, such as increased delay. A good web service composition solution therefore needs to adapt to this dynamic environment. In addition, the growing number of services with similar functionality but different QoS significantly expands the candidate service space. More specifically, if a composition workflow contains m abstract services and each has n candidate services, there exist \(n^m\) possible composition solutions, which leads to a "combinatorial explosion problem" [2, 9]. Existing works mainly focus on using reinforcement learning (RL) to adapt to the dynamic environment; however, existing RL methods show poor efficiency on large-scale problems [11].

In this paper, we develop an adaptive service composition method based on deep reinforcement learning (DRL), which integrates reinforcement learning and deep learning: RL provides adaptivity in service composition, while deep learning enhances the ability of representation and generalization.

2 Related Work

In this section, we review related works that deal with large-scale and adaptive problems in service composition, including planning-based solutions, reinforcement learning (RL), and deep reinforcement learning (DRL).

In recent years, many studies have addressed the adaptability issue using techniques such as integer programming, graph planning, and artificial intelligence. In [13], the authors develop a method using AI planning to build the service composition workflow, and a repair approach is used to deal with changes during the composition process. However, building a service composition workflow requires a priori knowledge about the environment. Reinforcement learning provides an effective way to achieve adaptive service composition: RL is better suited to scenarios with incomplete information, using trial-and-error exploration to discover the optimal policy [4]. Wang et al. [12] propose a service composition method based on the Markov Decision Process (MDP). This method only utilizes RL, so it cannot handle large-scale service composition problems.

To address high-dimensional inputs in RL, deep learning, which can extract features from raw data, can be employed. In [5], a multi-layer perceptron is adopted to approximate the Q-value, leading to the Neural Fitted Q Iteration (NFQ) algorithm. Mnih et al. [7] apply DRL to Atari 2600 games, successfully learning control policies from high-dimensional sensory input with expert-level performance.

3 Preliminaries

3.1 Reinforcement Learning

In a standard RL framework, the agent interacts with the environment by executing actions, receiving feedback, and adjusting its behavior accordingly. Q-learning [3] is a widely used RL method. It approximates the value function of a state-action pair by reducing the difference between the estimated Q-values of successive states at every learning step. The update rule of the Q-function is defined as:

$$\begin{aligned} Q\left( s,a \right) \leftarrow (1-\alpha ) Q\left( s,a \right) +\alpha \left[ r+\gamma \max _{a'} Q\left( s',a' \right) \right] \end{aligned}$$
(1)

where \(\alpha \) is the learning rate, \(\gamma \) is the discount factor, and \(Q(s,a)\) is the state-action value of executing action a in state s. In RL, the discounted cumulative reward is used to evaluate the result, and it is defined as:

$$\begin{aligned} V = \sum _{i=0}^\infty \gamma ^i r_i \end{aligned}$$
(2)

where \(r_i\) is the immediate reward at the \(i\)-th step.
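For concreteness, the following is a minimal Python sketch of the tabular Q-learning update in (1) and the discounted return in (2); the environment interface (reset, step, available_actions) is a hypothetical assumption used only for illustration.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Illustrative tabular Q-learning; env interface is assumed."""
    Q = defaultdict(float)                       # Q[(state, action)] -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            actions = env.available_actions(s)
            if random.random() < epsilon:        # epsilon-greedy exploration
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)
            # update rule (1)
            best_next = 0.0 if done else max(
                Q[(s_next, a2)] for a2 in env.available_actions(s_next))
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
            s = s_next
    return Q

def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward V in (2)."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))
```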

Fig. 1. A simple LSTM block

3.2 Deep Learning

LSTM is a recurrent neural network (RNN) extended with memory. Compared with the original RNN, three gating units are added to the hidden memory block: the input gate, the output gate, and the forget gate. As shown in Fig. 1, an LSTM step can be divided into three parts: (1) the forget gate decides what information will be discarded from the cell state \(C_{t-1}\); (2) the block determines what information can be put into the cell, which consists of two parts: one part is controlled by the input gate and the other is a new candidate vector created by a tanh layer; (3) the old cell state is updated with this information.
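As a concrete reference for the three parts above, here is a minimal NumPy sketch of a single LSTM step; the weight layout (one matrix per gate acting on the concatenation of the previous hidden state and the input) is an illustrative assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W and b hold one weight matrix / bias per gate (illustrative)."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate: what to discard
    i = sigmoid(W["i"] @ z + b["i"])          # input gate: what to write
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate vector from the tanh layer
    c_t = f * c_prev + i * c_tilde            # update the old cell state
    o = sigmoid(W["o"] @ z + b["o"])          # output gate
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```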

3.3 Deep Reinforcement Learning

The Google DeepMind team combines the perception ability of deep learning with the decision-making ability of RL to develop Deep Reinforcement Learning (DRL). The learning process is divided into three steps: (1) through interaction with the environment (handled by RL), the agent obtains an observation and delivers the high-dimensional result to a neural network that learns abstract representations; (2) the agent evaluates actions based on their expected reward and maps the current state to a corresponding action according to its policy; (3) the environment responds to the action and produces the next observation.

This paper adopts an RNN structure to remember the continuous state information along the history timeline and uses an Adaptive Deep Q-learning and RNN Composition Network (ADQRCN), which is suitable for service composition.

4 Problem Formulation

Consider someone who wants to arrange a trip schedule after determining the departure and return times. He may consume services such as weather forecast, flight information search, and hotel reservation. The process of the whole trip can be modelled as the transition graph in Fig. 2, which consists of two kinds of nodes. A hollow node represents a state node (i.e., an abstract service), such as \(S_{0}\). A solid node represents a concrete service. An abstract service refers to a class of services with the same functional attributes but different QoS; every abstract service has multiple concrete services.

Fig. 2. The MDP-WSC model for vacation planning

Based on the flow chart of vacation planning, we construct a model to solve the problem. We model service composition as a Markov Decision Process (MDP) and further explore how to generate an effective policy.

Definition 1

(MDP-based web service composition (MDP-WSC)). An MDP-WSC is a 6-tuple MDP-WSC=\(<S,S_{0},S_{\tau },A(\cdot ),P,R>\), where

  • S is a finite set of the world states;

  • \(S_{0}\in S\) is the initial state from which an execution of the service composition starts;

  • \(S_{\tau }\subset S\) is the set of terminal states, indicating the end of a composition execution when a state \(S_{\tau }^{i} \in S_{\tau }\) is reached;

  • A(s) represents the set of services that can be executed in state \(s \in S \);

  • P is a probability distribution function. When a web service \(\alpha \) is invoked, the world makes a transition from its current state s to a succeeding state \(s'\). The probability for this transition is labeled as \(P(s'\left| {s,\alpha } \right. )\);

  • R is the immediate reward function. When the current state is s and a service \(\alpha \) is selected, we get an immediate reward \(r= R(s,\alpha )\) from the environment after executing the action.

The immediate reward from the environment can be calculated from the aggregated QoS value [12], where Att represents an attribute of a service and \(w\) is the weighting factor of that attribute:

$$\begin{aligned} R(s)=\sum w_i \times \frac{Att^{s}_{i} - Att^{min}_{i}}{Att^{max}_{i} - Att^{min}_{i}} \end{aligned}$$
(3)
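A minimal sketch of the reward aggregation in formula (3); the attribute names, weights, and min/max bounds below are illustrative assumptions, and in practice cost-type attributes such as ResponseTime would use an inverted normalization so that a better value yields a higher reward.

```python
def qos_reward(service_qos, weights, bounds):
    """service_qos: {attr: value}, weights: {attr: w_i},
    bounds: {attr: (min, max)} over the candidate set (all illustrative)."""
    r = 0.0
    for attr, value in service_qos.items():
        lo, hi = bounds[attr]
        norm = 0.0 if hi == lo else (value - lo) / (hi - lo)
        # For cost-type attributes (e.g. ResponseTime) one would typically
        # use (hi - value) / (hi - lo) so that "better" maps to 1.
        r += weights[attr] * norm
    return r

# Example usage with the QoS attributes considered in the experiments
# (all numeric values below are made up for illustration):
reward = qos_reward(
    {"ResponseTime": 120.0, "Throughput": 9.5, "Availability": 0.98, "Reliability": 0.95},
    {"ResponseTime": 0.25, "Throughput": 0.25, "Availability": 0.25, "Reliability": 0.25},
    {"ResponseTime": (50.0, 500.0), "Throughput": (1.0, 20.0),
     "Availability": (0.8, 1.0), "Reliability": (0.7, 1.0)},
)
```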

5 Service Composition Based on DRL

5.1 RNN in Deep Reinforcement Learning

The neural network mainly serves to generalize over state-action pairs and their corresponding Q-values. Figure 3 depicts the basic RNN structure in ADQRCN, where the input layer consists of the state and action information. The input is passed through a hidden layer composed of 30 Long Short-Term Memory (LSTM) units and a fully connected layer. Finally, the Q-value is generated by the output layer.
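As an illustration of this structure, the following is a minimal PyTorch sketch of a Q-network with the same shape (state-action features, a hidden layer of 30 LSTM units, a fully connected layer, and a scalar Q-value output); the use of PyTorch and the input feature dimension are assumptions, not details given in the paper.

```python
import torch
import torch.nn as nn

class ADQRCNNet(nn.Module):
    """Illustrative Q-network: state-action sequence -> LSTM(30) -> FC -> Q-value."""
    def __init__(self, feature_dim, hidden_size=30):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, seq_len, feature_dim) -- a history of state-action features
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :]).squeeze(-1)   # one Q-value per sequence
```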

Fig. 3. The structure of ADQRCN

5.2 Learning Strategies

With regard to the training of ADQRCN, we adopt a method similar to that in [7, 8]. The neural network of ADQRCN approximates the Q-function, as given by formula (4): the neural network \(f(s,a;\theta )\) is used to predict the Q-value, where \(\theta \) denotes the network parameters. The Bellman equation (5) is used to compute the squared error loss (6). Then, gradient descent (7) is used to update the network parameters.

$$\begin{aligned} f(s,a;\theta )&\approx Q(s,a;\theta ) \end{aligned}$$
(4)
$$\begin{aligned} Q(s,a)&=r+\gamma \max _{a'} Q(s',a';\theta ) \end{aligned}$$
(5)
$$\begin{aligned} L&=E\left[ \left( r+\gamma \max _{a'} Q(s',a';\theta )-Q(s,a;\theta ) \right) ^{2} \right] \end{aligned}$$
(6)
$$\begin{aligned} \frac{\partial L(\theta )}{\partial \theta }&=E\left[ \left( r+\gamma \max _{a'} Q(s',a';\theta )-Q(s,a;\theta ) \right) \frac{\partial Q(s,a;\theta )}{\partial \theta } \right] \end{aligned}$$
(7)
Algorithm 1. The training process of ADQRCN

5.3 Algorithm

Algorithm 1 describes the detailed training process of ADQRCN. First, an empty replay memory D with capacity N is initialized. The action-value function Q and the target action-value function \(\hat{Q}\) are both implemented by the recurrent neural network, initialized with random weights. During training, the agent selects an action according to the Q-value function and executes it. After obtaining the reward \(r_t\) and the next state \(s_{t+1}\), the transition \(\left( {{s}_{t}},{{a}_{t}},{{r}_{t}},{{s}_{t+1}} \right) \) is stored in the replay memory D. Then the adjustment process described in Sect. 5.2 begins, which improves the prediction accuracy of the Q-value function. The algorithm repeats the above process until convergence (the service composition result remains the same over two consecutive iterations) and outputs the final service composition result.
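A condensed sketch of the training loop in Algorithm 1 (replay memory D with capacity N, epsilon-greedy action selection, and a periodically refreshed target network \(\hat{Q}\)); the environment interface, the encode function that turns a state-action pair into a feature sequence, and all hyperparameter values are assumptions for illustration.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

def train_adqrcn(env, q_net, target_net, optimizer, encode,
                 episodes=3000, capacity=10000, batch_size=32,
                 gamma=0.9, epsilon=0.1, sync_every=200):
    """encode(state, action) -> (1, seq_len, feature_dim) tensor; all interfaces illustrative."""
    D = deque(maxlen=capacity)                          # replay memory D with capacity N
    target_net.load_state_dict(q_net.state_dict())      # \hat{Q} starts as a copy of Q
    steps = 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            actions = env.available_actions(s)
            if random.random() < epsilon:                # epsilon-greedy action selection
                a = random.choice(actions)
            else:
                with torch.no_grad():
                    a = max(actions, key=lambda x: float(q_net(encode(s, x))))
            s_next, r, done = env.step(a)
            D.append((s, a, r, s_next, done))            # store the transition in D
            s = s_next
            if len(D) >= batch_size:                     # adjust the network (Sect. 5.2)
                for s0, a0, r0, s1, d in random.sample(D, batch_size):
                    q_pred = q_net(encode(s0, a0))
                    q_next = 0.0 if d else max(
                        float(target_net(encode(s1, a1)))
                        for a1 in env.available_actions(s1))
                    target = torch.full_like(q_pred, r0 + gamma * q_next)
                    loss = F.mse_loss(q_pred, target)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
            steps += 1
            if steps % sync_every == 0:                  # periodically refresh \hat{Q}
                target_net.load_state_dict(q_net.state_dict())
    return q_net
```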

6 Experiments and Analysis

We conduct experiments to assess the proposed ADQRCN algorithm on three aspects: effectiveness, adaptability, and scalability. The traditional Q-Learning Service Composition Network (QCN) [12], which uses Q-learning and an MDP to obtain the optimal service composition, is implemented as a baseline for comparison.

6.1 Experiment Setting

In the experiments, we mainly consider four QoS attributes: ResponseTime, Throughput, Availability, and Reliability. The experimental data come from the QWS Dataset. Considering that the scale of the QWS Dataset is small, we randomly expand it to simulate a large-scale scenario, which allows us to verify the advantage of our method. In the evaluation, we use the discounted cumulative reward defined in formula (2) to measure the performance of a composition scheme.

The experimental environment is a Windows 7 (64-bit) system running on an Intel i7-6700K 4.00 GHz CPU with 16 GB RAM.

6.2 Result Analysis

6.2.1 Validation of Effectiveness

The experiments are conducted with 100 state nodes (abstract services), and each state node corresponds to 500 candidate services. Therefore, the total number of possible service composition schemes is \({{500}^{100}}\), which qualifies as a large-scale scenario.

Fig. 4. (a) Validation of effectiveness (b) Validation of scalability (c) Validation of adaptability

As shown in Fig. 4(a), ADQRCN outperforms QCN and converges more rapidly, so this experiment also demonstrates the efficiency of ADQRCN. Because QCN is based on tabular storage with random exploration, its performance is worse than that of ADQRCN, which benefits from the generalization ability of its network.

6.2.2 Validation of Scalability

In this series of experiments, the number of state nodes is fixed at 100, and the number of candidate services is set to 700 in Fig. 4(b) for comparison with the experiment with 500 candidate services in Fig. 4(a). The figure shows that the convergence rate of ADQRCN significantly outperforms that of QCN. Because ADQRCN adopts a neural network as the value-function approximator, it maintains a strong generalization ability and converges quickly.

6.2.3 Validation of Adaptability

In this experiment, to simulate a changing environment, we change the QoS values of 1%, 5%, and 10% of the services during a fixed period (between the 2000th and 2500th episodes). The results of the three groups of experiments are shown in Fig. 4(c). The fluctuation of services has some influence on learning performance, but the effects are temporary. Overall, ADQRCN shows stronger adaptability when facing such fluctuations, which may be attributed to its prediction model.

7 Conclusion

This paper proposes an adaptive deep reinforcement learning framework to ensure adaptability and efficiency in large-scale service composition. The framework uses a recurrent neural network to approximate the reinforcement learning value function and stores information effectively, improving the ability to scale to a large and dynamic service environment. The main innovations of this paper include the following:

  • We propose the MDP-WSC model, which is closer to real-world service composition problems and suitable for large-scale scenarios.

  • In view of the limitations of reinforcement learning, we integrate the perception ability of deep learning with reinforcement learning to solve large-scale service composition problems.