Real-time scheduling for a smart factory using a reinforcement learning approach
Introduction
Industry 4.0, also called “Smart Factory,” aims to increase factory productivity and the efficient utilization of resources in real time (Herrmann, Pentek, & Otto, 2016; Wang, Wan, Li, & Zhang, 2016). These objectives are achieved via flexible event-driven reactions to changes in the factory environment, resource allocation, scheduling, optimization, and control in real time. Most of the “Smart Factory” concepts share the attributes of cyber-physical systems (CPS) for monitoring physical processes by creating a virtual copy of the physical world and making decentralized decisions (Lee, Bagheri, & Kao, 2015).
CPS is defined as a transformative technology for managing interconnected systems according to their physical assets and computational capabilities, and recent developments have improved the availability and affordability of sensors, data acquisition systems, and computer networks (Lee, 2008; Wolf, 2009). The competitive nature of current industry is forcing more factories to implement high-tech methods. Thus, the increasing use of sensors, RFID, and networked machines has resulted in the continuous generation of high-volume data known as Big Data (Lee et al., 2014; Lee et al., 2013). In this environment, CPS can be developed further to manage Big Data and exploit the interconnectivity among machines to fulfill the goal of producing intelligent, resilient, and self-adaptable machines. Furthermore, by integrating CPS with production, logistics, and services in current industrial practices, it will be possible to transform current factories into Industry 4.0 factories with significant economic potential. This is why it is timely and crucial to consider adaptive scheduling and control (i.e., real-time scheduling; RTS) for dynamic manufacturing environments as key research issues in CPS production management (Goryachev et al., 2013; Kück et al., 2016).
RTS employs different scheduling rules in a dynamic and multi-pass manner in order to select the best scheduling strategy among the feasible alternatives at each decision point over a series of scheduling periods, thereby meeting the shop floor performance criteria (Son, Rodriguez-Rivera, & Wysk, 1999). According to previous studies, RTS involves two main approaches (Priore, Gómez, Pino, & Rosillo, 2014): the multi-pass simulation approach (Ishii and Talavage, 1994; Wu and Wysk, 1989) and the machine learning approach (Metan et al., 2010; Olafsson and Li, 2010; Shiue, 2009; Shiue et al., 2012). Multi-pass simulations are used to evaluate candidate scheduling rules and select the best strategy based on simulated information, such as the current system status and the management goals for each scheduling period. However, the multi-pass simulation approach is inappropriate for shop floor control because it requires intensive computational effort to select the best scheduling rule for each scheduling period. In the machine learning approach for RTS, a set of training examples generated by system simulations is used to determine the best scheduling rule for each possible system state. However, the machine learning approach employed for collecting training examples and learning processes in order to acquire an RTS knowledge base (KB) tends to be time consuming and relatively slow. A KB has the advantage of yielding fast and acceptable solutions to allow the system to make decisions in real time, and it can conform to the operational characteristics of a dynamic manufacturing environment (Priore et al., 2014). Previous studies (Shiue, Guh, & Lee, 2012) defined three major machine learning approaches for constructing an RTS system KB: artificial neural networks (ANNs) (Rumelhart, Hinton, & Williams, 1986), decision tree (DT) learning (Quinlan, 1993), and support vector machines (SVMs) (Vapnik, 2000).
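A KB of this kind can be pictured as a mapping from observed system attributes to a dispatching rule. The sketch below is purely illustrative (the attribute names and training examples are hypothetical): it implements the lookup as a 1-nearest-neighbour search over simulated training examples, standing in for the ANN/DT/SVM classifiers named above.

```python
# Hedged sketch of an RTS knowledge base: observed system attributes
# (here, hypothetical mean utilization and mean queue length) are mapped
# to the dispatching rule that simulation found best for a similar state.

def nearest_rule(kb, state):
    """Return the rule of the training example whose attributes are
    closest (squared Euclidean distance) to the observed `state`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(kb, key=lambda example: dist(example[0], state))[1]

# (mean_utilization, mean_queue_length) -> best rule found by simulation
kb = [((0.9, 12.0), "SPT"),
      ((0.5, 3.0), "EDD"),
      ((0.7, 7.0), "SRPT")]

rule = nearest_rule(kb, (0.85, 10.0))  # closest to the first example
```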
According to previous studies, two strategies can be used to determine the scheduling rules in an RTS system: a single dispatching rule (SDR) and multiple dispatching rules (MDRs) for a manufacturing cell. The SDR usually assigns an individual heuristic scheduling rule to all machines in a system during a given scheduling interval (i.e., scheduling period), whereas the MDRs assign different scheduling rules (i.e., scheduling decision variables) to all machines in a system. In the following, we refer to this method as an intelligent multi-controller. Fig. 1 illustrates the role of the RTS MDRs mechanism in a flexible manufacturing system (FMS) case study. For the F1, F2, F3, and load/unload stations, the MDRs method selects the SPT, SRPT, DS, and EDD dispatching rules, respectively, as the scheduling decision variables for job selection in the next scheduling period. Ishii and Talavage (1994) proposed a search algorithm that employs MDRs in bottleneck machines by using predictions based on a multi-pass simulation. Their results showed that the MDRs strategy can improve the performance of an FMS by up to 15.9% compared with the best result obtained using the SDR strategy. However, their approach is not well suited to an RTS system that uses the machine learning approach.
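The distinction between SDR and MDRs can be made concrete with a small sketch. The station names follow the FMS example above; the function names are our own illustrative assumptions:

```python
# Illustrative sketch: SDR gives every station the same rule for a
# scheduling period, while MDRs treats each station as its own
# scheduling decision variable.

STATIONS = ["F1", "F2", "F3", "load_unload"]

def sdr_assignment(rule):
    """SDR: one heuristic rule assigned to all machines this period."""
    return {station: rule for station in STATIONS}

def mdrs_assignment(rules_by_station):
    """MDRs: each machine may receive a different rule."""
    assert set(rules_by_station) == set(STATIONS)
    return dict(rules_by_station)

single = sdr_assignment("SPT")
multi = mdrs_assignment({"F1": "SPT", "F2": "SRPT",
                         "F3": "DS", "load_unload": "EDD"})
```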
The classical machine learning approach builds an RTS KB via the MDRs mechanism, and its main disadvantage is that the classes (dispatching decision rules) to which the training examples are assigned must be predefined. For example, for a given set of system attributes, the best dispatching decision rule for each decision variable can be determined after a simulation is run for each dispatching rule. The resulting MDRs are treated as a class. However, this process becomes intolerably time consuming because the rules must be determined for each period (Kim, Min, & Yih, 1998). Furthermore, a local approach such as DT learning or SVMs does not guarantee satisfying the global objective function (i.e., the overall production performance of the shop floor). Thus, although the best decision rule can be determined for each scheduling decision variable, the combination of all the decision rules may not simultaneously satisfy the global objective function.
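The combinatorial cost described above is easy to quantify: labelling training examples for MDRs requires, in the worst case, one simulation run per combination of per-machine rules in every scheduling period. A minimal illustration (the rule set and machine count are examples only, not the paper's configuration):

```python
from itertools import product

# Hypothetical candidate rule set and machine count for illustration.
rules = ["SPT", "SRPT", "DS", "EDD"]
n_machines = 4

# Each MDRs class is one combination of per-machine rules, so exhaustive
# labelling by simulation must cover len(rules) ** n_machines cases
# per scheduling period -- the source of the "intolerably time
# consuming" training-example generation.
combinations = list(product(rules, repeat=n_machines))
print(len(combinations))  # 4 ** 4 = 256
```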
Guh, Shiue, and Tseng (2011) constructed an RTS KB using an MDRs selection mechanism based on a self-organizing map (SOM) neural network (Kohonen, 2001), which can overcome the long training time problem that affects the classical machine learning approach in the training example generation phase. They showed that over a long period, this approach provides better system performance than machine learning-based RTS using the SDR approach and heuristic individual dispatching rules according to various performance criteria. Although this approach is feasible, its main drawback is its inability to respond to changes in the shop floor environment. The RTS KB should not be static, so it would be useful to establish a procedure that maintains the KB incrementally when important changes occur in the manufacturing system.
A possible solution may incorporate a reinforcement learning (RL) mechanism that can learn to select appropriate actions for achieving its goals via interactions with the system environment, responding to the rewards or penalties received based on the impact of each action (Sutton & Barto, 1998; Shahrabi, Adibi, & Mahootchi, 2017). Q-learning (Watkins & Dayan, 1992) is a form of model-free RL that provides agents with the capacity to learn to act optimally in Markovian domains by experiencing the consequences of actions, but without requiring them to build maps of the domains. Wang and Usher (2005) found that Q-learning works well for a single machine dispatching rule selection problem when used by a learning agent to select various dispatching rules according to different system criteria, and the results of their study may inspire future applications of RL techniques to the RTS problem.
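A minimal sketch of tabular Q-learning as it applies to dispatching-rule selection is given below. All names and parameter values (the learning rate, discount factor, and rule set) are illustrative assumptions, not the authors' implementation:

```python
import random

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (Watkins & Dayan, 1992):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

def epsilon_greedy(Q, state, epsilon=0.1):
    """Select a dispatching rule for `state`: mostly the current best,
    occasionally a random exploration."""
    if random.random() < epsilon:
        return random.choice(list(Q[state]))
    return max(Q[state], key=Q[state].get)

# Hypothetical setup: states could be SOM cluster indices, actions are
# candidate dispatching rules; rewards would come from the performance
# criterion observed after each scheduling period.
rules = ["SPT", "SRPT", "EDD"]
Q = {s: {a: 0.0 for a in rules} for s in range(3)}
q_update(Q, state=0, action="SPT", reward=1.0, next_state=1)
```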
Based on the studies mentioned above, in order to develop the MDRs mechanism in RTS, the RTS should be capable of updating and maintaining the KB via RL during operations to allow responses to changes in the system operating conditions. Hence, using a Q-learning RL agent to refine the RTS KB is an important research issue. In this study, we develop an RL-based RTS using the MDRs mechanism. The results of a case study experiment showed that production performance was greatly improved compared with the classical SDR approach.
The remainder of this paper is divided into six sections. In Section 2, we present the theoretical background related to our proposed off-line learning module for determining the system state number for a Q-learning RL agent using the Self-Organizing Map (SOM) algorithm (Kohonen, 2001), and we also introduce an RL module that uses a Q-learning algorithm. In Section 3, we formulate the RTS problem using the MDRs mechanism and state the research objectives. In Section 4, we describe the method for the proposed RL-based RTS using the MDRs mechanism. In Section 5, we present the results of a case study experiment and compare the proposed RL approach with other approaches. Finally, in Section 6, we give our conclusions, a summary of this study, and some suggestions for future research.
SOM neural networks
SOM networks (Kohonen, 2001) are used widely for data mining because they are a convenient visual tool. Unlike other ANN approaches, the SOM network performs unsupervised learning, i.e., the processing units in the network adjust their weights through lateral feedback connections. The more common approach to ANNs requires supervised learning, i.e., a set of training samples is provided as the input and the output is compared with a known result. Deviations from the correct result lead to
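The unsupervised weight adjustment described above can be sketched for a one-dimensional SOM grid. This is a generic, textbook-style update under our own simplifying assumptions (linear grid, Gaussian neighbourhood), not the two-level SOM configuration used later in this paper:

```python
import numpy as np

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One SOM update: find the best-matching unit (BMU) for input x,
    then pull each unit's weights toward x, scaled by a Gaussian
    neighbourhood kernel centred on the BMU."""
    # weights: (n_units, dim) array; the grid is a 1-D line of units.
    dists = np.linalg.norm(weights - x, axis=1)
    bmu = int(np.argmin(dists))
    for i in range(len(weights)):
        h = np.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))  # neighbourhood
        weights[i] += lr * h * (x - weights[i])
    return bmu

rng = np.random.default_rng(0)
w = rng.random((5, 3))           # 5 units, 3-dimensional inputs
x = np.array([0.2, 0.8, 0.5])
bmu = som_step(w, x)
```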
Real-time scheduling (RTS) using the MDRs mechanism
The status of a manufacturing system changes continuously, and previous studies have confirmed that it is possible to improve the system performance by implementing a multi-pass scheduling policy rather than using a single heuristic dispatching rule over an extended planning horizon. The policy is based on the system status at each decision point over a series of short-term scheduling period horizons. RTS based on machine learning has the advantage of rapidly yielding acceptable solutions for
Development of RL-based RTS using the MDRs mechanism
The proposed RL-based RTS using the MDRs mechanism shown in Fig. 3 comprises two major modules. An off-line learning module runs a simulation to generate training examples to determine the system state number (built using a two-level SOM). Next, the system state number is sent to construct the initial state–action pair table for the RL module. During this operation, the Q-learning based agent is assumed to have the ability to perceive information from all the machines on the shop floor, and it
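The hand-off between the two modules described above — the SOM-derived system state number seeding the RL module's initial state–action pair table — might look like the following sketch (the state count and rule set are placeholders, not the paper's values):

```python
def init_state_action_table(n_states, rules):
    """Initial state-action pair table: one row per SOM-derived system
    state, one entry per candidate dispatching rule, all Q-values zero."""
    return {s: {a: 0.0 for a in rules} for s in range(n_states)}

# Hypothetical hand-off: the off-line SOM module reports 12 system
# states, which size the Q-table consumed by the RL module.
n_states_from_som = 12
q_table = init_state_action_table(n_states_from_som,
                                  ["SPT", "SRPT", "DS", "EDD"])
```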
Construction of a simulation model and generation of a training example
To verify the proposed method, a discrete event simulation model was used to generate training examples. The simulation model was built and executed using Tecnomatix Plant Simulation (2006), an object-oriented simulation language, and it was run on a Core i7-4790 3.6 GHz CPU with the Windows 7 operating system.
It was expected that the proposed approach would achieve the desired dynamic dispatching performance. Several parameters were determined based on a preliminary simulation run. The time
Conclusion and future work
In this study, we proposed an RL-based MDRs selection mechanism for constructing an RTS for FMS control. We provide the following conclusions based on the results of this study.
- The proposed RL-based approach using the MDRs selection mechanism responds efficiently to changes in the shop floor environment and it is suitable for incorporating in the operation of an RTS system for a smart factory.
- The proposed RL-based approach employs an intelligent and dynamic method for selecting MDRs, which is
References (46)

- et al. (2015). A cyber-physical systems architecture for industry 4.0-based manufacturing systems. Manufacturing Letters.
- et al. (2014). Service innovation and smart analytics for industry 4.0 and big data environment. Procedia CIRP.
- et al. (2013). Recent advances and trends in predictive manufacturing systems in big data environment. Manufacturing Letters.
- et al. (2010). Learning effective new single machine dispatching rules from optimal scheduling data. International Journal of Production Economics.
- et al. (2017). A reinforcement learning approach to parameter estimation in dynamic job shop scheduling. Computers & Industrial Engineering.
- et al. (2012). Study on shop floor control system in semiconductor fabrication by self-organizing map-based intelligent multi-controller. Computers & Industrial Engineering.
- et al. (2005). Application of reinforcement learning for agent-based production scheduling. Engineering Applications of Artificial Intelligence.
- et al. (2000). Operating an FMC by a decision-tree-based adaptive production control system. International Journal of Production Research.
- (1984). Sequencing rules and due-date assignments in a job shop. Management Science.
- (2009). Learning deep architectures for AI. Foundations and Trends® in Machine Learning.