Computers & Industrial Engineering

Volume 125, November 2018, Pages 604-614

Real-time scheduling for a smart factory using a reinforcement learning approach

https://doi.org/10.1016/j.cie.2018.03.039

Highlights

  • We propose an RL-based MDRs selection mechanism for the RTS problem.

  • A two-level SOM is used to determine the system state class.

  • A Q-learning algorithm is used as a reinforcement learning agent.

  • Our approach performs better than previously proposed MDRs and SDR approaches.

Abstract

Previous studies of the real-time scheduling (RTS) problem domain indicate that using a multiple dispatching rules (MDRs) strategy for the various zones in the system can enhance production performance to a greater extent than using a single dispatching rule (SDR) over a given scheduling interval for all the machines in the shop floor control system. This approach is feasible, but the drawback of the previously proposed MDRs method is its inability to respond to changes in the shop floor environment. The RTS knowledge base (KB) is not static, so it would be useful to establish a procedure that maintains the KB incrementally when important changes occur in the manufacturing system. To address this issue, we propose reinforcement learning (RL)-based RTS using the MDRs mechanism, which incorporates two main modules: (1) an off-line learning module and (2) a Q-learning-based RL module. According to various performance criteria over a long period, the proposed approach performs better than the previously proposed MDRs method, machine learning-based RTS using the SDR approach, and heuristic individual dispatching rules.

Introduction

Industry 4.0, also called “Smart Factory,” aims to increase factory productivity and the efficient utilization of resources in real time (Herrmann, Pentek, & Otto, 2016; Wang, Wan, Li, & Zhang, 2016). These objectives are achieved via flexible event-driven reactions to changes in the factory environment, resource allocation, scheduling, optimization, and control in real time. Most of the “Smart Factory” concepts share the attributes of cyber-physical systems (CPS) for monitoring physical processes by creating a virtual copy of the physical world and making decentralized decisions (Lee, Bagheri, & Kao, 2015).

CPS is defined as a transformative technology for managing interconnected systems according to their physical assets and computational capabilities, and recent developments have improved the availability and affordability of sensors, data acquisition systems, and computer networks (Lee, 2008; Wolf, 2009). The competitive nature of current industry is forcing more factories to implement high-tech methods. Thus, the increasing use of sensors, RFID, and networked machines has resulted in the continuous generation of high-volume data known as Big Data (Lee et al., 2013; Lee et al., 2014). In this environment, CPS can be developed further to manage Big Data and exploit the interconnectivity among machines to fulfill the goal of producing intelligent, resilient, and self-adaptable machines. Furthermore, by integrating CPS with production, logistics, and services in current industrial practices, it will be possible to transform current factories into Industry 4.0 factories with significant economic potential. This is why it is timely and crucial to consider adaptive scheduling and control (i.e., real-time scheduling; RTS) for dynamic manufacturing environments as key research issues in CPS production management (Goryachev et al., 2013; Kück et al., 2016).

RTS employs different scheduling rules in a dynamic and multi-pass manner in order to select the best scheduling strategy among the feasible alternatives at each decision point over a series of scheduling periods, thereby meeting the shop floor performance criteria (Son, Rodriguez-Rivera, & Wysk, 1999). According to previous studies, RTS involves two main approaches (Priore, Gómez, Pino, & Rosillo, 2014): the multi-pass simulation approach (Ishii & Talavage, 1994; Wu & Wysk, 1989) and the machine learning approach (Metan et al., 2010; Olafsson & Li, 2010; Shiue, 2009; Shiue et al., 2012). Multi-pass simulations are used to evaluate candidate scheduling rules and select the best strategy based on simulated information, such as the current system status and the management goals for each scheduling period. However, the multi-pass simulation approach is inappropriate for shop floor control because it requires intensive computational effort to select the best scheduling rule for each scheduling period. In the machine learning approach for RTS, a set of training examples generated by system simulations is used to determine the best scheduling rule for each possible system state. However, the machine learning approach employed for collecting training examples and learning processes in order to acquire an RTS knowledge base (KB) tends to be time-consuming and relatively slow. A KB has the advantage of yielding fast and acceptable solutions to allow the system to make decisions in real time, and it can conform to the operational characteristics of a dynamic manufacturing environment (Priore et al., 2014). Previous studies (Shiue, Guh, & Lee, 2012) defined three major machine learning approaches for constructing an RTS system KB: artificial neural networks (ANNs) (Rumelhart, Hinton, & Williams, 1986), decision tree (DT) learning (Quinlan, 1993), and support vector machines (SVMs) (Vapnik, 2000).

According to previous studies, two strategies can be used to determine the scheduling rules in an RTS system: a single dispatching rule (SDR) and multiple dispatching rules (MDRs) for a manufacturing cell. The SDR strategy assigns an individual heuristic scheduling rule to all the machines in a system during a given scheduling interval (i.e., scheduling period), whereas the MDRs strategy assigns different scheduling rules (i.e., scheduling decision variables) to the machines in a system. In the following, we refer to this method as an intelligent multi-controller. Fig. 1 illustrates the role of the RTS MDRs mechanism in a flexible manufacturing system (FMS) case study. For the F1, F2, F3, and load/unload stations, the MDRs method selects the SPT, SRPT, DS, and EDD dispatching rules, respectively, as the scheduling decision variables for job selection in the next scheduling period. Ishii and Talavage (1994) proposed a search algorithm that employs MDRs in bottleneck machines by using predictions based on a multi-pass simulation. Their results showed that the MDRs strategy can improve the performance of an FMS by up to 15.9% compared with the best result obtained using the SDR strategy. However, their approach is not well suited to an RTS system that uses the machine learning approach.
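
To make the SDR/MDRs distinction concrete, the following minimal Python sketch (our illustration, not the paper's implementation) assigns one dispatching rule per station, following the Fig. 1 example; the job attributes and rule definitions are assumptions.

```python
# Each dispatching rule scores a job; the job with the smallest score is selected.
def spt(job, now):   return job["proc_time"]                    # shortest processing time
def srpt(job, now):  return job["remaining_time"]               # shortest remaining processing time
def edd(job, now):   return job["due_date"]                     # earliest due date
def ds(job, now):    return job["due_date"] - now - job["remaining_time"]  # dynamic slack

# One scheduling period's MDRs: a different rule per station
# (an SDR would map every station to the same rule).
mdrs = {"F1": spt, "F2": srpt, "F3": ds, "load/unload": edd}

def select_next_job(station, queue, now):
    """Pick the next job from a station's queue using that station's assigned rule."""
    return min(queue, key=lambda job: mdrs[station](job, now))
```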

The classical machine learning approach builds an RTS KB via the MDRs mechanism, and its main disadvantage is that the classes (dispatching decision rules) to which the training examples are assigned must be predefined. For example, for a given set of system attributes, the best dispatching decision rule for each decision variable can be determined after a simulation is run for each dispatching rule, and the resulting MDRs are treated as a class. However, this process becomes intolerably time consuming because the rules must be determined for each period (Kim, Min, & Yih, 1998). Furthermore, a local approach, such as DT learning or SVMs, does not guarantee satisfaction of the global objective function (i.e., the overall production performance of the shop floor). Thus, although the best decision rule can be determined for each scheduling decision variable, the combination of all the decision rules may not simultaneously satisfy the global objective function.

Guh, Shiue, and Tseng (2011) constructed an RTS KB using an MDRs selection mechanism based on a self-organizing map (SOM) neural network (Kohonen, 2001), which can overcome the long training time that affects the classical machine learning approach in the training example generation phase. They showed that over a long period, this approach provides better system performance according to various performance criteria than machine learning-based RTS using the SDR approach and heuristic individual dispatching rules. This approach is feasible, but its main drawback is an inability to respond to changes in the shop floor environment. The RTS KB is not static, so it would be useful to establish a procedure that maintains the KB incrementally when important changes occur in the manufacturing system.

A possible solution is to incorporate a reinforcement learning (RL) mechanism that learns to select appropriate actions for achieving its goals by interacting with the system environment and receiving rewards or penalties based on the impact of each action (Sutton & Barto, 1998; Shahrabi, Adibi, & Mahootchi, 2017). Q-learning (Watkins & Dayan, 1992) is a form of model-free RL that allows agents to learn to act optimally in Markovian domains by experiencing the consequences of actions, without requiring them to build maps of the domains. Wang and Usher (2005) found that Q-learning works well for a single-machine dispatching rule selection problem when used by a learning agent to select various dispatching rules according to different system criteria, and the results of their study may inspire future applications of RL techniques to the RTS problem.
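
For reference, the Q-learning update underlying this mechanism is the standard tabular form, sketched below in Python with epsilon-greedy action selection; the states, actions, reward signal, and parameter values (alpha, gamma, epsilon) are illustrative placeholders, not the settings used in this study.

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Minimal tabular Q-learning (Watkins & Dayan, 1992) with epsilon-greedy selection."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # Q(s, a); unseen pairs default to 0
        self.actions = actions        # e.g., candidate dispatching-rule combinations
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # Explore with probability epsilon; otherwise exploit the best known action.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```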

Based on the studies mentioned above, to develop the MDRs mechanism in RTS, the RTS system should be capable of updating and maintaining the KB via RL during operations so that it can respond to changes in the system operating conditions. Hence, using a Q-learning RL agent to refine the RTS KB is an important research issue. In this study, we develop RL-based RTS using the MDRs mechanism. The results of a case study experiment showed that production performance was greatly improved compared with the classical SDR approach.

The remainder of this paper is organized into six sections. In Section 2, we present the theoretical background for the proposed off-line learning module, which determines the system state number for a Q-learning RL agent using the SOM algorithm (Kohonen, 2001), and we introduce an RL module that uses a Q-learning algorithm. In Section 3, we formulate the RTS problem using the MDRs mechanism and state the research objectives. In Section 4, we describe the proposed method for RL-based RTS using the MDRs mechanism. In Section 5, we present the results of a case study experiment and compare the proposed RL approach with other approaches. Finally, in Section 6, we give our conclusions, a summary of this study, and some suggestions for future research.

Section snippets

SOM neural networks

SOM networks (Kohonen, 2001) are widely used for data mining because they provide a convenient visualization tool. Unlike other ANN approaches, the SOM network performs unsupervised learning; that is, the processing units in the network adjust their weights through lateral feedback connections. The more common approach to ANNs requires supervised learning, in which a set of training samples is provided as the input and the output is compared with a known result. Deviations from the correct result lead to
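
To illustrate this unsupervised weight adjustment, the following is a minimal sketch of online SOM training on a rectangular grid with a Gaussian neighbourhood; the grid size, learning rate, and radius schedule are illustrative assumptions, not the two-level SOM configuration developed later in the paper.

```python
import numpy as np

def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, radius0=3.0):
    """Online SOM training. data: (n_samples, dim) NumPy array of system attributes."""
    dim = data.shape[1]
    weights = np.random.rand(rows, cols, dim)      # one weight vector per map unit
    grid = np.dstack(np.mgrid[0:rows, 0:cols])     # (row, col) coordinates of each unit
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                # decaying learning rate
        radius = radius0 * (1 - t / epochs) + 1e-9 # shrinking neighbourhood radius
        for x in data:
            # Best-matching unit (BMU): the unit whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood kernel centred on the BMU's grid position.
            grid_dist = np.linalg.norm(grid - np.array(bmu), axis=2)
            h = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            # Pull the BMU and its neighbours toward the input (lateral feedback).
            weights += lr * h[..., None] * (x - weights)
    return weights
```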

Real-time scheduling (RTS) using the MDRs mechanism

The status of a manufacturing system changes continuously, and previous studies have confirmed that it is possible to improve the system performance by implementing a multi-pass scheduling policy rather than using a single heuristic dispatching rule over an extended planning horizon. The policy is based on the system status at each decision point over a series of short-term scheduling periods. RTS based on machine learning has the advantage of rapidly yielding acceptable solutions for

Development of RL-based RTS using the MDRs mechanism

The proposed RL-based RTS using the MDRs mechanism shown in Fig. 3 comprises two major modules. An off-line learning module runs a simulation to generate training examples for determining the system state number (built using a two-level SOM). Next, the system state number is used to construct the initial state–action pair table for the RL module. During operation, the Q-learning-based agent is assumed to be able to perceive information from all the machines on the shop floor, and it
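
A rough sketch of how the two modules could interact over a scheduling horizon is given below; the shop-floor interface (system_attributes, apply_mdrs, run_period, performance_delta) and the SOM classifier's classify method are hypothetical names introduced for illustration, and the agent follows the Q-learning sketch shown earlier.

```python
def run_scheduling_horizon(periods, som_classifier, agent, shop_floor):
    """One pass over a horizon: classify the state, pick MDRs, act, observe, update."""
    state = som_classifier.classify(shop_floor.system_attributes())
    for _ in range(periods):
        action = agent.choose(state)              # an MDRs combination for the next period
        shop_floor.apply_mdrs(action)             # assign one dispatching rule per station
        shop_floor.run_period()                   # simulate/execute one scheduling period
        reward = shop_floor.performance_delta()   # e.g., change in the performance criterion
        next_state = som_classifier.classify(shop_floor.system_attributes())
        agent.update(state, action, reward, next_state)
        state = next_state
```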

Construction of a simulation model and generation of a training example

To verify the proposed method, a discrete event simulation model was used to generate training examples. The simulation model was built and executed using Tecnomatix Plant Simulation (2006), an object-oriented simulation language, and it was run on a Core i7-4790 3.6 GHz CPU with the Windows 7 operating system.

It was expected that the proposed approach would achieve the desired dynamic dispatching performance. Several parameters were determined based on a preliminary simulation run. The time

Conclusion and future work

In this study, we proposed an RL-based MDRs selection mechanism for constructing an RTS system for FMS control. We draw the following conclusions from the results of this study.

  • The proposed RL-based approach using the MDRs selection mechanism responds efficiently to changes in the shop floor environment and is suitable for incorporation into the operation of an RTS system for a smart factory.

  • The proposed RL-based approach employs an intelligent and dynamic method for selecting MDRs, which is

References (46)

  • J.H. Blackstone et al., A state-of-the-art survey of dispatching rules for manufacturing job shop operations, International Journal of Production Research (1982)

  • C.C. Chen et al., Auto-bias selection for learning-based scheduling systems, International Journal of Production Research (1999)

  • D.L. Davies et al., A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence (1979)

  • A. Goryachev et al., Smart factory: Intelligent system for workshop resource allocation, scheduling, optimization and controlling in real time, Advanced Materials Research (2013)

  • R.S. Guh et al., The study of real time scheduling by an intelligent multi-controller approach, International Journal of Production Research (2011)

  • J. Han et al., Data mining concepts and techniques (2006)

  • M. Herrmann et al., Design principles for industrie 4.0 scenarios, in: 49th Hawaii... (2016)

  • N. Ishii et al., A mixed dispatching rule approach in FMS scheduling, International Journal of Flexible Manufacturing Systems (1994)

  • C.O. Kim et al., Integration of inductive learning and neural networks for multi-objective FMS scheduling, International Journal of Production Research (1998)

  • T. Kohonen, Self-organizing map (2001)

  • M. Kück et al., A data-driven simulation-based optimisation approach for adaptive scheduling and control of dynamic manufacturing systems, Advanced Materials Research (2016)

  • Y. LeCun et al., Deep learning, Nature (2015)

  • E.A. Lee, Cyber physical systems: Design challenges, in: 11th IEEE symposium on object oriented real-time... (2008)