A reinforcement learning based RMOEA/D for bi-objective fuzzy flexible job shop scheduling

https://doi.org/10.1016/j.eswa.2022.117380

Highlights

  • A bi-objective FFJSP that minimizes makespan and total machine workload is considered.

  • An adaptive MOEA/D with VNS is proposed.

  • The results indicate the superior performance of our approach.

Abstract

The flexible job shop scheduling problem (FJSP) is important in realistic manufacturing. However, job processing times are usually uncertain and change during manufacturing. This paper presents a multi-objective FJSP with fuzzy processing time (MOFFJSP) in which the makespan and the total machine workload are minimized. To solve the MOFFJSP, a MOEA/D based on reinforcement learning, named RMOEA/D, is proposed. RMOEA/D has four features: (i) an initialization strategy with three rules is used to obtain a high-quality initial population; (ii) a parameter adaptation strategy based on Q-learning guides the population to choose the best parameter and so increase diversity; (iii) a variable neighborhood search based on reinforcement learning leads each solution to choose a suitable local search method; and (iv) an elite archive improves the utilization of abandoned historical solutions. RMOEA/D is compared with five well-known related methods, i.e., MOEA/D, NSGA-II, MOEA/D-M2M, NSGA-III and IAIS, on three benchmark suites. The results show that RMOEA/D outperforms these five state-of-the-art algorithms.

Introduction

With the development of economic globalization, traditional flexible manufacturing faces a huge challenge, and it is difficult to satisfy the requirements of the market (Lang et al., 2021, Rifai et al., 2021). The flexible job shop scheduling problem (FJSP) is a classical scheduling problem that has been intensively studied over the past decades. Many heuristic algorithms have been proposed for the FJSP, such as the genetic algorithm (GA) (Yuan et al., 2020), artificial bee colony algorithm (ABC) (Li, Huang, et al., 2020), two-phase meta-heuristic (Lei et al., 2019), Jaya (Caldeira & Gnanavelbabu, 2021), and teaching–learning-based optimization (TLBO) (Lei et al., 2018). However, a fixed processing time is too idealized to model practical manufacturing. In practical flexible manufacturing, the processing time cannot be controlled exactly and fluctuates within an interval (Pan et al., 2021, Zhu and Zhou, 2021). It is therefore necessary to model the processing time of the FJSP as fuzzy. The FJSP with fuzzy processing time (FFJSP) is an extension of the FJSP. The FJSP has been proved to be NP-hard, and the FFJSP is also NP-hard (Pavlov et al., 2019). It is therefore worth studying how to solve the FFJSP efficiently.

Furthermore, with the development of intelligent manufacturing and Industry 4.0, many industries have started to consider low-energy-consumption manufacturing (Lu et al., 2021). Many energy-aware scheduling models have been proposed, including distributed hybrid flow shop (Shao et al., 2021b), distributed permutation flow shop (Lu et al., 2022), distributed flow shop with heterogeneous factories (Lu et al., 2021), distributed reentrant permutation flow shop scheduling (Rifai et al., 2021), and distributed heterogeneous welding flow shop (Wang et al., 2021). However, the energy-aware FFJSP is seldom considered. Energy consumption is related to machine workload, idle time, and the number of times machines are turned on and off (Meng et al., 2020, Wang and Wang, 2021), and machine workload is the main contributor. Thus, the multi-objective FFJSP (MOFFJSP) that minimizes makespan and total workload is considered here.

Inspired by scalar objective optimization, Zhang and Li (2007) proposed the multi-objective evolutionary algorithm based on decomposition (MOEA/D) for multi-objective optimization problems (MOPs). Based on reference vectors and the Tchebycheff function, MOEA/D can achieve good convergence and good diversity simultaneously. Recently, reinforcement learning (RL) has become a hot topic due to its strong decision-making and optimization ability. Cooperation between RL and evolutionary algorithms is a promising direction for solving complex optimization problems (Gong et al., 2021, Shiue et al., 2018, Zhao et al., 2019). For such problems, superior performance is expected from the synergy between RL, evolutionary computation, and problem-specific operators within the MOEA/D framework.
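To make the decomposition idea concrete, the following is a minimal Python sketch of the weighted Tchebycheff scalarization for two minimized objectives. It is an illustration only: the function names, the ideal point, and the sample objective values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def tchebycheff(objectives: Sequence[float],
                weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff value max_i w_i * |f_i - z_i| (smaller is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def uniform_weights(n: int) -> List[Tuple[float, float]]:
    """n evenly spread weight vectors for a bi-objective problem (n >= 2)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


# Hypothetical (makespan, total workload) values, for illustration only.
ideal = (10.0, 40.0)   # assumed best value seen so far for each objective
a = (12.0, 60.0)
b = (20.0, 45.0)
w = (0.8, 0.2)         # this subproblem emphasizes the first objective

g_a = tchebycheff(a, w, ideal)
g_b = tchebycheff(b, w, ideal)
weights = uniform_weights(5)
```

Each weight vector defines one scalar subproblem; for the weight vector `w` above, solution `a` has the smaller scalarized value and would be kept for that subproblem.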

The MOFFJSP has been extensively studied for several years. However, in the existing literature, most methods execute local search strategies in a fixed polling order, which is inefficient and blind. Moreover, algorithm performance depends heavily on parameter selection: a large number of black-box tests are needed to find the best parameter, because parameter selection lacks prior knowledge and is time-consuming. Developing effective algorithms that optimize makespan and total machine workload simultaneously is therefore both challenging and significant. Motivated by these problems, this paper proposes an RL-based MOEA/D (RMOEA/D), which contains the following innovations. First, to guide each solution to adaptively select the best local search strategy, a variable neighborhood search based on RL (RVNS) is proposed. Second, to let MOEA/D adjust the parameter T automatically, a parameter selection strategy based on Q-learning (Q-PAS) is designed. Third, an initialization method integrating several initialization strategies is designed to obtain a population with high convergence and diversity. Fourth, a discrete crossover and mutation method is used to obtain a large search step. Fifth, an elite archive is applied to improve the utilization of abandoned solutions. Finally, to verify the performance of RMOEA/D, extensive numerical tests are carried out; the comparative results demonstrate the effectiveness of the above designs and the superiority of the proposed algorithm in solving the MOFFJSP.

The main contributions of this paper are fivefold.

  • (1) An initialization strategy that combines the advantages of three strategies is designed to provide an initial population with high convergence and diversity.

  • (2) A parameter selection strategy based on Q-learning is proposed so that MOEA/D automatically selects the best T, improving the diversity of the Pareto front (PF).

  • (3) A variable neighborhood search method based on RL is proposed to efficiently execute several local search strategies.

  • (4) An elite archive is designed to collect historical elite solutions and increase the utilization of abandoned solutions.

  • (5) RMOEA/D is evaluated on 23 FFJSP instances with different features. Experimental results show that RMOEA/D is superior to state-of-the-art algorithms while also converging faster.
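As an illustration of the elite-archive idea in contribution (4), the sketch below maintains a set of mutually non-dominated objective vectors for two minimized objectives. How the paper's archive actually selects and reuses abandoned solutions is not described in this excerpt, so the update rule, the names, and the sample points here are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (makespan, total workload), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def update_archive(archive: List[Point], candidate: Point) -> List[Point]:
    """Return the archive after offering it one candidate point."""
    # Reject the candidate if an archived point dominates or duplicates it.
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return archive
    # Otherwise drop every archived point the candidate dominates, then add it.
    return [kept for kept in archive if not dominates(candidate, kept)] + [candidate]


# Hypothetical objective vectors, for illustration only.
archive: List[Point] = []
for point in [(50.0, 200.0), (45.0, 220.0), (48.0, 190.0), (60.0, 300.0)]:
    archive = update_archive(archive, point)
```

In this run, (48.0, 190.0) displaces (50.0, 200.0), and (60.0, 300.0) is rejected because (45.0, 220.0) already dominates it.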

The rest of this paper is organized as follows. Section 2 reviews the literature and introduces basic concepts of triangular fuzzy processing time. Section 3 introduces the problem statement and model. Section 4 reports the details of RMOEA/D. Section 5 presents the numerical experiments and discussion, and Section 6 summarizes the conclusions.

Related work and background knowledge

In this section, the literature on FFJSP, MOEA/D, and RL applied to scheduling is briefly reviewed.

Problem statement

A flexible job shop scheduling problem with fuzzy processing time from a real-world manufacturing process can be described as follows. $J=\{J_1,J_2,\ldots,J_i,\ldots,J_n\}$ is the job set and $M=\{M_1,M_2,\ldots,M_k,\ldots,M_m\}$ is the machine set. Each job $J_i$ has a set of $\Theta_i$ operations, $O_i=\{O_{i,1},O_{i,2},\ldots,O_{i,j},\ldots,O_{i,\Theta_i}\}$, $O_{i,j}\in O_i$. Each operation can be processed on a subset of the machines or on all machines, and its processing time is a triangular fuzzy number (TFN) $\tilde{P}_{i,j,k}=(p_1,p_2,p_3)$. FFJSP includes two subproblems, machine assignment and operation sequencing. The
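A small Python sketch may help show how a TFN such as (p1, p2, p3) can be handled when computing completion times. It assumes component-wise addition and a three-criteria ranking that is common in the fuzzy scheduling literature; the excerpt does not state which operators the paper uses, so the class, the ranking rule, and the sample numbers are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (low, mode, high) with low <= mode <= high."""
    low: float
    mode: float
    high: float

    def __add__(self, other: "TFN") -> "TFN":
        # Assumed operator: component-wise addition of two TFNs.
        return TFN(self.low + other.low,
                   self.mode + other.mode,
                   self.high + other.high)

    def rank_key(self) -> Tuple[float, float, float]:
        # Assumed ranking: (low + 2*mode + high) / 4, then mode, then spread.
        return ((self.low + 2 * self.mode + self.high) / 4,
                self.mode,
                self.high - self.low)


def fuzzy_max(a: TFN, b: TFN) -> TFN:
    """Approximate max: return whichever TFN ranks higher."""
    return a if a.rank_key() >= b.rank_key() else b


# Hypothetical values: an operation starts once both its job predecessor
# and its machine are free, then runs for a fuzzy processing time.
job_ready = TFN(2, 4, 6)
machine_ready = TFN(3, 4, 5)
processing = TFN(1, 2, 3)

start = fuzzy_max(job_ready, machine_ready)
completion = start + processing
```

Here the two ready times tie on the first two criteria, so the wider one, (2, 4, 6), is selected as the start time.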

Framework of RMOEA/D

In this section, the framework of the proposed RL-based MOEA/D is introduced. It proceeds as follows. First, the parameters and population are initialized. Then, variable neighborhood search is performed on each solution to improve its exploitation. Next, Q-learning is applied to select a parameter T for MOEA/D, and MOEA/D is executed with this T to generate new solutions. Finally, the Q-table is updated according to the change in the PF’s convergence and diversity. And the elite archive
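The select-then-update cycle can be sketched with tabular Q-learning as follows. This is a generic illustration rather than the paper's Q-PAS: the candidate values of T, the number of states, the epsilon-greedy policy, the learning rate, the discount factor, and the placeholder reward are all assumptions, and the MOEA/D generation itself is not shown.

```python
import random
from typing import List


def select_action(q_row: List[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over the actions available in one state."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                     # explore
    return max(range(len(q_row)), key=q_row.__getitem__)     # exploit


def q_update(q_table: List[List[float]], state: int, action: int,
             reward: float, next_state: int,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard update: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    best_next = max(q_table[next_state])
    q_table[state][action] += alpha * (reward + gamma * best_next
                                       - q_table[state][action])


T_CANDIDATES = [5, 10, 15, 20]   # hypothetical action set for the parameter T
N_STATES = 3                     # hypothetical number of PF-change states
q_table = [[0.0] * len(T_CANDIDATES) for _ in range(N_STATES)]
rng = random.Random(0)

state = 0
action = select_action(q_table[state], epsilon=0.1, rng=rng)
T = T_CANDIDATES[action]
# ... one MOEA/D generation using T would run here (not shown) ...
reward, next_state = 1.0, 1      # placeholder feedback from the PF change
q_update(q_table, state, action, reward, next_state)
```

In a full loop, `next_state` would become the current state for the following generation, and the reward would be derived from the observed change in the PF's convergence and diversity.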

Experimental results

Section 4 described the RMOEA/D algorithm in detail. In this section, we design detailed experiments to evaluate RMOEA/D’s performance. RMOEA/D and the comparison algorithms are coded in MATLAB and run on an Intel Core i7-6700 CPU @ 3.4 GHz with 8 GB RAM. For fairness, every algorithm is run 30 independent times on each instance. To verify the convergence and diversity of the proposed algorithm, the results are averaged over the 30 independent runs for performance comparison.

Conclusion

This paper proposed an RMOEA/D that combines two RL techniques to solve multi-objective fuzzy flexible job shop scheduling problems. The objectives are to minimize the fuzzy makespan and the total workload. As a classical adaptation technique, RL can guide the algorithm to automatically select the best parameter or local search strategy. A novel Q-learning parameter adaptation method including state definition, action definition, and reward definition is designed to help multi-objective optimization

CRediT authorship contribution statement

Rui Li: Resources, Project administration, Software, Data curation, Writing – original draft, Writing – review & editing. Wenyin Gong: Funding acquisition, Supervision, Conceptualization, Methodology, Writing – review & editing. Chao Lu: Methodology, Writing – review & editing.

Acknowledgments

This work was partly supported by the National Natural Science Fund of China under Grant Nos. 62076225 and 62073300, and the Natural Science Fund for Distinguished Young Scholars of Hubei, China under Grant No. 2019CFA081. All authors approved the version of the manuscript to be published.

References (60)

  • Lin, J. (2015). A hybrid biogeography-based optimization for the fuzzy flexible job-shop scheduling problem. Knowledge-Based Systems.
  • Lin, J. (2019). Backtracking search based hyper-heuristic for the flexible job-shop scheduling problem with fuzzy processing time. Engineering Applications of Artificial Intelligence.
  • Lu, C., et al. (2022). A Pareto-based collaborative multi-objective optimization algorithm for energy-efficient scheduling of distributed permutation flow-shop with limited buffers. Robotics and Computer-Integrated Manufacturing.
  • Luo, S. (2020). Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Applied Soft Computing.
  • Palombarini, J.A., et al. (2019). Closed-loop rescheduling using deep reinforcement learning. IFAC-PapersOnLine.
  • Rifai, A.P., et al. (2021). Multi-objective distributed reentrant permutation flow shop scheduling with sequence-dependent setup time. Expert Systems with Applications.
  • Sakawa, M., et al. (2000). Fuzzy programming for multiobjective job shop scheduling with fuzzy processing time and fuzzy duedate through genetic algorithms. European Journal of Operational Research.
  • Shahrabi, J., et al. (2017). A reinforcement learning approach to parameter estimation in dynamic job shop scheduling. Computers & Industrial Engineering.
  • Shao, W., et al. (2021). Multi-objective evolutionary algorithm based on multiple neighborhoods local search for multi-objective distributed hybrid flow shop scheduling problem. Expert Systems with Applications.
  • Shiue, Y.-R., et al. (2018). Real-time scheduling for a smart factory using a reinforcement learning approach. Computers & Industrial Engineering.
  • Wang, G., et al. (2021). Energy-efficient distributed heterogeneous welding flow shop scheduling problem using a modified MOEA/D. Swarm and Evolutionary Computation.
  • Wang, J.-j., et al. (2021). A cooperative memetic algorithm with learning-based agent for energy-aware distributed hybrid flow-shop scheduling. IEEE Transactions on Evolutionary Computation.
  • Xu, Y., et al. (2015). An effective teaching-learning-based optimization algorithm for the flexible job-shop scheduling problem with fuzzy processing time. Neurocomputing.
  • Yuan, S., et al. (2020). A co-evolutionary genetic algorithm for the two-machine flow shop group scheduling problem with job-related blocking and transportation times. Expert Systems with Applications.
  • Zhang, J., et al. (2020). MOEA/D with many-stage dynamical resource allocation strategy to solution of many-objective OPF problems. International Journal of Electrical Power & Energy Systems.
  • Zhu, Z., et al. (2021). A multi-objective multi-micro-swarm leadership hierarchy-based optimizer for uncertain flexible job shop scheduling problem with job precedence constraints. Expert Systems with Applications.
  • Brandimarte, P. (1993). Routing and scheduling in a flexible job shop by tabu search. Annals of Operations Research.
  • Deb, K., et al. (2014). An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: Solving problems with box constraints. IEEE Transactions on Evolutionary Computation.
  • Deb, K., et al. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation.
  • Gao, K.Z., et al. (2015). An effective discrete harmony search algorithm for flexible job shop scheduling problem with fuzzy processing time. International Journal of Production Research.