Multistage MR-CART: Multiresponse optimization in a multistage process using a classification and regression tree method

https://doi.org/10.1016/j.cie.2021.107513

Abstract

A multistage process consists of a series of sequential stages. In such a process, each stage has multiple responses and is affected by its preceding stage while also affecting the following stage. This complex structure makes the multistage process difficult to optimize. Recently, advances in information technology have made it easy to obtain large amounts of operational data from multistage processes. The proposed method employs a data mining method called classification and regression tree (CART) to analyze such data and uses desirability functions to optimize the multiple responses simultaneously. To account for the relationships between stages, a backward optimization procedure is proposed that treats the responses of the preceding stage as input variables. The proposed method is illustrated with a steel manufacturing process example and compared with existing multiresponse optimization methods. The case study shows that the proposed method works well and outperforms the existing methods.

Introduction

A multistage process, which consists of a series of sequential stages, is a common structure in manufacturing lines. Most manufacturing industries, such as semiconductor, printed circuit board, chemical, and telecommunication manufacturing, require several stages to complete their final products (Pan, Li, & Wu, 2016). Fig. 1 depicts a representative multistage manufacturing process consisting of $K$ stages. The rectangles represent stages, and each stage is followed by an inspection stage shown as a circle. The raw materials enter Stage 1 and become a final product after passing through all $K$ stages. In Fig. 1, $\mathbf{x}_k$ and $\mathbf{y}_k$ are vectors of $I_k$ input variables and $J_k$ response variables, respectively, at the $k$th stage.
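The stage-to-stage dependency in Fig. 1 can be written compactly. The additive-noise functional form below is an illustrative assumption, not a model stated in the paper: the responses of each stage depend on that stage's inputs and on the responses of the stage before it, while Stage 1 depends only on its own inputs.

```latex
\mathbf{y}_1 = \mathbf{f}_1(\mathbf{x}_1) + \boldsymbol{\varepsilon}_1, \qquad
\mathbf{y}_k = \mathbf{f}_k(\mathbf{x}_k, \mathbf{y}_{k-1}) + \boldsymbol{\varepsilon}_k,
\quad k = 2, \dots, K,
\qquad \mathbf{x}_k \in \mathbb{R}^{I_k},\ \mathbf{y}_k \in \mathbb{R}^{J_k}.
```

Written this way, $\mathbf{y}_{k-1}$ plays the role of an input to stage $k$, which is why a change at an early stage propagates to every later stage.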

In the multistage process, each stage has multiple responses and is affected by its preceding stage while also affecting the following stage. Several methods have been proposed to solve the multiresponse problem in the multistage process. Mukherjee and Ray (2008) employed desirability functions to optimize multiple responses in a two-stage process. In this approach, empirical models for the multiple responses are fitted, and desirability functions are constructed from the empirical models. The optimal condition for the input variables is then obtained by maximizing the desirability functions. To search for the optimal condition, several metaheuristics, such as a genetic algorithm, simulated annealing, and tabu search, were employed. Later, Bera and Mukherjee (2016) extended the method of Mukherjee and Ray (2008) from the two-stage process to the multistage process. Hejazi, Seyyed-Esfahani, and Mahootchi (2015) suggested a mathematical programming method and a metaheuristic algorithm using iterative seemingly unrelated regression to optimize the multiple responses in the multistage process. More recently, Yin, He, Niu, and Li (2018) suggested a method for optimizing a coal preparation production system, which is a particular multistage process. In this method, a forward iterative modeling method based on support vector regression was presented to capture the interdependency between neighboring stages. In addition, a goal-oriented, backward iterative optimization approach based on a genetic algorithm was proposed to determine the globally optimal operating conditions of the coal preparation system.
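A minimal sketch of the model-based approach reviewed above, assuming two responses of two inputs given by hypothetical fitted quadratic models, Derringer and Suich style desirability functions combined by their geometric mean, and simulated annealing (one of the metaheuristics mentioned) to maximize the overall desirability. The models, desirability limits, input bounds, and cooling schedule are all illustrative assumptions, not values from any of the cited papers.

```python
import math
import random

def d_larger(y, low, high, s=1.0):
    """Larger-the-better desirability: 0 at or below low, 1 at or above high."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s

def d_smaller(y, low, high, t=1.0):
    """Smaller-the-better desirability: 1 at or below low, 0 at or above high."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** t

# Hypothetical fitted empirical models for two responses of inputs x[0], x[1].
def y1_hat(x):
    return 10 + 3 * x[0] + 2 * x[1] - x[0] ** 2

def y2_hat(x):
    return 5 - x[0] + 0.5 * x[1] ** 2

def overall_desirability(x):
    d1 = d_larger(y1_hat(x), low=8, high=14)
    d2 = d_smaller(y2_hat(x), low=3, high=7)
    return math.sqrt(d1 * d2)  # geometric mean of the two desirabilities

def simulated_annealing(f, bounds, n_iter=5000, step=0.2, temp0=1.0, seed=0):
    """Maximize f over a box; returns the best point visited and its value."""
    rng = random.Random(seed)
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    fx = f(x)
    best_x, best_f = list(x), fx
    for i in range(n_iter):
        temp = temp0 * (1 - i / n_iter) + 1e-9  # linear cooling schedule
        # Gaussian move, clipped back into the feasible box.
        cand = [min(hi, max(lo, xi + rng.gauss(0, step)))
                for xi, (lo, hi) in zip(x, bounds)]
        fc = f(cand)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if fc >= fx or rng.random() < math.exp((fc - fx) / temp):
            x, fx = cand, fc
            if fx > best_f:
                best_x, best_f = list(x), fx
    return best_x, best_f

best_x, best_D = simulated_annealing(overall_desirability, bounds=[(-2, 2), (-2, 2)])
print(best_x, best_D)
```

With these assumed models the responses conflict (a larger positive `x[1]` raises `y1` but also raises `y2`, which should be small), so the search settles on a compromise rather than driving both desirabilities to 1.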

The above methods commonly build empirical models for the multiple responses and obtain the optimal setting of the input variables based on these models. Although these approaches are attractive, they share a drawback: they require a large number of experiments to build the empirical models. In the multistage process, many relationships, both between stages and between the inputs and responses, must be investigated for the optimization. Conducting enough experiments to build empirical models that explain these relationships consumes large amounts of resources (time, material, machines, etc.).

Alternatively, process operational data gathered from manufacturing lines can be used instead of conducting a large number of experiments. Many manufacturing companies can now obtain a large volume and variety of operational data from their manufacturing lines owing to networked sensors and the Internet of Things (IoT). Such data may contain meaningful information, and data mining methods are attractive for handling them. Classification and regression tree (CART) and patient rule induction method (PRIM) are representative data mining methods applicable to process optimization. Recently, Lee, Kim, Kim, Kim, and Zhen (2021) suggested applying CART to optimize multiple responses in a single-stage process. However, none of the existing methods has employed CART to optimize the multistage process. The proposed method extends CART-based optimization from a single stage to multiple stages by considering the relationships between stages. For this purpose, a backward sequential optimization procedure is proposed, in which optimization proceeds sequentially from the last stage to the first stage. Additionally, the proposed method employs the desirability function method (Derringer & Suich, 1980) as the objective function of CART to optimize the multiple responses simultaneously. For each stage, the proposed method obtains the subregions of the input variable space where high desirability values are achieved.
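The backward procedure described above can be sketched as follows. This is a minimal illustration under strong assumptions, not the paper's algorithm: two stages with one response each (the case study has two per stage), synthetic data, a linear larger-the-better desirability, and a single-split regression stump standing in for a full CART tree. Stage 2 is handled first, with the Stage-1 response `y1` included among its inputs, following the abstract's description of treating the preceding stage's responses as input variables. If the Stage-2 split falls on `y1`, its preferred side is passed back as an extra requirement on Stage 1; that hand-off rule and the helper names `d_larger`, `best_split`, and `in_good_side` are hypothetical.

```python
import random

def d_larger(y, low, high):
    """Linear larger-the-better desirability (exponent 1), clipped to [0, 1]."""
    return min(1.0, max(0.0, (y - low) / (high - low)))

def best_split(X, t, min_leaf=20):
    """Best single CART-style split of rows X on target t (minimum total SSE).
    Returns (feature, threshold, mean_left, mean_right) or None."""
    n, p = len(X), len(X[0])
    tot, tot_sq = sum(t), sum(v * v for v in t)
    best = None
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        s = sq = 0.0
        for k in range(n - 1):
            v = t[order[k]]
            s, sq = s + v, sq + v * v  # running sums for the left node
            nl, nr = k + 1, n - k - 1
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if nl < min_leaf or nr < min_leaf or a == b:
                continue
            sse = (sq - s * s / nl) + (tot_sq - sq - (tot - s) ** 2 / nr)
            if best is None or sse < best[0]:
                best = (sse, j, (a + b) / 2, s / nl, (tot - s) / nr)
    return None if best is None else best[1:]

def in_good_side(value, thr, left_better):
    return value <= thr if left_better else value > thr

# Hypothetical two-stage data: y1 depends on x1; y2 depends on x2 and y1.
rng = random.Random(1)
n = 500
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
y1 = [2 * a + rng.gauss(0, 0.1) for a in x1]
y2 = [b + 0.3 * c + rng.gauss(0, 0.1) for b, c in zip(y1, x2)]

# Stage 2 (last stage) first: the Stage-1 response y1 is treated as an input.
Z2 = [[c, b] for c, b in zip(x2, y1)]  # columns: x2, y1
d2 = [d_larger(v, low=0.5, high=2.0) for v in y2]
f2, thr2, ml2, mr2 = best_split(Z2, d2)
left2 = ml2 >= mr2

# Stage 1 next: if Stage 2 split on y1, keep Stage-1 desirability only where
# y1 lands on the preferred side (assumed hand-off rule).
d1 = [d_larger(v, low=0.0, high=1.5) for v in y1]
if f2 == 1:
    target1 = [d * in_good_side(b, thr2, left2) for d, b in zip(d1, y1)]
else:
    target1 = d1
f1, thr1, ml1, mr1 = best_split([[a] for a in x1], target1)
left1 = ml1 >= mr1

print("Stage 2 rule:", ["x2", "y1"][f2], "<=" if left2 else ">", round(thr2, 3))
print("Stage 1 rule:", "x1", "<=" if left1 else ">", round(thr1, 3))
```

Each stump returns one rule (a half-space); a real tree would recurse on both children and report the leaf with the highest mean desirability as the recommended subregion.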

The rest of the paper is organized as follows. Section 2.1 reviews CART, which is employed in the proposed method, and compares it with PRIM. Section 2.2 reviews the desirability function method, which is also employed in the proposed method. The proposed method is presented in Section 3 and illustrated with a case study in Section 4. Finally, a discussion and concluding remarks are given in Section 5.

Section snippets

Review of CART and PRIM

In this section, we review CART and PRIM, which are applicable to process optimization. CART was first introduced by Breiman et al. (1984). It is a binary recursive partitioning procedure that finds subregions of the input variable space where the response performs considerably better. When the response variable is nominal, CART produces a classification tree; when it is continuous, it produces a regression tree. CART has the advantage of being able to handle various types of data (Lee, Jeong, & Lee, 2016).
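The core operation of a regression tree is choosing one binary split. A minimal sketch, assuming a continuous response and the usual squared-error criterion: for one input variable, every midpoint between consecutive sorted values is tried, and the threshold that minimizes the total within-node sum of squared errors is kept. Applying this to every input and then recursively to each child node yields the tree; stopping rules and pruning are omitted, and the function names are illustrative.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_threshold(x, y):
    """Best binary split x <= thr versus x > thr for a single input variable."""
    pairs = sorted(zip(x, y))
    best_thr, best_cost = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal x values
        thr = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr, best_cost

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 0.9, 1.0, 3.1, 2.9, 3.0]
thr, cost = best_threshold(x, y)
print(thr, round(cost, 4))  # → 3.5 0.04
```

The split at 3.5 separates the low-response and high-response observations; in an optimization setting, the child node with the better mean response is the candidate subregion.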

Proposed multistage MR-CART

In this section, the proposed multistage MR-CART is presented. As mentioned in Section 1, it is important to obtain reliable response surface models in MRSO because the optimal solution is obtained by analyzing these models. However, reliable response surface models are not easy to obtain, especially when dealing with a large amount of data. This is because not only the form of the functional relationships between input variables and responses might not be clear, but also

Step 1. Prepare the data

We have a total of 5609 observations, denoted by $\{x_{n,k,i}, y_{n,k,j} : n = 1, 2, \dots, 5609;\ k = 1, 2;\ i = 1, 2, \dots, I_k;\ j = 1, 2, \dots, J_k\}$. The numbers of input variables of Stages 1 and 2, denoted by $I_1$ and $I_2$, are 13 and 11, respectively. The numbers of response variables of Stages 1 and 2, denoted by $J_1$ and $J_2$, are two each, as shown in Fig. 5. Thus, every observation includes 28 values (i.e., 28 = 13 + 11 + 2 + 2).
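The column layout of one observation can be checked with a small sketch. Only the counts come from the text; the column names are hypothetical placeholders, since the actual variables are not listed in this excerpt. The Stage-2 input set assumes, as the abstract describes, that the Stage-1 responses are treated as inputs to Stage 2.

```python
I = {1: 13, 2: 11}  # input variables per stage (I1, I2)
J = {1: 2, 2: 2}    # response variables per stage (J1, J2)
N = 5609            # number of observations

def names(prefix, count):
    """Hypothetical column names such as x1_1, x1_2, ..."""
    return [f"{prefix}_{i}" for i in range(1, count + 1)]

columns = names("x1", I[1]) + names("x2", I[2]) + names("y1", J[1]) + names("y2", J[2])
print(len(columns))  # → 28

# Assumed Stage-2 input set in the backward procedure: x2 plus the Stage-1 responses.
stage2_inputs = names("x2", I[2]) + names("y1", J[1])
print(len(stage2_inputs))  # → 13
```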

Step 2. Split the data

All 5609 observations are randomly divided into training and test datasets at a ratio of 4:1 to
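A 4:1 random split of the 5609 observations can be sketched as below. The seed and the rounding rule (training size rounded down to an integer) are assumptions; the excerpt does not say how the fractional count 5609 × 4/5 = 4487.2 was handled.

```python
import random

def split_4_to_1(n, seed=0):
    """Shuffle row indices and return (train, test) index lists at a 4:1 ratio."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = (4 * n) // 5  # assumed: round the training size down
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_4_to_1(5609)
print(len(train_idx), len(test_idx))  # → 4487 1122
```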

Concluding remarks

In this paper, we proposed a systematic procedure for optimizing the multiple responses in the multistage manufacturing process using CART. In the multistage process, the performance of each stage needs to be considered in the context of the relationships between the stages, since each stage is influenced by its preceding stage and also affects the stage that follows. We account for this property by modifying the CART algorithm and employing the desirability function method, which optimizes

CRediT authorship contribution statement

Dong-Hee Lee: Conceptualization, Methodology, Writing – original draft. So-Hee Kim: Software. Kwang-Jae Kim: Supervision.

Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07049412). Also, this work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1A2C1007834).

References (29)

  • I.G. Chong et al. (2007). A data mining approach to process optimization without an explicit quality function. IIE Transactions.
  • G. Derringer (1994). A balancing act: optimizing a product’s properties. Quality Progress.
  • G. Derringer et al. (1980). Simultaneous optimization of several response variables. Journal of Quality Technology.
  • A. Emrouznejad et al. (2010). Data envelopment analysis with classification and regression tree – a case of banking efficiency. Expert Systems.
  • Cited by (9)

    • Influence of sample attributes on generalization performance of machine learning models for windage alteration fault diagnosis of the mine ventilation system

      2023, Expert Systems with Applications
      Citation Excerpt :

      Similarly, taking e6 as an example, the distribution of fault volume value is drawn, as shown in Fig. 13. CART is a widely used machine learning algorithm (Lee, Kim, & Kim, 2021). This section constructs a WAFs diagnosis model based on CART.

    • A convex two-dimensional variable selection method for the root-cause diagnostics of product defects

      2023, Reliability Engineering and System Safety
      Citation Excerpt :

      Fault/defect diagnostic is an important component of system Prognostic and Health Management, which provides the prerequisites for fault tolerance, reliability, and security of complex engineering systems [1–7]. One of the industrial areas that have widely employed fault diagnostic techniques is Multistage Manufacturing Processes (MMPs) [8–13]. Many MMPs consist of identical stages, units, stations, or operations.

    • Lithology identification technology based on the stacking fusion model

      2023, International Journal of Oil, Gas and Coal Technology