1 Introduction and literature review

Product quality is central to the interests of consumers and directly determines the survival and development of a firm. If a firm produces and sells products with quality defects, its reputation can be severely damaged, which in the long run undermines its competitiveness (Giovanni 2020). For example, the occasional explosions of Samsung's mobile phones in 2016 significantly reduced the company's market value. In the production process, quality inspection is directly related to product quality. Therefore, it is essential for a firm to develop an effective quality inspection system. Manual sampling inspection is common and depends mainly on the experience of inspectors. However, with growing production volumes and increasing product complexity, manual quality inspection has become inadequate. In contrast, artificial intelligence-based quality inspection can be superior in terms of cost, efficiency, and accuracy. For example, Intel applied smart technology to chip quality inspection and saved about $3 million in manufacturing costs (Mangal and Kumar 2016).

The main goal of quality inspection is to reduce quality risk. Quality risk is a broad concept in the field of statistical process control, referring to the risk borne by manufacturers and customers due to classification errors in quality inspection (Rotondo et al. 2013; Kaya and Ozer 2009). Quality inspection can be treated as a classification problem, a common topic in machine learning. Researchers have investigated data-driven quality inspection, including the use of artificial intelligence. In practice, artificial intelligence-based quality inspection has been developed for fruits (Nureize et al. 2014), seafood products (Huang et al. 2016), software products (Zhang et al. 2020), and industrial products (Kumar and Kumar 2018). According to the detection methods employed, quality detection can be divided into detection based on image features and detection based on process features. The former mainly relies on deep learning methods and is often used for product surface quality inspection, such as bearing surface scratches, chip surface defects, and uneven dyeing of clothing. The latter mainly relies on machine learning methods and is often adopted for more complex inspections of internal product quality. This article focuses on the latter.

The machine learning methods currently used for quality detection and classification mainly include SVM (Liu et al. 2018a, b), logistic regression (Kurt et al. 2018), and BP neural networks (Xing et al. 2019; Xu et al. 2013). Among them, BP neural networks have been widely employed due to their superior performance at mining relationships in complex data and their high fault tolerance. Xing et al. (2019) used BP neural networks to predict and analyze steel quality. Xu et al. (2013) used BP neural networks optimized by particle swarm optimization to establish a laser milling quality control model. Yuen et al. (2009) combined a genetic algorithm and BP neural networks to identify clothing with quality defects. Fu et al. (2019) employed a genetic algorithm to improve BP neural networks and applied the result to the quality inspection of eggs. Jiang and Wang (2016) developed an electronic nose to detect the internal quality changes of Chinese pecans during storage; using methods including BP neural networks and random forests, they could detect rotting pecan kernels without removing the shell. Liu et al. (2018a, b) applied methods such as BP neural networks and SVM to the quality inspection of cold-rolled strip. Liu et al. (2017) extracted fourteen ultrasonic characteristic signals that reflect different types of spot welding and achieved the intelligent identification of resistance spot welding defects with the help of BP neural networks. Liang et al. (2012) conducted wavelet texture analysis to extract variable features and employed fuzzy ARTMAP and BP neural networks to classify the quality of industrial spinning yarns. Liu et al. (2015) developed BP neural networks for soybean seed quality detection, which can quickly identify damaged soybean seeds.

The existing research on product quality detection using BP neural networks mainly focuses on the high-dimensional and non-linear characteristics of the data. However, most research ignores the imbalanced nature of the data, which can be detrimental to reducing quality risks for the following reasons. First, if BP neural networks aim for overall classification accuracy, they may treat the misclassification costs of all products as equally important. Second, product quality inspection aims to minimize classification loss, and the loss from nonconforming products can be very high. Thus, an effective model needs not only to achieve acceptable overall accuracy but also, more importantly, to accurately identify the less-common defective products.

Therefore, this research addresses the following question: considering the imbalanced characteristics of the data, how can machine learning be employed to improve the effectiveness of product quality inspection? At present, there are two types of solutions to the problem of classifying imbalanced samples in machine learning. The first type works at the data level, and the second type focuses on improving a learning algorithm to make it suitable for imbalanced samples. In the first type, sampling methods are mainly used to adjust the distribution of the training set to make it more balanced. The core of the sampling method is "reducing the majority and increasing the minority" (Guo et al. 2019; Liu et al. 2009; Douzas et al. 2019). For example, Furundzic et al. (2017) applied a variety of sampling techniques to industrial product quality classification and fault detection. In the second type, Schapire (1990) proposed Boosting ensemble learning and proved that, under Probably Approximately Correct (PAC) learning, a weak classifier can be transformed into a stronger classifier with better classification performance. Specifically, boosting ensemble learning incorporates the "collective wisdom" of multiple base classifiers to build a strong classifier that performs better than each individual classifier, which can significantly reduce the difficulty of designing and constructing a single classifier (Freund and Schapire 1997). In the boosting ensemble learning family, Adaptive Boosting (AdaBoost) is the most representative model and was rated as one of the top ten algorithms for data mining (Rutledge 2009). The model cascades multiple base classifiers through a "competition system" and updates the sample weights after each iteration to assign higher weights to the misclassified samples in the next round of classification. In this way, these misclassified samples receive extra attention, yielding improved classification accuracy (Landesavazquez and Albacastro 2012; Liu et al. 2009).

The AdaBoost-BP model, which uses AdaBoost to cascade multiple BP neural networks, can improve overall classification accuracy and reduce model complexity. However, for imbalanced data, high overall classification accuracy is inadequate. For example, when a sample contains 5% unqualified products and 95% qualified products, even if a model classifies all products as qualified, the classification accuracy still reaches 95%, while the misclassification rate of defectives is 100%. In actual quality testing, the proportion of non-conforming product samples is even lower, so the consequence of misclassification is more severe. Therefore, an effective product quality inspection model should also focus on reducing the rate of missed detection, that is, the rate of misclassification of nonconforming products. Although AdaBoost's asymmetric learning feature of adjusting the sample distribution can improve the classification of imbalanced data (Rutledge 2009), it treats all misclassified samples as equally important, which weakens the model's sensitivity to non-conforming products (Galar et al. 2011; Guo et al. 2017).

Therefore, in this paper, we propose an ensemble learning classification model, named CS-AdaBoost-BP, where we adopt the AdaBoost adaptive ensemble algorithm to cascade multiple backpropagation (BP) neural networks. In order to improve the model's sensitivity to non-conforming samples, we introduce cost sensitivity into the loss function of the neural networks to distinguish the misclassification costs of conforming and non-conforming products. Furthermore, we empirically test our model in the quality detection of Bosch home appliances. To confirm the effectiveness of our model, we compare its performance with that of the under-sampling method, the single BP neural network, and the AdaBoost-BP model.

The main contributions of this paper are as follows. First, in the literature, quality detection models based on traditional BP neural networks typically aim for overall classification accuracy. Hence, they do not fully consider imbalanced data related to product quality. In contrast, this paper emphasizes the imbalance of data in model construction. Second, different from existing quality detection models relying on a single BP neural network, this article uses an ensemble learning method. At the same time, we introduce cost-sensitive learning into BP neural networks. In this way, ensemble learning and cost-sensitive learning are combined to establish a CS-AdaBoost-BP quality detection method. This method can also be applied to imbalanced data classification in general.

The rest of this article is organized as follows. In Sect. 2, we provide the details of the BP neural network, the AdaBoost algorithm, and the CS-AdaBoost-BP model. In Sect. 3, we apply the CS-AdaBoost-BP model to quality detection on Bosch home appliance production lines and compare its performance with that of several other models. Conclusions and directions for future research are presented in Sect. 4.

2 CS-AdaBoost-BP model

This paper proposes a CS-AdaBoost-BP product quality detection model to deal with the potentially imbalanced data in quality detection. The basic ideas of our model are as follows. First, we employ BP neural networks, which exhibit a high level of fault tolerance and adaptability, as the base classifiers. Then, we adopt the AdaBoost ensemble algorithm to focus on learning from misclassified samples. In this way, we can improve the generalization ability of the model and reduce the difficulty of model design. We then introduce the concept of cost sensitivity into the base classifier to deal with the imbalanced distribution of product quality data samples.

2.1 BP neural network

Rumelhart et al. (1986) proposed the BP neural network, a dynamic neural network with feedback and memory functions. BP neural network focuses on signal forward propagation and error backward propagation (Li et al. 2012). In the forward propagation, the input sample is passed from the input layer, processed through multiple hidden layers, and then reaches the output layer. When the actual output of the output layer does not match the expected output, backpropagation transfers the output error back to the input layer through the hidden layers. This process distributes the error to all units of each layer. Thus, the error signal of each layer unit is obtained. This error signal is the basis for correcting the weight of each unit. Due to their simple and flexible structures, BP neural networks have been widely used. The structure of a classic BP neural network is shown in Fig. 1.

Fig. 1 Structure of a classic BP neural network

The main steps for a BP neural network are as follows. First, forward propagation of information through BP neural network is given by:

$$ z^{l} = \theta^{l} a^{l - 1} + b^{l} $$
$$ a^{l} = \delta \left( {z^{l} } \right) $$
$$ \hat{y} = a^{L} $$

where \(a^{1} = z^{1} = x\), \(L\) represents the total number of layers, \(l\) represents the l-th layer, \(\theta^{l}\) represents the weight coefficient matrix of the l-th layer, \(b^{l}\) represents the neuron bias coefficient matrix of the l-th layer, \(\delta \left( \bullet \right)\) is the activation function, \(z^{l}\) is the neuron weighted output vector of the l-th layer, \(a^{l}\) is the output vector of the neuron activation function of the l-th layer, and \(\hat{y}\) is the output value, i.e., the predicted value. The loss value for the BP neural network is evaluated using

$$ Loss = - \frac{1}{m}\mathop \sum \limits_{i = 1}^{m} \left( {y^{\left( i \right)} \log_{2} \hat{y}^{\left( i \right)} + \left( {1 - y^{\left( i \right)} } \right)\log_{2} \left( {1 - \hat{y}^{\left( i \right)} } \right)} \right). $$

where m represents the total number of samples, \(y^{\left( i \right)}\) represents the true value of the i-th sample, and \(\hat{y}^{\left( i \right)}\) represents the predicted value of the i-th sample. We adopt the following gradient descent algorithm to calculate the neuron weights and biases of each layer:

$$ \theta^{l} \leftarrow \theta^{l} - \alpha \frac{\partial Loss}{{\partial \theta^{l} }} $$
$$ b^{l} \leftarrow b^{l} - \alpha \frac{\partial Loss}{{\partial b^{l} }} $$

where α is the learning rate.
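To make the forward pass and the gradient-descent update above concrete, the following is a minimal NumPy sketch. The sigmoid activation, the layer shapes, and the random example data are illustrative assumptions rather than the network configuration used later in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas, biases):
    """Forward propagation: z^l = theta^l a^(l-1) + b^l, a^l = sigmoid(z^l)."""
    a = x
    activations = [a]
    for theta, b in zip(thetas, biases):
        z = theta @ a + b
        a = sigmoid(z)
        activations.append(a)
    return activations                      # activations[-1] is the prediction y_hat

def cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy loss (base-2 logs) averaged over the m samples."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log2(y_hat) + (1 - y) * np.log2(1 - y_hat))

# Example with random data (shapes are illustrative assumptions):
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))                          # one sample with 50 features
thetas = [rng.normal(size=(20, 50)) * 0.1, rng.normal(size=(1, 20)) * 0.1]
biases = [np.zeros((20, 1)), np.zeros((1, 1))]
y_hat = forward(x, thetas, biases)[-1]

# One gradient-descent step per layer (gradients assumed obtained by backpropagation):
#   theta[l] -= alpha * dLoss_dtheta[l]
#   bias[l]  -= alpha * dLoss_dbias[l]
```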

2.2 The AdaBoost algorithm

As an ensemble algorithm with adaptive adjustment capability (Liu et al. 2019), AdaBoost has two typical features. First, in each iteration, AdaBoost increases the weights of misclassified samples to construct a new training set, allowing the model to pay more attention to these misclassified samples. Second, the algorithm generates a classification model (called a "weak classifier") after each iteration. The weight of each weak classifier is adjusted according to its classification accuracy, so that weak classifiers with higher classification accuracy obtain higher weights. Thus, a powerful classifier is constructed by combining multiple weak classifiers. The AdaBoost algorithm can integrate multiple low-accuracy models into an accurate one with strong classification capability while reducing the difficulty of model design. The training process of the AdaBoost algorithm is as follows.

Step 1: Input the data

The training data set is given by \(T = \left\{ {\left( {x_{1} ,y_{1} } \right),\left( {x_{2} ,y_{2} } \right),...,\left( {x_{N} ,y_{N} } \right)} \right\}\), where \(x_{i} \in {\mathcal{X}} \subseteq R^{n}\), \(y_{i} \in \left\{ { - 1, + 1} \right\}\) is the category label, and \(G\) denotes a basic classifier.

Step 2: Data initialization

Initializing the weight distribution of training data yields

$$ D_{1} = \left( {w_{11} , \ldots ,w_{1i} , \ldots ,w_{1N} } \right),\quad w_{1i} = \frac{1}{N},\quad i = 1,2,...,N $$
(1)

Step 3: Produce the weak classifiers

For m = 1 to M, we perform the following steps:

(1) Use the training data set with weight distribution \(D_{m}\) to learn and obtain a weak classifier \(G_{m} \left( x \right)\).

(2) Calculate the classification error rate \(e_{m}\) of \(G_{m} \left( x \right)\) on the training data set:

$$ e_{m} = \mathop \sum \limits_{i = 1}^{N} P\left( {G_{m} \left( {x_{i} } \right) \ne y_{i} } \right) = \mathop \sum \limits_{i = 1}^{N} w_{mi} I\left( {G_{m} \left( {x_{i} } \right) \ne y_{i} } \right) $$
(2)

If \(e_{m}\) is greater than 50%, discard the weak classifier.

(3) Calculate the weight of the weak classifier \(G_{m} \left( x \right)\) as

$$ \alpha_{m} = \frac{1}{2}\ln \frac{{1 - e_{m} }}{{e_{m} }} $$
(3)

(4) Update the weight distribution of the training data set:

$$ D_{m + 1} = \left( {w_{m + 1,1} ,...,w_{m + 1,i} ,...,w_{m + 1,N} } \right) $$
$$ w_{m + 1,i} = \frac{{w_{mi} }}{{Z_{m} }}\exp \left( { - \alpha_{m} y_{i} G_{m} \left( {x_{i} } \right)} \right)\quad i = 1,2,...,N $$
(4)

where \(Z_{m}\) is the normalization factor that ensures the sum of the distribution weights is 1 while keeping the weight ratios unchanged.

Step 4: Construct the strong classifier. The output strong classifier is given by

$$ H\left( x \right) = sign(f\left( x \right)) = sign\left( {\mathop \sum \limits_{m = 1}^{M} \alpha_{m} G_{m} \left( x \right)} \right) $$
(5)
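To make these steps concrete, a minimal NumPy sketch of the training loop and the final weighted vote (Eqs. (1)-(5)) is given below. The weak-learner interface, a hypothetical `train_weak(X, y, w)` that returns an object with a `predict` method and takes the sample weights into account, is an assumption for illustration only.

```python
import numpy as np

def adaboost_train(X, y, train_weak, M):
    """AdaBoost training loop following Eqs. (1)-(5); labels y are in {-1, +1}.

    train_weak(X, y, w) is assumed to return a classifier object with a
    .predict(X) method that respects the sample weights w.
    """
    N = len(y)
    w = np.full(N, 1.0 / N)                       # Eq. (1): uniform initial weights
    classifiers, alphas = [], []
    for _ in range(M):
        G_m = train_weak(X, y, w)
        pred = G_m.predict(X)
        e_m = float(np.sum(w * (pred != y)))      # Eq. (2): weighted error rate
        if e_m >= 0.5:                            # discard classifiers no better than chance
            continue
        e_m = max(e_m, 1e-10)                     # avoid division by zero
        alpha_m = 0.5 * np.log((1 - e_m) / e_m)   # Eq. (3): weak-classifier weight
        w = w * np.exp(-alpha_m * y * pred)       # Eq. (4): re-weight the samples
        w = w / w.sum()                           # normalize by Z_m
        classifiers.append(G_m)
        alphas.append(alpha_m)
    return classifiers, alphas

def adaboost_predict(X, classifiers, alphas):
    """Eq. (5): sign of the weighted vote of the weak classifiers."""
    score = sum(a * c.predict(X) for a, c in zip(alphas, classifiers))
    return np.sign(score)
```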

2.3 The CS-AdaBoost-BP model

Under the framework of AdaBoost ensemble learning, the AdaBoost-BP model cascades multiple BP neural networks and focuses on learning from wrongly classified samples, thereby increasing the generalization ability of the model and reducing the difficulty of model design. For quality testing, firms typically pay more attention to reducing the rate of missed detection. Therefore, on the premise of ensuring the overall accuracy, the model should accurately identify defective products. However, the loss function of most traditional BP neural networks aims at improving the overall classification accuracy. Table 1 is the binary classification confusion matrix, which shows that the wrongly classified data are FP and FN. The loss function of the traditional BP neural network classifier is designed as

$$ Loss = Loss\left( {FP} \right) + Loss\left( {FN} \right) $$
Table 1 Binary classification confusion matrix

In this regard, we introduce the concept of cost sensitivity. Based on the imbalanced product data and the different misclassification costs, we construct a novel CS-AdaBoost-BP model. The model improves the loss function of the basic classifier and assigns different weights to misclassified conforming and defective products in the loss function. In this way, wrongly classified nonconforming products are weighted more heavily during the iterative process, so that defectives can be effectively identified while the overall accuracy of the model is maintained. To be more specific, the loss value of the model is mainly determined by the number of wrongly classified samples and the loss value of each type of sample. The loss function of the BP neural network does not distinguish the misclassification costs of different samples. In the CS-AdaBoost-BP model proposed in this paper, we multiply the misclassification loss value of defectives in the loss function by the coefficient K, where K > 1. Then, in the training process, the model will focus more on inspecting defective products to avoid higher classification loss. We thus have

$$ Loss = Loss\left( {FP} \right) + KLoss\left( {FN} \right) $$

The flow chart of CS-AdaBoost-BP algorithm is shown in Fig. 2.

Fig. 2 The flow chart of the CS-AdaBoost-BP model

The major steps of the CS-AdaBoost-BP model are described as follows:

Step 1: Construct a cost-sensitive BP neural network as the basic classifier.

We update the loss function by introducing cost sensitivity and increasing the weight of misclassified defective products in the loss function. We have

$$ Loss = - \frac{1}{m}\mathop \sum \limits_{i = 1}^{m} \left( {y^{\left( i \right)} \log_{2} \hat{y}^{\left( i \right)} + K\left( {1 - y^{\left( i \right)} } \right)\log_{2} \left( {1 - \hat{y}^{\left( i \right)} } \right)} \right). $$
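A minimal NumPy sketch of this cost-sensitive loss is given below. The 0/1 label encoding, with the defective class carried by the \((1 - y)\) term, is an assumption for illustration, and the default K = 3 merely anticipates the value chosen later in Sect. 3.3.

```python
import numpy as np

def cost_sensitive_loss(y, y_hat, K=3.0, eps=1e-12):
    """Cost-sensitive cross-entropy: the (1 - y) term is scaled by K > 1.

    Encoding assumption for this sketch: y = 0 marks a defective product, so
    misclassified defectives contribute K times more to the loss.
    """
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log2(y_hat) + K * (1 - y) * np.log2(1 - y_hat))
```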

Step 2: Initialize the weight distribution of training data

$$ D_{1} = \left( {w_{11} , \ldots ,w_{1i} , \ldots ,w_{1N} } \right),\quad w_{1i} = \frac{1}{N},i = 1,2,...,N $$
(6)

Step 3: Connect multiple cost-sensitive BP neural networks in series:

(1) Use the training data set with weight distribution \(D_{m}\) to learn and obtain a weak classifier \(C_{m} \left( x \right)\) with cost sensitivity.

(2) Calculating the classification error rate \(e_{m}\) of \(C_{m} \left( x \right)\) on the training data set yields

$$ e_{m} = \mathop \sum \limits_{i = 1}^{N} P\left( {C_{m} \left( {x_{i} } \right) \ne y_{i} } \right) = \mathop \sum \limits_{i = 1}^{N} w_{mi} I\left( {C_{m} \left( {x_{i} } \right) \ne y_{i} } \right) $$
(7)

If \(e_{m}\) is greater than 50%, the weak classifier will be discarded.

(3) Calculating the weight of the weak classifier \(C_{m} \left( x \right)\) yields

$$ \alpha_{m} = \frac{1}{2}\ln \frac{{1 - e_{m} }}{{e_{m} }} $$
(8)

(4) Updating the weight distribution of the training data set, we have

$$ D_{m + 1} = \left( {w_{m + 1,1} , \ldots ,w_{m + 1,i} , \ldots ,w_{m + 1,N} } \right) $$
$$ w_{m + 1,i} = \frac{{w_{mi} }}{{Z_{m} }}\exp \left( { - \alpha_{m} y_{i} C_{m} \left( {x_{i} } \right)} \right)\quad i = 1,2,...,N $$
(9)

where \(Z_{m}\) is the normalization factor that ensures the sum of the distribution weights is 1 while keeping the weight ratios unchanged.

Step 4: Output a strong classifier

$$ H\left( x \right) = sign(f\left( x \right)) = sign\left( {\mathop \sum \limits_{m = 1}^{M} \alpha_{m} C_{m} \left( x \right)} \right) $$
(10)
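To show how these pieces fit together, the following is a hedged sketch of the full CS-AdaBoost-BP training loop, using small Keras networks as the cost-sensitive BP base classifiers. The layer sizes, optimizer settings, epoch count, and the choice of folding the cost factor K into per-sample weights (which, for binary cross-entropy, is equivalent to scaling the defective-class term of the loss) are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
import tensorflow as tf

def make_cs_bp(input_dim, learning_rate=1e-3):
    """A small four-layer BP network (input, two hidden, output) as base classifier."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate),
                  loss="binary_crossentropy")
    return model

def cs_adaboost_bp(X, y01, M=50, K=3.0, epochs=20):
    """CS-AdaBoost-BP sketch; y01 uses 1 for defective and 0 for conforming products."""
    N = len(y01)
    y_pm1 = np.where(y01 == 1, 1.0, -1.0)            # {-1, +1} labels for the AdaBoost updates
    D = np.full(N, 1.0 / N)                          # Eq. (6): uniform initial weights
    models, alphas = [], []
    for _ in range(M):
        model = make_cs_bp(X.shape[1])
        sw = D * np.where(y01 == 1, K, 1.0)          # cost sensitivity: defectives weighted by K
        model.fit(X, y01, sample_weight=sw, epochs=epochs, verbose=0)
        pred01 = (model.predict(X, verbose=0).ravel() > 0.5).astype(int)
        pred_pm1 = np.where(pred01 == 1, 1.0, -1.0)
        e_m = float(np.sum(D * (pred_pm1 != y_pm1))) # Eq. (7): weighted error rate
        if e_m >= 0.5:                               # discard classifiers no better than chance
            continue
        e_m = max(e_m, 1e-10)
        alpha_m = 0.5 * np.log((1 - e_m) / e_m)      # Eq. (8)
        D = D * np.exp(-alpha_m * y_pm1 * pred_pm1)  # Eq. (9)
        D = D / D.sum()
        models.append(model)
        alphas.append(alpha_m)
    return models, alphas

def cs_adaboost_bp_predict(X, models, alphas):
    """Eq. (10): sign of the weighted vote; +1 means defective under this encoding."""
    score = sum(a * np.where(m.predict(X, verbose=0).ravel() > 0.5, 1.0, -1.0)
                for a, m in zip(alphas, models))
    return np.sign(score)
```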

3 Empirical experiments

In this section, we empirically test the effectiveness of our proposed CS-AdaBoost-BP model using actual data from the production processes of Bosch. First, we describe the empirical data collected. Second, we preprocess these data considering their high-dimensional characteristics. Third, we implement the CS-AdaBoost-BP model proposed in this paper. Finally, we evaluate and compare our model with three other machine learning models.

3.1 Data description

For this research, we collected the workshop production data published by Bosch, the German home appliance firm, on the Kaggle competition platform. The data include the product serial number, process number, and product category, as described in Table 2. Each product has a unique number, and each process also has a unique number. For example, L0_S8_F144 indicates the 144th process at the 8th station of production line 0. The product category is listed in the Response column, where -1 represents a qualified product and 1 represents a defective product. The intelligent classification model learns and iterates based on a large amount of data and finally classifies the products according to their parameter values in each process (Lu et al. 2020). We choose 6,731 unqualified products and 36,781 qualified products, so the distribution ratio is approximately 1.83:10.

Table 2 Data description

3.2 Filtering variables

During production, the products' parameter values set in different processes directly affect the quality of the final products. Since there are many complex production processes, the corresponding data are high-dimensional. If all processes are used directly, the complexity of the model will increase and the classifier may overfit. Therefore, we need to extract the features that significantly impact product quality.

The product quality data contain various discrete and continuous variables, which places higher demands on variable filtering. Since the XGBoost algorithm evaluates the importance of variables with information gain, which is not affected by the type of variable, we adopt the XGBoost algorithm for filtering variables. Specifically, processes with higher scores have a greater impact on the classification results. The importance ranking of different processes is shown in Fig. 3. Based on this ranking, we retained the top 50 processes for classification. As shown in Fig. 3, the L3_S29_F3407, L0_S8_F149, and L3_S33_F3855 processes have higher scores, indicating that these processes have more significant impacts on the classification results. In other words, the parameter values of these processes can be used to distinguish conforming and defective products.

Fig. 3 The importance ranking of different processes
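As a rough sketch of this filtering step, the snippet below ranks features by XGBoost's gain-based importance and keeps the top 50. The file name and hyperparameters are hypothetical, and the Bosch numeric features are assumed to be loaded into a DataFrame with an Id column and a Response column.

```python
import pandas as pd
import xgboost as xgb

# Hypothetical file name; the Bosch numeric features are assumed to be in a
# DataFrame with an "Id" column and a "Response" column (-1 / 1).
df = pd.read_csv("bosch_production.csv")
X = df.drop(columns=["Id", "Response"])
y = (df["Response"] == 1).astype(int)            # 1 = defective product

clf = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X, y)

# Rank processes by gain-based (information-gain) importance and keep the top 50
importance = pd.Series(
    clf.get_booster().get_score(importance_type="gain")
).sort_values(ascending=False)
top50 = importance.head(50).index.tolist()
X_selected = X[top50]
```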

3.3 Evaluation and comparison of the overall performance

The CS-AdaBoost-BP model in this paper employs 50 BP neural network classifiers. Each neural network has four layers, and the learning rate is 0.001. To obtain the cost sensitivity coefficient K, we combine the method proposed by De Castro and Braga (2013) with grid search and set K to 3. We use the tenfold cross-validation method to evaluate the performance of four models: the BP neural network, the BP neural network based on data sampling (Sample-BP), the AdaBoost-BP model, and the CS-AdaBoost-BP model. We then compare their average accuracy and stability. In tenfold cross-validation, the data are divided into ten equal parts; nine of them are used as the training set, and one is used as the testing set. Compared with a single test, tenfold cross-validation avoids the interference caused by random sampling. Therefore, we employ this method to evaluate the performance of the four models.
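A generic sketch of this cross-validation protocol is shown below. The `model_factory` argument is a hypothetical wrapper that would build any of the four compared models, and stratified splitting is an assumption we add to preserve the class ratio in each fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

def cross_validate(model_factory, X, y, n_splits=10, seed=0):
    """Tenfold cross-validation; X and y are assumed to be NumPy arrays.

    model_factory is a hypothetical callable returning an object with
    fit/predict methods, so each of the four compared models can be plugged in.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in skf.split(X, y):
        model = model_factory()
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        accuracies.append(accuracy_score(y[test_idx], pred))
    return np.mean(accuracies), np.std(accuracies)
```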

3.3.1 Comparison of average model accuracy

The average classification accuracy rate is a common index for evaluating model performance. It refers to the average proportion of data classified correctly. Because we use tenfold cross-validation, ten test results are produced, and we calculate their mean. Table 3 shows the average classification accuracy rates of the four models using tenfold cross-validation.

Table 3 The average classification accuracy rates using tenfold cross-validation

Table 3 shows the following ranking of the four models in terms of average classification accuracy under tenfold cross-validation: CS-AdaBoost-BP > AdaBoost-BP > Sample-BP > BP. After removing the error caused by randomly selected training data, the average accuracies of the CS-AdaBoost-BP, AdaBoost-BP, and Sample-BP models are better than that of the single BP model. Therefore, we can conclude that the optimization and ensemble algorithms proposed in this article improve the overall accuracy.

3.3.2 Comparison of model stability

Stability is an important factor that determines a model's applicability. To better show the stability of the classification accuracy rate, we evaluate the stability of the four models in two ways. On the one hand, we examine their standard deviations, maxima, and minima. On the other hand, we compare line charts of their tenfold cross-validation results. The results are presented in Table 4 and Fig. 4, respectively.

Table 4 The maximum, minimum, and standard deviation of the classification accuracy rates of the four models
Fig. 4 The classification accuracy rates of the four models using tenfold cross-validation

Table 4 shows that the standard deviation of the improved CS-AdaBoost-BP model is only 0.00606, the smallest out of the four models. Based on the standard deviations, the other three models are ranked as AdaBoost-BP, Sample-BP, and BP neural network. Thus, we can conclude that first, the stability of Sample-BP and AdaBoost-BP is better than that of BP neural network, indicating that the sampling algorithm and ensemble neural network are effective. Second, the stability of AdaBoost-BP and Sample-BP is relatively close. Third, the CS-AdaBoost-BP model is more stable than the AdaBoost-BP model.

3.4 Comparison of quality detection capabilities of the models

Different from general classification models, in addition to ensuring a high overall classification accuracy, a quality classification model must also be highly capable of detecting defective products. As argued earlier, the information provided by the overall accuracy rate of a product quality classification model is quite inadequate, partly because it does not reflect the false positive and false negative rates. Models with the same overall classification accuracy may have different false positive and false negative rates. Hence, we need performance indicators that compare the models' capability to classify defective products, including the missed detection rate and the Youden index.

3.4.1 Comparison of missed detection rates

The missed detection rate, also known as the false negative rate, refers to the probability that a substandard product is misclassified as conforming. The missed detection rate directly affects the effectiveness of a model and is as important as its overall accuracy. Hence, we compare the missed detection rates of the four models through tenfold cross-validation. The comparison results are shown in Table 5 and Fig. 5.

Table 5 The missed detection rates of the four models using tenfold cross-validation
Fig. 5 The missed detection rates of the four models using tenfold cross-validation

Table 5 shows that the average missed detection rates of AdaBoost-BP and CS-AdaBoost-BP are significantly lower than those of the BP neural network and the Sample-BP neural network. This demonstrates that ensemble learning can yield better classification results. Among them, CS-AdaBoost-BP has the lowest missed detection rate, which confirms that the proposed CS-AdaBoost-BP model can better detect defective products.

The tenfold cross-validation results of the rates of missed detection of the four models are plotted in Fig. 5. We can see that the classification result of the CS-AdaBoost-BP model has the smallest fluctuation, which further confirms that our model can not only accurately identify defective products but also possess higher classification stability.

3.4.2 Comparison of Youden index

The Youden index is defined as the difference between the true positive rate (TPR) and the false positive rate (FPR). As a comprehensive indicator for evaluating classification models, the Youden index represents the overall ability of a model to detect conforming and nonconforming products. The Youden index ranges from −1 to +1; the closer the value is to +1, the better the model. Through tenfold cross-validation, we obtain the Youden indices of the four models, as shown in Table 6.
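Before turning to the results, the small sketch below shows how the missed detection rate (Sect. 3.4.1) and the Youden index can be computed from a confusion matrix, assuming label 1 marks a defective product (the positive class).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def quality_metrics(y_true, y_pred):
    """Missed detection rate and Youden index, with label 1 = defective product."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)                      # defectives correctly detected
    fpr = fp / (fp + tn)                      # conforming products flagged as defective
    missed_detection_rate = fn / (fn + tp)    # defectives passed as conforming (= 1 - TPR)
    youden_index = tpr - fpr
    return missed_detection_rate, youden_index
```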

Table 6 The Youden indices of the four models using tenfold cross-validation

Table 6 shows that the average Youden indices of the four models are ranked as CS-AdaBoost-BP > AdaBoost-BP > Sample-BP > BP. This means that the optimization and ensemble algorithms proposed in this article not only improve overall classification performance but also enhance the ability to detect defective products.

The ten-fold cross-validation results of the Youden indices for the four models are plotted in Fig. 6. It shows that the Youden index of the CS-AdaBoost-BP model has the smallest fluctuation, indicating that the model is particularly stable in quality detection.

Fig. 6 The Youden indices of the four models using tenfold cross-validation

Based on our analysis and discussions so far, we can conclude the following. First, the improvement in the Youden index from the BP neural network to the Sample-BP neural network is insignificant. Second, the AdaBoost-BP and CS-AdaBoost-BP models achieve significantly higher Youden indices, which indicates that the ensemble method of AdaBoost is superior to a single classification algorithm in detecting defective products. Third, the Youden index of CS-AdaBoost-BP is higher and more stable than that of AdaBoost-BP, which implies that the cost-sensitive CS-AdaBoost-BP model is more stable than the other models.

The experimental results so far show that the CS-AdaBoost-BP model performs best in terms of both overall performance and product quality detection. It improves the average accuracy of the AdaBoost-BP model from 87.4% to 90.6% and reduces the average missed detection rate of the AdaBoost-BP model from 17.7% to 8.4%. Thus, the CS-AdaBoost-BP model is significantly better than the AdaBoost-BP model at identifying defective products. Furthermore, it has a higher Youden index, indicating that the model performs well in comprehensive product quality inspection.

4 Conclusions and future research

In the information age, quality inspection based on artificial intelligence technology has become increasingly important. However, the quality detection models based on BP neural networks proposed in the literature are ineffective for classifying imbalanced data. In this paper, we aim to overcome the classification interference caused by skewed data and reduce the risk of inaccurate quality detection. To this end, we combine BP neural networks, AdaBoost ensemble learning, and cost-sensitive learning into a CS-AdaBoost-BP product quality inspection model. Specifically, first, we adopt the XGBoost method to filter the production processes and preprocess the input data. Second, we present the BP neural network, the sampling-based BP neural network (Sample-BP), the AdaBoost-BP model, and the CS-AdaBoost-BP model. Third, we apply these models to the quality inspection of home appliance products from Bosch and comprehensively compare their classification accuracy, missed detection rate, and Youden index.

Our results demonstrate that the proposed CS-AdaBoost-BP model performs better than the other three methods. First, the CS-AdaBoost-BP model applies ensemble learning to reduce the difficulty of model design. Therefore, CS-AdaBoost-BP is more practical. Second, compared with the other three models, CS-AdaBoost-BP has a higher overall classification accuracy and, at the same time, can detect unqualified products much better. This confirms that our model is suitable for processing imbalanced data. Finally, compared to the other three methods, our model is more stable according to the ten-fold cross-validation results.

However, there may still be several potential shortcomings associated with the CS-AdaBoost-BP model. First, its computational speed is not the fastest. Compared with a single BP neural network, this model integrates 50 weak classifiers based on BP neural networks. The training, iteration, and integration of these classifiers take more time. Second, in this research, we focus on algorithm improvement alone. In the future, we can focus on algorithm improvement and data processing simultaneously. Third, this research mainly adopts the direct embedding cost-sensitive method. Since the weight of the loss function varies with different data sets, how to find an efficient strategy for weight search can be further explored in the future. Fourth, we can integrate big data analytics and social media into quality management (Wamba et al. 2018, 2019). Last but not least, the current research on quality detection can be combined with supply chain management (Li et al. 2020) and corporate social responsibility (Bian et al. 2020).

It should be noted that when implementing the CS-AdaBoost-BP model in practice, users should first consider the potential issue of missing data, which requires data cleaning. Second, the high-dimensional characteristics of the data should also be examined. In this paper, we use the XGBoost algorithm to identify the features that can significantly impact the classification capability to reduce data dimensions. Third, it is worthwhile to note that if data are extremely imbalanced, the model may be inapplicable.
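As an illustration of the data-cleaning step mentioned above, the following hedged sketch drops mostly-missing columns and imputes the rest with column medians; the file name and the 80% missing-value threshold are hypothetical choices, not a prescription.

```python
import pandas as pd

# Hypothetical file name; the raw production data are assumed to contain
# many missing values across the process columns.
df = pd.read_csv("bosch_production.csv")
missing_ratio = df.isna().mean()
df = df.loc[:, missing_ratio < 0.8]                  # drop features that are mostly missing
df = df.fillna(df.median(numeric_only=True))         # impute remaining gaps with column medians
```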