Abstract
Industries are going through the fourth industrial revolution (Industry 4.0), in which technologies such as the Industrial Internet of Things, big data analytics, and machine learning (ML) are used extensively to improve the productivity and efficiency of manufacturing systems and processes. This work further investigates the applicability of ML prediction models for fault diagnosis in the smart manufacturing process and aims to improve their effectiveness. To this end, we propose several methodologies and ML models for fault diagnosis in smart manufacturing process applications. A case study was conducted on a real dataset from a semiconductor manufacturing (SECOM) process. This dataset contains missing values and noisy features, and it suffers from a class imbalance problem. Because the majority class is much larger than the minority class, the imbalance makes it difficult to predict the minority class accurately. In the literature, efforts have been made to alleviate the class imbalance problem by applying several synthetic data generation techniques (SDGT) to the UCI machine learning repository SECOM dataset. In this work, to handle the imbalance problem, we employed and compared three SDGT on this dataset and evaluated their feasibility. To handle the missing values and the noisy features, we implemented two missing-value imputation techniques and feature selection techniques, respectively. We then developed ten predictive ML models and compared their performance across these proposed methodologies. The results were significant across several performance evaluation metrics. A comparative analysis shows the feasibility and validates the effectiveness of these SDGT and the proposed methodologies. Some of the proposed methodologies produced an accuracy in the range of 99.5% to 100%. Furthermore, in a comparative analysis with similar models from the literature, our proposed models outperformed those previously reported.
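As an illustration of two of the preprocessing steps named in the abstract, the following minimal sketch shows mean imputation (MI) followed by SMOTE-style synthetic oversampling of the minority class. It is not the authors' implementation: the function names `mean_impute` and `smote`, the toy data, and the fallback value for a fully missing column are assumptions made for the example, and only the Python standard library is used.

```python
import math
import random


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 for a column with no observed values (an assumption).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [m for idx, m in enumerate(minority) if idx != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): six majority rows, two minority rows, one missing value.
X = [[1.0, 2.0], [1.2, None], [0.9, 2.1], [1.1, 1.9], [1.0, 2.2], [0.8, 2.0],
     [3.0, 4.0], [3.4, 4.4]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_filled = mean_impute(X)
minority = [x for x, label in zip(X_filled, y) if label == 1]
n_needed = y.count(0) - y.count(1)  # synthetic samples needed to balance the classes
X_balanced = X_filled + smote(minority, n_needed, k=5, seed=0)
y_balanced = y + [1] * n_needed
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic samples derived from test data do not inflate the reported metrics.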
Data availability
All data generated or analyzed during this study are included in this published article.
Abbreviations
- 1NNC: 1-Nearest neighbor classifier
- AdaBoost: Adaptive boosting
- ADASYN: Adaptive synthetic oversampling
- ANN: Artificial neural network
- AWSMOTE: Adaptive-weighting SMOTE
- BEBS: Bagging of extrapolation borderline-SMOTE SVM
- BSMOTE-SVM: Borderline-SMOTE-SVM
- CV: Cross-validation
- DL: Deep learning
- DT: Decision tree
- FN: False negative
- FP: False positive
- FPR: False positive rate
- FS: Feature scaling
- GBM: Gradient boosting machine
- k-NN: K-nearest neighbor
- k-NNI: K-NN imputation
- LDA: Linear discriminant analysis
- LR: Logistic regression
- MARS: Multivariate adaptive regression splines
- MI: Mean imputation
- ML: Machine learning
- MLP: Multilayer perceptron
- MRD: Modified raw dataset
- MWMOTE: Majority weighted minority oversampling technique
- NB: Naïve Bayes
- PCA: Principal component analysis
- PSO-DBN: Particle swarm optimization–deep belief network
- RF: Random forest
- RFE: Recursive feature elimination
- SDGT: Synthetic data generation techniques
- SECOM: Semiconductor manufacturing
- SELECTFDR: Estimated false discovery rate
- SMOTE: Synthetic minority oversampling technique
- SVM: Support vector machines
- TN: True negative
- TP: True positive
- TPR: True positive rate
- UCI SECOM dataset: UCI machine learning repository SECOM dataset
- UFS: Univariate feature selection
- XGBoost: Extreme gradient boosted trees
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
About this article
Cite this article
Nuhu, A.A., Zeeshan, Q., Safaei, B. et al. Machine learning-based techniques for fault diagnosis in the semiconductor manufacturing process: a comparative study. J Supercomput 79, 2031–2081 (2023). https://doi.org/10.1007/s11227-022-04730-x
DOI: https://doi.org/10.1007/s11227-022-04730-x