Machine learning-based techniques for fault diagnosis in the semiconductor manufacturing process: a comparative study

Nuhu, Abubakar Abdussalam; Zeeshan, Qasim; Safaei, Babak; Shahzad, Muhammad Atif

doi:10.1007/s11227-022-04730-x

Published: 06 August 2022

Volume 79, pages 2031–2081, (2023)

The Journal of Supercomputing

Abubakar Abdussalam Nuhu¹, Qasim Zeeshan¹, Babak Safaei¹,² (ORCID: orcid.org/0000-0002-1675-4902) & Muhammad Atif Shahzad³


Abstract

Industries are going through the fourth industrial revolution (Industry 4.0), in which technologies such as the Industrial Internet of Things, big data analytics, and machine learning (ML) are extensively used to improve the productivity and efficiency of manufacturing systems and processes. This work aims to further investigate the applicability and improve the effectiveness of ML prediction models for fault diagnosis in smart manufacturing processes. To that end, we propose several methodologies and ML models for fault diagnosis in smart manufacturing applications. A case study was conducted on a real dataset from a semiconductor manufacturing (SECOM) process. This dataset, however, contains missing values, noisy features, and a class imbalance problem. Because the majority class is much larger than the minority class, the imbalance makes it difficult to predict the minority class accurately. In the literature, efforts have been made to alleviate the class imbalance problem in the UCI machine learning repository SECOM dataset using several synthetic data generation techniques (SDGT). In this work, to handle the imbalance problem, we employed, compared, and evaluated the feasibility of three SDGT on this dataset. To handle the missing values and the noisy features, we implemented two missing-value imputation techniques and feature selection techniques, respectively. We then developed ten predictive ML models and compared their performance across these proposed methodologies. The results obtained across several performance evaluation metrics were significant. A comparative analysis shows the feasibility and validates the effectiveness of these SDGT and the proposed methodologies. Some of the proposed methodologies could achieve an accuracy in the range of 99.5% to 100%. Furthermore, in a comparative analysis with similar models from the literature, our proposed models outperformed them.
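To make the preprocessing ideas in the abstract concrete, the following is a minimal sketch of mean imputation followed by SMOTE-style oversampling of the minority class. It is not the authors' implementation: the helper names `mean_impute` and `smote_like`, the toy data, and the choice of `k` are all assumptions made for illustration, and the sketch uses only the Python standard library rather than any particular ML package.

```python
import math
import random


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point is a random interpolation between a minority sample
    and one of its k nearest minority neighbours, which is the core idea
    behind SMOTE. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (not SECOM): label 0 is the majority class, label 1 the minority.
X = [[0.0, 1.0], [0.2, None], [0.1, 0.9], [None, 1.1], [0.3, 1.2], [0.2, 0.8],
     [5.0, 5.0], [5.5, None]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_imp = mean_impute(X)
minority = [x for x, label in zip(X_imp, y) if label == 1]
n_needed = y.count(0) - len(minority)

new_points = smote_like(minority, n_new=n_needed, k=1)
X_bal = X_imp + new_points
y_bal = y + [1] * len(new_points)

print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In a real experiment, oversampling would normally be applied to the training split only, so that synthetic points do not leak into the data used for evaluation.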

Data availability

All data generated or analyzed during this study are included in this published article.

Abbreviations

1NNC:: 1-Nearest neighbor classifier
AdaBoost:: Adaptive boosting
ADASYN:: Adaptive synthetic oversampling
ANN:: Artificial neural network
AWSMOTE:: Adaptive-weighting SMOTE
BEBS:: Bagging of extrapolation borderline-SMOTE SVM
BSMOTE-SVM:: Borderline-SMOTE-SVM
CV:: Cross-validation
DL:: Deep learning
DT:: Decision tree
FN:: False negative
FP:: False positive
FPR:: False positive rate
FS:: Feature scaling
GBM:: Gradient boosting machine
k-NN:: K-nearest neighbor
k-NNI:: K-NN imputation
LDA:: Linear discriminant analysis
LR:: Logistic regression
MARS:: Multivariate adaptive regression splines
MI:: Mean imputation
ML:: Machine learning
MLP:: Multilayer perceptron
MRD:: Modified raw dataset
MWMOTE:: Majority weighted minority oversampling technique
NB:: Naïve Bayes
PCA:: Principal component analysis
PSO-DBN:: Particle swarm optimization–deep belief network
RF:: Random forest
RFE:: Recursive feature elimination
SDGT:: Synthetic data generation techniques
SECOM:: Semiconductor manufacturing
SELECTFDR:: Estimated false discovery rate
SMOTE:: Synthetic minority oversampling technique
SVM:: Support vector machines
TN:: True negative
TP:: True positive
TPR:: True positive rate
UCI SECOM dataset:: UCI machine learning repository SECOM dataset
UFS:: Univariate feature selection
XGBoost:: Extreme gradient boosted trees
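Several of the abbreviations above (TP, TN, FP, FN, TPR, FPR) are confusion-matrix quantities. The sketch below shows how they relate and why accuracy alone can mislead on an imbalanced dataset. It assumes that faults are the positive class and are labelled 1; the function names and the toy labels are illustrative assumptions, not values from the SECOM study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Accuracy, true positive rate (TPR), and false positive rate (FPR)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy imbalanced labels: 95 normal runs (0) and 5 faulty runs (1).
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority class.
y_pred = [0] * 100

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
scores = rates(tp, fp, tn, fn)
print(scores)
```

Here the trivial model reaches 95% accuracy while detecting none of the faults (TPR of 0), which is why metrics beyond accuracy matter when the classes are imbalanced.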

Author information

Authors and Affiliations

Department of Mechanical Engineering, Eastern Mediterranean University, Famagusta, North Cyprus via Mersin 10, Turkey
Abubakar Abdussalam Nuhu, Qasim Zeeshan & Babak Safaei
Department of Mechanical Engineering Science, University of Johannesburg, Johannesburg, Gauteng, 2006, South Africa
Babak Safaei
Department of Industrial Engineering, King Abdulaziz University, Jeddah, Saudi Arabia
Muhammad Atif Shahzad

Corresponding author

Correspondence to Babak Safaei.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nuhu, A.A., Zeeshan, Q., Safaei, B. et al. Machine learning-based techniques for fault diagnosis in the semiconductor manufacturing process: a comparative study. J Supercomput 79, 2031–2081 (2023). https://doi.org/10.1007/s11227-022-04730-x

Accepted: 16 July 2022
Published: 06 August 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s11227-022-04730-x
