
Machine learning-based techniques for fault diagnosis in the semiconductor manufacturing process: a comparative study

The Journal of Supercomputing

Abstract

Industries are going through the fourth industrial revolution (Industry 4.0), in which technologies such as the Industrial Internet of Things, big data analytics, and machine learning (ML) are extensively used to improve the productivity and efficiency of manufacturing systems and processes. This work investigates the applicability, and aims to improve the effectiveness, of ML prediction models for fault diagnosis in smart manufacturing processes. To that end, we propose several methodologies and ML models for fault diagnosis in smart manufacturing applications. A case study was conducted on a real dataset from a semiconductor manufacturing (SECOM) process. This dataset, however, contains missing values and noisy features and suffers from class imbalance: the majority class vastly outnumbers the minority class, which makes the minority class difficult to predict accurately. Previous studies have tried to alleviate this imbalance in the UCI machine learning repository SECOM dataset using several synthetic data generation techniques (SDGT). In this work, we employed, compared, and evaluated the feasibility of three SDGT on this dataset to handle the imbalance. To handle the missing values and the noisy features, we implemented two missing-value imputation techniques and feature selection techniques, respectively. We then developed ten predictive ML models and compared their performance under each of the proposed methodologies. The models performed well across several evaluation metrics, and a comparative analysis demonstrates the feasibility and effectiveness of these SDGT and the proposed methodologies. Some of the proposed methodologies achieved accuracies of 99.5% to 100%. Furthermore, a comparison with similar models from the literature shows that our proposed models outperformed them.
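The abstract combines missing-value imputation with synthetic oversampling of the minority (fail) class. The dependency-free Python sketch below illustrates these steps under stated assumptions: mean imputation (MI), a simplified k-NN imputation (k-NNI), and SMOTE-style interpolation as one representative SDGT. It is not the authors' implementation; the function names, the toy data, the value of k, the random seed, and the distance rule used for imputation (Euclidean distance over features observed in both rows) are all hypothetical choices. In practice, library implementations such as scikit-learn's `SimpleImputer` and `KNNImputer` and imbalanced-learn's `SMOTE` would normally be used instead.

```python
import math
import random


def mean_impute(rows):
    """Mean imputation (MI): replace each None with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def knn_impute(rows, k=2):
    """Simplified k-NN imputation (k-NNI): fill a missing entry with the mean of
    that column over the k nearest rows that observe it. Distance is Euclidean
    over the features present in both rows (an assumption of this sketch)."""
    def dist(a, b):
        shared = [(x, z) for x, z in zip(a, b) if x is not None and z is not None]
        return math.sqrt(sum((x - z) ** 2 for x, z in shared)) if shared else math.inf

    out = []
    for i, r in enumerate(rows):
        filled = list(r)
        for j, v in enumerate(r):
            if v is None:
                # Candidate donors: other rows that actually observe column j.
                donors = [s for t, s in enumerate(rows) if t != i and s[j] is not None]
                nearest = sorted(donors, key=lambda s: dist(r, s))[:k]
                filled[j] = sum(s[j] for s in nearest) / len(nearest) if nearest else 0.0
        out.append(filled)
    return out


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment between
    a random minority sample x and one of its k nearest minority neighbours z."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for t, m in enumerate(minority) if t != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        z = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


# Hypothetical toy data: None marks a missing sensor reading;
# label 1 = fail (minority class), 0 = pass (majority class).
X = [[0.0, 0.1], [0.2, None], [0.1, 0.0], [0.3, 0.2], [0.2, 0.1],
     [2.0, 2.1], [2.2, None], [1.9, 2.3]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

X_mi = mean_impute(X)        # alternative 1: mean imputation
X_knn = knn_impute(X, k=2)   # alternative 2: k-NN imputation
minority = [x for x, label in zip(X_knn, y) if label == 1]
synthetic = smote(minority, n_new=y.count(0) - len(minority), k=2)
X_bal, y_bal = X_knn + synthetic, y + [1] * len(synthetic)
```

Because each synthetic point is interpolated from existing minority samples, oversampling is usually applied only to the training data (inside each cross-validation fold); otherwise test samples can leak into training through their synthetic neighbours.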

Data availability

All data generated or analyzed during this study are included in this published article.

Abbreviations

1NNC: 1-Nearest neighbor classifier

AdaBoost: Adaptive boosting

ADASYN: Adaptive synthetic oversampling

ANN: Artificial neural network

AWSMOTE: Adaptive-weighting SMOTE

BEBS: Bagging of extrapolation borderline-SMOTE SVM

BSMOTE-SVM: Borderline-SMOTE-SVM

CV: Cross-validation

DL: Deep learning

DT: Decision tree

FN: False negative

FP: False positive

FPR: False positive rate

FS: Feature scaling

GBM: Gradient boosting machine

k-NN: K-nearest neighbor

k-NNI: K-NN imputation

LDA: Linear discriminant analysis

LR: Logistic regression

MARS: Multivariate adaptive regression splines

MI: Mean imputation

ML: Machine learning

MLP: Multilayer perceptron

MRD: Modified raw dataset

MWMOTE: Majority weighted minority oversampling technique

NB: Naïve Bayes

PCA: Principal component analysis

PSO-DBN: Particle swarm optimization–deep belief network

RF: Random forest

RFE: Recursive feature elimination

SDGT: Synthetic data generation techniques

SECOM: Semiconductor manufacturing

SELECTFDR: Feature selection based on the estimated false discovery rate

SMOTE: Synthetic minority oversampling technique

SVM: Support vector machines

TN: True negative

TP: True positive

TPR: True positive rate

UCI SECOM dataset: UCI machine learning repository SECOM dataset

UFS: Univariate feature selection

XGBoost: Extreme gradient boosting
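TP, FP, TN and FN are the four cells of a binary confusion matrix, from which TPR and FPR are derived. The following minimal sketch treats the fail class as positive and uses hypothetical labels (not data from the study) to show why accuracy alone can mislead on an imbalanced dataset: a model that never predicts a failure still scores high accuracy while detecting no faults. The function names `confusion_counts` and `rates` are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem; `positive` marks the fail class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return accuracy, TPR (recall) and FPR; a rate with an empty denominator is 0.0."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of actual fails that are detected
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of actual passes flagged as fails
    return accuracy, tpr, fpr


# Hypothetical test set: 95 passing units (0) and 5 failing units (1).
y_true = [0] * 95 + [1] * 5
y_always_pass = [0] * 100  # a degenerate model that never predicts a failure
acc, tpr, fpr = rates(*confusion_counts(y_true, y_always_pass))
print(f"accuracy={acc:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")  # accuracy=0.95 TPR=0.00 FPR=0.00
```

When the fail class is rare, recall-oriented measures such as TPR, read together with FPR, say more about fault-detection ability than raw accuracy does.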

Author information

Corresponding author

Correspondence to Babak Safaei.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nuhu, A.A., Zeeshan, Q., Safaei, B. et al. Machine learning-based techniques for fault diagnosis in the semiconductor manufacturing process: a comparative study. J Supercomput 79, 2031–2081 (2023). https://doi.org/10.1007/s11227-022-04730-x

  • DOI: https://doi.org/10.1007/s11227-022-04730-x
