Cost-Sensitive Learning and Threshold-Moving Approach to Improve Industrial Lots Release Process on Imbalanced Datasets

Lobo, Armindo; Oliveira, Pedro; Sampaio, Paulo; Novais, Paulo

doi:10.1007/978-3-031-20859-1_28

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 583)

Included in the following conference series:

International Symposium on Distributed Computing and Artificial Intelligence

Abstract

With Industry 4.0, companies must manage massive and generally imbalanced datasets. In an automotive company, the lots release decision process must cope with this problem by combining data from different sources to determine whether a selected group of products can be released to customers. This work focuses on this process and aims to classify the occurrence of customer complaints through the design, tuning, and evaluation of five ML algorithms, namely XGBoost (XGB), LightGBM (LGBM), CatBoost (CatB), Random Forest (RF), and Decision Tree (DT), trained on an imbalanced dataset of automatic production tests. We used a non-sampling approach to deal with class imbalance, analyzing two methods: cost-sensitive learning and threshold-moving. Both methods improved the boosting algorithms, whereas RF improved only with threshold-moving. Across both approaches, the best overall results were achieved with threshold-moving, where RF obtained the best outcome with an F1-Score of 76.2%.
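
The two non-sampling ideas named in the abstract can be illustrated with a minimal sketch. The chapter's own implementation is not visible in this preview, so everything below is an assumption made for illustration: the toy labels and scores are invented, and the function names (`f1_at_threshold`, `best_threshold`, `positive_class_weight`) are hypothetical. Threshold-moving is shown as a simple search for the cut-off that maximizes F1, and cost-sensitive learning is reduced to the common negative-to-positive ratio heuristic for weighting the minority class.

```python
from typing import Sequence, Tuple

# Toy data (invented for illustration): 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.45, 0.35, 0.40]


def f1_at_threshold(labels: Sequence[int], probs: Sequence[float],
                    threshold: float) -> float:
    """F1 of the positive class when predicting 1 for probs >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(labels, probs):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(labels: Sequence[int],
                   probs: Sequence[float]) -> Tuple[float, float]:
    """Threshold-moving: try each distinct score as cut-off, keep the best F1."""
    candidates = sorted(set(probs))
    best = max(candidates, key=lambda t: f1_at_threshold(labels, probs, t))
    return best, f1_at_threshold(labels, probs, best)


def positive_class_weight(labels: Sequence[int]) -> float:
    """Cost-sensitive heuristic: weight positives by the negative/positive ratio."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


if __name__ == "__main__":
    print("F1 at default 0.5:", f1_at_threshold(y_true, scores, 0.5))
    print("Best threshold and F1:", best_threshold(y_true, scores))
    print("Positive class weight:", positive_class_weight(y_true))
```

On imbalanced data a classifier's scores for the minority class often stay below 0.5, so the default cut-off can miss every positive; moving the threshold changes only the decision rule, while a class weight changes how the model is trained. In practice the threshold would be selected on a validation split rather than on the data used for the final evaluation.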

Acknowledgments

This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.

Author information

Authors and Affiliations

ALGORITMI Centre, University of Minho, Braga, Portugal
Armindo Lobo, Pedro Oliveira, Paulo Sampaio & Paulo Novais

Corresponding author

Correspondence to Armindo Lobo.

Editor information

Editors and Affiliations

Hiroshima University, Hiroshima, Japan
Sigeru Omatu
King Abdulaziz University, Jeddah, Saudi Arabia
Rashid Mehmood
Kielce University of Technology, Kielce, Poland
Pawel Sitek
Palazzo Camponeschi, University of L'Aquila, L'Aquila, Italy
Serafino Cicerone
BISITE, Edificio I+D+i, University of Salamanca, Salamanca, Spain
Sara Rodríguez

About this paper

Cite this paper

Lobo, A., Oliveira, P., Sampaio, P., Novais, P. (2023). Cost-Sensitive Learning and Threshold-Moving Approach to Improve Industrial Lots Release Process on Imbalanced Datasets. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_28

DOI: https://doi.org/10.1007/978-3-031-20859-1_28
Published: 13 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20858-4
Online ISBN: 978-3-031-20859-1
eBook Packages: Intelligent Technologies and Robotics (R0)
