Cost-Sensitive Learning and Threshold-Moving Approach to Improve Industrial Lots Release Process on Imbalanced Datasets

  • Conference paper
Distributed Computing and Artificial Intelligence, 19th International Conference (DCAI 2022)

Abstract

With Industry 4.0, companies must manage massive and generally imbalanced datasets. In an automotive company, the lots release decision process must cope with this problem by combining data from different sources to determine whether a selected group of products can be released to customers. This work focuses on this process and aims to classify the occurrence of customer complaints by designing, tuning and evaluating five ML algorithms, namely XGBoost (XGB), LightGBM (LGBM), CatBoost (CatB), Random Forest (RF) and Decision Tree (DT), on an imbalanced dataset of automatic production tests. We used a non-sampling approach to deal with class imbalance, analyzing two methods: cost-sensitive learning and threshold-moving. Both methods improved the boosting algorithms, whereas RF improved only with threshold-moving. Across both approaches, the best overall results were achieved by threshold-moving, where RF obtained the best outcome with an F1-score of 76.2%.
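The two non-sampling methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy labels and probabilities, and the choice of F1 as the quantity to maximize are assumptions made for illustration. Threshold-moving is shown as a sweep over candidate decision thresholds on held-out predictions, and cost-sensitive learning is reduced to the common negative-to-positive ratio heuristic for weighting the minority class (the kind of value often passed to a boosting library's positive-class weight parameter).

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                    threshold: float) -> float:
    """F1-score of the positive class when predicting 1 for prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true: Sequence[int],
                   y_prob: Sequence[float]) -> Tuple[float, float]:
    """Threshold-moving: try each distinct predicted probability as the
    cut-off and keep the one with the highest F1 (default 0.5 as baseline)."""
    best_t, best_f1 = 0.5, f1_at_threshold(y_true, y_prob, 0.5)
    for t in sorted(set(y_prob)):
        score = f1_at_threshold(y_true, y_prob, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def positive_class_weight(y_true: Sequence[int]) -> float:
    """Cost-sensitive heuristic (an assumption, not the paper's cost matrix):
    weight positives by the ratio of negatives to positives."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("no positive samples")
    return (len(y_true) - n_pos) / n_pos


# Hypothetical validation data: 8 lots without complaints, 2 with complaints.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.45, 0.35, 0.60]

baseline_f1 = f1_at_threshold(y_true, y_prob, 0.5)
threshold, tuned_f1 = best_threshold(y_true, y_prob)
weight = positive_class_weight(y_true)
print(threshold, tuned_f1, baseline_f1, weight)
```

On this toy data, lowering the threshold from the default 0.5 recovers the second positive case at the price of one false positive, which raises F1. In practice the threshold would be selected on validation data and then applied unchanged to the test set, so that the reported score is not optimistic.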

Acknowledgments

This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.

Author information

Corresponding author

Correspondence to Armindo Lobo.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lobo, A., Oliveira, P., Sampaio, P., Novais, P. (2023). Cost-Sensitive Learning and Threshold-Moving Approach to Improve Industrial Lots Release Process on Imbalanced Datasets. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_28
