Challenges of Machine Learning for Data Streams in the Banking Industry

  • Conference paper

Big Data Analytics (BDA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13147)

Abstract

Banking information systems continuously generate large quantities of data as interconnected streams (transactions, event logs, time series, metrics, graphs, processes, etc.). Such data streams need to be processed online to support critical business applications such as real-time fraud detection, network security attack prevention, or predictive maintenance of information system infrastructure. Many algorithms have been proposed for data stream learning; however, most of them do not address the important challenges and constraints imposed by real-world applications, in particular when models must be trained incrementally from heterogeneous data and deployed within complex big data architectures. Based on banking applications and lessons learned in production environments of BNP Paribas, a major international banking group and leader in the Eurozone, we identify the most important current challenges for mining IT data streams. Our goal is to highlight the key challenges faced by data scientists and data engineers within complex industry settings when building or deploying models for real-world streaming applications. We provide future research directions on Stream Learning that will accelerate the adoption of online learning models for solving real-world problems, thereby bridging the gap between the research and industry communities. Finally, we provide some recommendations to tackle some of these challenges.

Notes

  1. Noah Fiedel talk at TensorFlow Dev Summit 2017 - https://www.youtube.com/watch?v=q_IkJcPyNl0 at 2,24”.

  2. https://www.academia.edu/33091102/Anti_Money_Laundering_Model_in_Banking_System.

  3. BWT refers to transferring funds between banks to eliminate the source of dirty and criminal money.

  4. https://group.bnpparibas/uploads/file/bnpparibas_2019_integrated_report_en.pdf.

  5. Presented at the 6th International Workshop on Quantitative Approaches to Software Quality - 2018.

  6. https://www.ebf.eu/wp-content/uploads/2020/03/EBF-AI-paper-_final-.pdf.

  7. Due to space restrictions, we refer the reader to the official documentation of Apache Kafka, Apache NiFi, Apache Flink, and Jenkins for more details.

Author information

Corresponding author

Correspondence to Albert Bifet.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Barry, M., Bifet, A., Chiky, R., Montiel, J., Tran, VT. (2021). Challenges of Machine Learning for Data Streams in the Banking Industry. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_9

  • DOI: https://doi.org/10.1007/978-3-030-93620-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93619-8

  • Online ISBN: 978-3-030-93620-4

  • eBook Packages: Computer Science, Computer Science (R0)
