
An End-to-End Framework for Productive Use of Machine Learning in Software Analytics and Business Intelligence Solutions

  • Conference paper
Product-Focused Software Process Improvement (PROFES 2020)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12562)

Abstract

Machine learning (ML) has become an integral component of a wide range of areas, including software analytics (SA) and business intelligence (BI). As a result, interest in custom ML-based SA and BI solutions is rising. In practice, however, such solutions often remain stuck in the prototype stage, because setting up an infrastructure for their deployment and maintenance is considered complex and time-consuming. To structure this process and make it more transparent, we derive from existing literature an end-to-end framework for building and deploying ML-based SA and BI solutions. The framework is organized into three iterative cycles that represent different stages of a model's lifecycle: prototyping, deployment, and update. It thereby explicitly supports the transitions between these stages while covering all important activities, from data collection to the retraining of deployed ML models. To validate the framework's applicability in practice, we compare it to, and apply it in, a real-world ML-based SA/BI solution.
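To make the stage transitions concrete, the following is a minimal Python sketch, not the authors' implementation, that models the three cycles named above as a small state machine. Only the stage names come from the abstract; the transition rules, the `ModelLifecycle` class, and the accuracy-drop retraining trigger in `needs_update` are hypothetical assumptions chosen for illustration.

```python
from enum import Enum


class Stage(Enum):
    """The three iterative cycles named in the abstract."""
    PROTOTYPING = "prototyping"
    DEPLOYMENT = "deployment"
    UPDATE = "update"


# Hypothetical transition rules (an assumption, not taken from the paper):
# a prototype can be iterated on or deployed; a deployed model can be updated;
# an update (e.g. retraining) either redeploys the model or falls back to prototyping.
ALLOWED_TRANSITIONS = {
    Stage.PROTOTYPING: {Stage.PROTOTYPING, Stage.DEPLOYMENT},
    Stage.DEPLOYMENT: {Stage.UPDATE},
    Stage.UPDATE: {Stage.DEPLOYMENT, Stage.PROTOTYPING},
}


class ModelLifecycle:
    """Tracks the current stage of a model and records every transition."""

    def __init__(self) -> None:
        self.stage = Stage.PROTOTYPING
        self.history = [self.stage]

    def transition(self, target: Stage) -> None:
        # Reject any move that the (hypothetical) rules above do not allow.
        if target not in ALLOWED_TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.value} to {target.value}")
        self.stage = target
        self.history.append(target)


def needs_update(live_accuracy: float, baseline_accuracy: float,
                 tolerance: float = 0.05) -> bool:
    """Hypothetical retraining trigger: True once live accuracy falls more
    than `tolerance` below the accuracy measured at deployment time."""
    return baseline_accuracy - live_accuracy > tolerance


if __name__ == "__main__":
    lifecycle = ModelLifecycle()
    lifecycle.transition(Stage.DEPLOYMENT)
    if needs_update(live_accuracy=0.78, baseline_accuracy=0.86):
        lifecycle.transition(Stage.UPDATE)      # retrain the model
        lifecycle.transition(Stage.DEPLOYMENT)  # redeploy the retrained model
    print([s.value for s in lifecycle.history])
    # ['prototyping', 'deployment', 'update', 'deployment']
```

Encoding the allowed transitions explicitly means an invalid jump (in this sketch, from deployment straight back to prototyping) fails loudly instead of silently. In a real system, each transition would typically trigger concrete jobs such as data collection, training, or monitoring, which this sketch leaves out.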




Author information

Corresponding author

Correspondence to Iris Figalist.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Figalist, I., Elsner, C., Bosch, J., Olsson, H.H. (2020). An End-to-End Framework for Productive Use of Machine Learning in Software Analytics and Business Intelligence Solutions. In: Morisio, M., Torchiano, M., Jedlitschka, A. (eds) Product-Focused Software Process Improvement. PROFES 2020. Lecture Notes in Computer Science, vol. 12562. Springer, Cham. https://doi.org/10.1007/978-3-030-64148-1_14


  • DOI: https://doi.org/10.1007/978-3-030-64148-1_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64147-4

  • Online ISBN: 978-3-030-64148-1

  • eBook Packages: Computer Science, Computer Science (R0)
