ABSTRACT
Machine learning (ML) based solutions are becoming increasingly popular and pervasive. When testing such solutions, there is a tendency to focus on improving ML metrics such as F1-score and accuracy at the expense of ensuring business value and correctness by covering business requirements. In this work, we adapt test planning methods of classical software to ML solutions. We use a combinatorial modeling methodology to define the space of business requirements and map it to the ML solution data, and we use the notion of data slices to identify the weaker areas of the ML solution and strengthen them. We apply our approach to three real-world case studies and demonstrate its value.
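To make the idea of data slices concrete, the following is a minimal sketch, not the authors' implementation. It assumes a data slice is the subset of records that share a given combination of t feature values, and it flags slices whose accuracy falls below the overall accuracy by a chosen margin. The function names, the feature names (`term`, `grade`), the toy records, and the `margin` and `min_support` thresholds are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def slice_accuracy(records, features, t=2, min_support=2):
    """Return accuracy per slice, where a slice is a t-way combination
    of (feature, value) pairs. Slices with fewer than min_support
    records are dropped."""
    stats = defaultdict(lambda: [0, 0])  # slice key -> [correct, total]
    for rec in records:
        for combo in combinations(features, t):
            key = tuple((f, rec[f]) for f in combo)
            stats[key][0] += int(rec["label"] == rec["pred"])
            stats[key][1] += 1
    return {k: c / n for k, (c, n) in stats.items() if n >= min_support}


def weak_slices(records, features, t=2, min_support=2, margin=0.1):
    """Return overall accuracy and the slices that underperform it
    by more than margin, worst first."""
    overall = sum(r["label"] == r["pred"] for r in records) / len(records)
    per_slice = slice_accuracy(records, features, t, min_support)
    weak = {k: acc for k, acc in per_slice.items() if acc < overall - margin}
    return overall, dict(sorted(weak.items(), key=lambda kv: kv[1]))


# Hypothetical toy data: each record holds feature values, the true
# label, and the model's prediction.
records = [
    {"term": "36m", "grade": "A", "label": 1, "pred": 1},
    {"term": "36m", "grade": "A", "label": 0, "pred": 0},
    {"term": "36m", "grade": "B", "label": 1, "pred": 1},
    {"term": "36m", "grade": "B", "label": 0, "pred": 0},
    {"term": "60m", "grade": "A", "label": 1, "pred": 1},
    {"term": "60m", "grade": "A", "label": 0, "pred": 0},
    {"term": "60m", "grade": "B", "label": 1, "pred": 0},
    {"term": "60m", "grade": "B", "label": 0, "pred": 1},
]

overall, weak = weak_slices(records, ["term", "grade"], t=2)
print(f"overall accuracy: {overall:.2f}")
for key, acc in weak.items():
    print(f"weak slice {dict(key)}: accuracy {acc:.2f}")
```

In this toy data the aggregate accuracy looks acceptable, yet one two-way slice is entirely mispredicted, which is the kind of weak area that an aggregate metric alone would hide.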
Bridging the Gap between ML Solutions and Their Business Requirements... ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia
Index Terms
- Bridging the gap between ML solutions and their business requirements using feature interactions