ABSTRACT
Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of software is large, and the number of test cases needed to cover the functionality of an Android system is substantial. Considerable effort has already gone into quantifying which features and apps were tested and verified. This insight is provided by dashboards that summarize test coverage and results per feature. One way to build such dashboards is to manually tag or label each test case with the topic or function it covers, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called "feature labels" (FLs), which categorize hundreds to thousands of test cases into 10 to 50 groups.
Aim: Manually assigning FLs to thousands of test cases is time-consuming for developers and leads to inaccurately labelled test cases, which render the dashboard useless. We created an automated system that suggests tags/labels for developers' test cases so that they do not have to label them manually.
Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company.
Results: In quantitative experiments, our models achieved acceptable F1 scores ranging from 0.3 to 0.88. In qualitative studies with expert teams, we also showed that the hierarchy and path of a test were good predictors of its feature label.
Conclusions: We find that this method can reduce the tedious manual effort that software developers spend classifying test cases, while providing more accurate classification results.
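The abstract does not say which models or features were used, so the following is only a minimal sketch of the general idea: suggesting a feature label from the tokens in a test's path, then scoring the suggestions with F1. It assumes a small multinomial naive Bayes classifier written with the Python standard library, which is a stand-in and not the authors' method. The test paths, the label names, and the helpers `path_tokens`, `NaiveBayesLabeler`, and `f1_score` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def path_tokens(test_path):
    """Split a test path into lowercase word tokens (directories and camelCase parts)."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", test_path)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


class NaiveBayesLabeler:
    """Multinomial naive Bayes with add-one smoothing over path tokens (illustrative only)."""

    def fit(self, paths, labels):
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for path, label in zip(paths, labels):
            self.token_counts[label].update(path_tokens(path))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, path):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for tok in path_tokens(path):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1_score(true_labels, predicted_labels, target):
    """F1 for one label: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical training data: (test path, manually assigned feature label).
TRAIN = [
    ("tests/camera/CameraPreviewTest.java", "camera"),
    ("tests/camera/hdr/HdrCaptureTest.java", "camera"),
    ("tests/telephony/CallSetupTest.java", "telephony"),
    ("tests/telephony/sms/SmsSendTest.java", "telephony"),
    ("tests/wifi/WifiScanTest.java", "wifi"),
    ("tests/wifi/hotspot/HotspotToggleTest.java", "wifi"),
]

model = NaiveBayesLabeler().fit([p for p, _ in TRAIN], [l for _, l in TRAIN])
suggestion = model.predict("tests/camera/zoom/ZoomPreviewTest.java")
print(suggestion)
```

In a workflow like the one the abstract describes, such a suggestion would be shown to the developer for confirmation rather than applied automatically, and per-label F1 on held-out, manually labelled tests would indicate which labels the suggestions can be trusted for.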
Index Terms
- Automatic topic classification of test cases using text mining at an Android smartphone vendor