Automatic topic classification of test cases using text mining at an Android smartphone vendor

Authors:
Junji Shimagaki, Kyushu University, Fukuoka, Japan
Yasutaka Kamei, Kyushu University, Fukuoka, Japan
Naoyasu Ubayashi, Kyushu University, Fukuoka, Japan
Abram Hindle, University of Alberta, Edmonton, Canada

ESEM '18: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, October 2018, Article No. 32, Pages 1–10. https://doi.org/10.1145/3239235.3268927

Published: 11 October 2018

ABSTRACT

Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of software is large, and the number of test cases needed to cover the functionality of an Android system is substantial. Considerable effort has already been invested in answering the question "which features and apps were tested and verified?". This insight is provided by dashboards that summarize test coverage and results per feature. One way to build such dashboards is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called "feature labels" (FLs), which categorize hundreds to thousands of test cases into 10 to 50 groups.

Aim: Unfortunately for developers, manually assigning FLs to thousands of test cases is a time-consuming task that leads to inaccurately labeled test cases, which in turn renders the dashboard useless. We created an automated system that suggests tags/labels to developers for their test cases, in place of fully manual labeling.

Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company.
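
As a rough illustration of what predicting a feature label from test-case text can look like, the following sketch trains a small multinomial naive Bayes classifier on tokens taken from test paths and names. The abstract does not state which models or features the authors used, so the classifier choice, the `tokenize` helper, the `NaiveBayesLabeler` class, and all test paths and labels below are assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a test path or name into lowercase word tokens."""
    # Break camelCase first, then split on anything that is not a letter or digit.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


class NaiveBayesLabeler:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical test-case paths and feature labels, invented for this sketch.
train_texts = [
    "tests/camera/CameraZoomTest.testPinchZoom",
    "tests/camera/CameraFlashTest.testAutoFlash",
    "tests/telephony/CallTest.testOutgoingCall",
    "tests/telephony/SmsTest.testSendSms",
]
train_labels = ["Camera", "Camera", "Telephony", "Telephony"]

labeler = NaiveBayesLabeler().fit(train_texts, train_labels)
suggestion = labeler.predict("tests/camera/CameraHdrTest.testHdrCapture")
print(suggestion)  # prints: Camera
```

In a setting like the one described, such a prediction would be offered to the developer as a suggested label rather than applied silently.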

Results: In quantitative experiments, our models achieved acceptable F1 scores of 0.3 to 0.88. In qualitative studies with expert teams, we also showed that the hierarchy and path of a test were good predictors of its feature label.
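
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it separately for each label. The abstract does not say how the reported scores were computed or averaged, so the per-label breakdown, the function name `f1_per_label`, and the sample labels are assumptions for illustration.

```python
def f1_per_label(true_labels, predicted_labels):
    """Return a dict mapping each label to its F1 score."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


# Hypothetical ground-truth and predicted feature labels.
true_labels = ["Camera", "Camera", "Telephony", "Telephony", "Audio"]
predicted_labels = ["Camera", "Telephony", "Telephony", "Telephony", "Camera"]

scores = f1_per_label(true_labels, predicted_labels)
for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

A label that is never predicted correctly, such as `Audio` here, gets a score of 0, which shows how rare or hard-to-separate labels can pull the lower end of a reported range down.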

Conclusions: We find that this method can reduce the tedious manual effort that software developers spend classifying test cases, while providing more accurate classification results.

Index Terms

Automatic topic classification of test cases using text mining at an Android smartphone vendor
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation

Published in
ESEM '18: Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
October 2018
487 pages
ISBN: 9781450358231
DOI: 10.1145/3239235
General Chair: Markku Oivo, University of Oulu, Finland
Program Chairs: Daniel Méndez, Technical University of Munich, Germany; Audris Mockus, University of Tennessee
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Industry Paper
Author Tags
classification
dashboard
software testing
Qualifiers
- research-article
Acceptance Rates
- Overall Acceptance Rate: 130 of 594 submissions, 22%

Article Metrics
- Total Citations: 7
- Total Downloads: 194
- Downloads (Last 12 months): 8
- Downloads (Last 6 weeks): 0