DOI: 10.1145/3611643.3616361 · ESEC/FSE Conference Proceedings
Research article

Automata-Based Trace Analysis for Aiding Diagnosing GUI Testing Tools for Android

Published: 30 November 2023

Abstract

Benchmarking software testing tools against known bugs is a classic approach to evaluating the tools’ bug-finding abilities. However, this approach by itself offers few clues about the bugs a tool misses, clues that would aid in diagnosing the tool; as a result, heavy and ad hoc manual analysis is needed. In this work, in the setting of GUI testing for Android apps, we introduce an automata-based trace analysis approach to tackle the key challenge of this manual analysis: how to analyze the lengthy event traces generated by a testing tool against a missed bug to find the clues. Our key idea is to model a bug as a finite automaton that captures its bug-triggering traces, and to match the event traces generated by the testing tool (which misses this bug) against this automaton to obtain the clues. Specifically, the clues are presented in the form of three designated automata-based coverage values. We apply our approach to enhance Themis, a representative benchmark suite for Android, to aid in diagnosing GUI testing tools. Our extensive evaluation of nine state-of-the-art GUI testing tools, together with the involvement of several tool developers, shows that our approach is feasible and useful. Our approach enables Themis+ (the enhanced benchmark suite) to provide clues on tool-missed bugs, and compared to the tool developers’ own manual analysis results, all of Themis+’s clues are identical or otherwise useful. Moreover, the clues have helped uncover several tool weaknesses that were previously unknown or unclear. Based on the clues, two actively developed industrial testing tools in our study quickly made several optimizations and demonstrated improved bug-finding abilities. All the tool developers gave positive feedback on the usefulness and usability of Themis+’s clues. Themis+ is available at https://github.com/DDroid-Android/home.
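The core matching idea of the abstract — model a bug as a finite automaton over GUI events, replay a tool's event traces against it, and report coverage values as clues — can be sketched as follows. The automaton, its event names, and the two coverage metrics below are illustrative assumptions for exposition, not the paper's actual design (the paper defines three designated coverage values whose exact formulations are not reproduced here); staying in the current state on an unmatched event is likewise a simplification.

```python
# Hypothetical sketch: a bug-triggering automaton over GUI events.
# Reaching the accepting state "q3" would mean the bug was triggered.
TRANSITIONS = {
    ("q0", "open_note"): "q1",
    ("q1", "edit_text"): "q2",
    ("q2", "rotate_screen"): "q3",  # the bug manifests on this final event
}
START = "q0"

def match_trace(trace):
    """Replay one event trace, recording visited states and fired transitions."""
    state = START
    visited, fired = {state}, set()
    for event in trace:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            continue  # irrelevant event: stay in the current state (simplification)
        fired.add((state, event, nxt))
        state = nxt
        visited.add(state)
    return visited, fired

def coverage(traces):
    """Aggregate state/transition coverage over all traces of one tool run."""
    all_states = {START} | set(TRANSITIONS.values())
    visited, fired = set(), set()
    for t in traces:
        v, f = match_trace(t)
        visited |= v
        fired |= f
    return len(visited) / len(all_states), len(fired) / len(TRANSITIONS)

state_cov, trans_cov = coverage([
    ["open_note", "edit_text", "scroll"],  # reaches q2 but never rotates
    ["open_note", "back", "open_note"],    # stuck at q1
])
print(f"state coverage: {state_cov:.2f}, transition coverage: {trans_cov:.2f}")
# → state coverage: 0.75, transition coverage: 0.67
```

Coverage values like these localize what the tool is missing: here, the traces reach the state just before the accepting one but never fire the final `rotate_screen` transition, hinting that the tool reaches the buggy screen but lacks that event in its action repertoire.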

Supplementary Material

Video (fse23main-p1454-p-video.mp4)


Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Android GUI Testing
  2. Runtime Verification
  3. Trace Analysis


Conference

ESEC/FSE '23

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
