Abstract
Automated GUI testing is an essential activity in developing Android apps. Monkey, a widely used automated input generation (AIG) tool, detects crash bugs in Android apps efficiently and effectively, yet it struggles to reproduce the crash bugs it detects. To understand the symptoms and root causes of these reproducibility issues, we conducted a comprehensive study of Monkey's capability to reproduce crash bugs with its built-in replay functionality. Specifically, we selected six popular open-source apps and automatically instrumented them to monitor the invocations of their event handlers. We then ran Monkey on the instrumented apps for 6,000 test cases and collected 56 unique crash bugs. For each bug, we replayed the triggering run 200 times with Monkey's replay function and measured the success rate. Only 36.6% of the replays reproduced the corresponding crash, exposing Monkey's limitations in consistently reproducing the bugs it detects. By manually analyzing the screen recordings, event-handler logs, and app source code of the failed replays, we pinpointed five root causes of these reproducibility issues: Injection Failure, Event Ambiguity, Data Loading, Widget Loading, and Dynamic Content. Based on these findings, we offer insights for developing future AIG tools.
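Monkey's built-in replay is seed-based: re-running Monkey with the same seed re-injects the same pseudo-random event sequence. As a minimal sketch of the measurement loop described above (not the authors' actual harness; the package name, seed, event count, and throttle are hypothetical placeholders), the replay success rate of one crash bug can be estimated by driving Monkey through adb:

```python
import subprocess

PACKAGE = "org.example.app"  # hypothetical package name of the app under test
SEED = 42                    # seed of the Monkey run that originally crashed
EVENTS = 500                 # pseudo-random events injected per run (illustrative)
REPLAYS = 200                # replay attempts per crash bug, as in the study

def run_monkey(seed: int) -> str:
    """Run Monkey with a fixed seed; the same seed yields the same
    pseudo-random event sequence, which is Monkey's replay mechanism."""
    proc = subprocess.run(
        ["adb", "shell", "monkey", "-p", PACKAGE,
         "-s", str(seed), "--throttle", "200", "-v", str(EVENTS)],
        capture_output=True, text=True)
    return proc.stdout + proc.stderr

def crashed(output: str) -> bool:
    """Monkey prints a '// CRASH' banner when the app under test crashes."""
    return "// CRASH" in output

if __name__ == "__main__":
    successes = sum(crashed(run_monkey(SEED)) for _ in range(REPLAYS))
    print(f"reproduction rate: {successes / REPLAYS:.1%}")
```

A faithful harness would additionally reset the app state between runs (e.g., with `adb shell pm clear`) and match the specific crash signature rather than any crash, since a replay can trigger a different bug than the one under study.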
Acknowledgements
We thank the SETTA reviewers for their valuable feedback, Yiheng Xiong and Shan Huang from East China Normal University for their insightful comments, and Cong Li from Nanjing University for explaining the mechanism of Rx. This work was supported in part by the National Key Research and Development Program (Grant 2022YFB3104002), NSFC Grant 62072178, the "Digital Silk Road" Shanghai International Joint Lab of Trustworthy Intelligent Software (Grant 22510750100), and the Shanghai Collaborative Innovation Center of Trusted Industry Internet Software.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, H., Kong, Q., Wang, J., Su, T., Sun, H. (2024). Understanding the Reproducibility Issues of Monkey for GUI Testing. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_8
DOI: https://doi.org/10.1007/978-981-99-8664-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8663-7
Online ISBN: 978-981-99-8664-4
eBook Packages: Computer Science, Computer Science (R0)