Abstract
Automated GUI testing is an essential activity in developing Android apps. Monkey, a widely used automated input generation (AIG) tool, detects crash bugs in Android apps efficiently and effectively, yet it struggles to reproduce the crash bugs it detects. To understand the symptoms and root causes of these reproducibility issues, we conducted a comprehensive study of Monkey's capability to reproduce crash bugs with its built-in replay functionality. Specifically, we selected six popular open-source apps and automatically instrumented them to monitor the invocations of their event handlers. We then ran Monkey on the instrumented apps for 6,000 test cases and collected 56 unique crash bugs. For each bug, we replayed the triggering run 200 times with Monkey's replay function and measured the success rate. Only 36.6% of the replays reproduced the corresponding crash, exposing Monkey's limitations in consistently reproducing the bugs it detects. By manually analyzing the screen recordings, event-handler logs, and app source code of the failed replays, we pinpointed five root causes of these reproducibility issues: Injection Failure, Event Ambiguity, Data Loading, Widget Loading, and Dynamic Content. Based on these findings, we offer insights for developing future AIG tools.
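Monkey's built-in replay is seed-based: re-running Monkey with the same seed re-injects the same pseudo-random event sequence. As a minimal sketch of the measurement loop described above (not the authors' actual harness; the package name, seed, event count, and throttle are hypothetical placeholders), the replay success rate of one crash bug can be estimated by driving Monkey through adb:

```python
import subprocess

PACKAGE = "org.example.app"  # hypothetical package name of the app under test
SEED = 42                    # seed of the Monkey run that originally crashed
EVENTS = 500                 # pseudo-random events injected per run (illustrative)
REPLAYS = 200                # replay attempts per crash bug, as in the study

def run_monkey(seed: int) -> str:
    """Run Monkey with a fixed seed; the same seed yields the same
    pseudo-random event sequence, which is Monkey's replay mechanism."""
    proc = subprocess.run(
        ["adb", "shell", "monkey", "-p", PACKAGE,
         "-s", str(seed), "--throttle", "200", "-v", str(EVENTS)],
        capture_output=True, text=True)
    return proc.stdout + proc.stderr

def crashed(output: str) -> bool:
    """Monkey prints a '// CRASH' banner when the app under test crashes."""
    return "// CRASH" in output

if __name__ == "__main__":
    successes = sum(crashed(run_monkey(SEED)) for _ in range(REPLAYS))
    print(f"reproduction rate: {successes / REPLAYS:.1%}")
```

A faithful harness would additionally reset the app state between runs (e.g., with `adb shell pm clear`) and match the specific crash signature rather than any crash, since a replay can trigger a different bug than the one under study.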
Acknowledgements
We thank the SETTA reviewers for their valuable feedback, Yiheng Xiong and Shan Huang from East China Normal University for their insightful comments, and Cong Li from Nanjing University for explaining the mechanism of Rx. This work was supported in part by the National Key Research and Development Program (Grant 2022YFB3104002), NSFC Grant 62072178, the "Digital Silk Road" Shanghai International Joint Lab of Trustworthy Intelligent Software (Grant 22510750100), and the Shanghai Collaborative Innovation Center of Trusted Industry Internet Software.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, H., Kong, Q., Wang, J., Su, T., Sun, H. (2024). Understanding the Reproducibility Issues of Monkey for GUI Testing. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_8
DOI: https://doi.org/10.1007/978-981-99-8664-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8663-7
Online ISBN: 978-981-99-8664-4
eBook Packages: Computer Science, Computer Science (R0)