Understanding the Reproducibility Issues of Monkey for GUI Testing

  • Conference paper
  • In: Dependable Software Engineering. Theories, Tools, and Applications (SETTA 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14464)

Abstract

Automated GUI testing is an essential activity in developing Android apps. Monkey is a widely used automated input generation (AIG) tool that efficiently and effectively detects crash bugs in Android apps; however, it struggles to reproduce the crash bugs it detects. To understand the symptoms and root causes of these failures, we conducted a comprehensive study of Monkey's reproducibility issues on Android apps. We focused on Monkey's ability to reproduce crash bugs via its built-in replay functionality and investigated why its replays fail. Specifically, we selected six popular open-source apps and automatically instrumented them to monitor the invocations of their event handlers. We then ran GUI testing with Monkey on the instrumented apps for 6,000 test cases and collected 56 unique crash bugs. We replayed each bug 200 times using Monkey's replay function and calculated the success rate: only 36.6% of the replays reproduced the corresponding crash bugs, exposing Monkey's limitations in consistently reproducing the bugs it detects. By manually analyzing screen recordings, event-handler logs, and the apps' source code for the unsuccessful replays, we pinpointed five root causes of Monkey's reproducibility issues: Injection Failure, Event Ambiguity, Data Loading, Widget Loading, and Dynamic Content. Based on these findings, we offer insights for developing future AIG tools.
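Monkey does not record concrete events; its built-in replay re-runs the pseudo-random event stream derived from a fixed seed (the `-s` flag). The sketch below illustrates a detect-then-replay measurement loop of the kind the study describes; it is a minimal sketch, not the authors' artifact. The package name, event count, throttle value, and seed are hypothetical, and crash detection here relies on the `// CRASH` marker that Monkey prints in verbose output when the app under test throws an uncaught exception.

```python
import subprocess

PACKAGE = "org.example.app"  # hypothetical package name of the app under test
EVENTS = 500                 # hypothetical number of events per Monkey run
REPLAYS = 200                # replay attempts per crash, as in the study


def run_monkey(seed: int) -> bool:
    """Run one Monkey session with a fixed seed; return True if a crash occurred.

    Given the same seed, Monkey regenerates the same pseudo-random event
    stream, which is the basis of its built-in replay. In verbose mode it
    reports uncaught exceptions with a '// CRASH' block on stdout.
    """
    result = subprocess.run(
        ["adb", "shell", "monkey",
         "-p", PACKAGE,        # confine generated events to the app under test
         "-s", str(seed),      # PRNG seed: fixes the generated event sequence
         "--throttle", "300",  # ms pause between events (hypothetical value)
         "-v",                 # verbose output, needed to see the crash block
         str(EVENTS)],         # total number of events to inject
        capture_output=True,
        text=True,
    )
    return "// CRASH" in result.stdout


def reproduction_rate(seed: int, replays: int = REPLAYS) -> float:
    """Replay the same seed `replays` times; return the fraction that crash.

    Nondeterminism outside the event stream (injection failures, loading
    delays, dynamic content) can make a replay miss the original crash.
    """
    successes = sum(run_monkey(seed) for _ in range(replays))
    return successes / replays


if __name__ == "__main__":
    crash_seed = 42  # hypothetical seed that previously triggered a crash
    print(f"Reproduction rate: {reproduction_rate(crash_seed):.1%}")
```

In the study's terms, a reproduction rate well below 1.0 for a crashing seed is precisely the reproducibility gap attributed to the five root causes above: the event stream is deterministic, but its effect on the app is not.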


Acknowledgements

We thank the SETTA reviewers for their valuable feedback, Yiheng Xiong and Shan Huang from East China Normal University for their insightful comments, and Cong Li from Nanjing University for explaining the mechanism of Rx. This work was supported in part by the National Key Research and Development Program (Grant 2022YFB3104002), NSFC Grant 62072178, the "Digital Silk Road" Shanghai International Joint Lab of Trustworthy Intelligent Software (Grant 22510750100), and the Shanghai Collaborative Innovation Center of Trusted Industry Internet Software.

Author information

Correspondence to Jue Wang or Haiying Sun.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, H., Kong, Q., Wang, J., Su, T., Sun, H. (2024). Understanding the Reproducibility Issues of Monkey for GUI Testing. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_8

  • DOI: https://doi.org/10.1007/978-981-99-8664-4_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8663-7

  • Online ISBN: 978-981-99-8664-4