Abstract
Library or framework APIs are difficult to learn and use, which can lead to unexpected software behaviors or bugs. Hence, various API mining techniques have been introduced to mine API usage patterns concerning the co-occurrence of API calls or the preconditions of API calls. However, these techniques cannot mine patterns about an API call itself (e.g., whether the arguments of the API call are correctly set and whether the API is suitably chosen over other similar APIs). To bridge this gap, we propose Cpam, which uses historical code changes to identify change patterns for fixing API misuses, where each change pattern takes the form of a pair of APIs before and after a code change. Given a set of target APIs and a corpus of open-source projects, Cpam first selects the commits in the corpus that potentially fix API misuses, then extracts the changes to API misuses in each selected commit, and finally identifies change patterns of API misuses. We implement Cpam for Java and conduct a large-scale evaluation that targets Java SE APIs and uses a corpus of 1162 Java projects. Our experimental results demonstrate Cpam’s effectiveness and efficiency. By applying the identified change patterns to bug detection, we found 44 new bugs, 18 of which have been confirmed and fixed.
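The three-step pipeline described above (select fix commits, extract API changes, identify change patterns) can be illustrated with a small Java sketch. It is not Cpam's implementation: the abstract does not say how commits are selected or how patterns are ranked, so the keyword heuristic in `looksLikeFix`, the hypothetical `ApiChange` and `ChangePattern` records, the string encoding of APIs (e.g. `BigDecimal.<init>(double)` for a constructor), and the rule that a pattern's support is the number of distinct projects exhibiting it are all assumptions made for illustration. The two example pairs are well-known Java SE recommendations, not results reported in the paper.

```java
import java.util.*;
import java.util.stream.Collectors;

public class ChangePatternSketch {

    // Hypothetical record of one API change extracted from a commit:
    // the API called before the change and the API called after it.
    record ApiChange(String project, String commitMessage, String beforeApi, String afterApi) {}

    // Hypothetical change pattern: a (before, after) API pair plus its support.
    record ChangePattern(String beforeApi, String afterApi, long support) {}

    // Step 1 (assumed, crude heuristic): treat a commit as a potential misuse fix
    // if its message mentions a fix-related keyword.
    static boolean looksLikeFix(String message) {
        String m = message.toLowerCase(Locale.ROOT);
        return m.contains("fix") || m.contains("bug") || m.contains("misuse");
    }

    // Steps 2-3 (assumed): keep changes from fix commits that replace a target API,
    // group them by (before, after) pair, and count distinct projects as support.
    static List<ChangePattern> mine(List<ApiChange> changes, Set<String> targetApis, int minSupport) {
        Map<List<String>, Set<String>> projectsPerPair = new HashMap<>();
        for (ApiChange c : changes) {
            if (!looksLikeFix(c.commitMessage())) continue;
            if (!targetApis.contains(c.beforeApi())) continue;
            // This simplified sketch only models API replacements; argument-level
            // fixes, which the abstract also mentions, need a richer representation.
            if (c.beforeApi().equals(c.afterApi())) continue;
            projectsPerPair
                .computeIfAbsent(List.of(c.beforeApi(), c.afterApi()), k -> new HashSet<>())
                .add(c.project());
        }
        return projectsPerPair.entrySet().stream()
            .filter(e -> e.getValue().size() >= minSupport)
            .map(e -> new ChangePattern(e.getKey().get(0), e.getKey().get(1), e.getValue().size()))
            .sorted(Comparator.comparingLong(ChangePattern::support).reversed()
                    .thenComparing(ChangePattern::beforeApi))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ApiChange> changes = List.of(
            new ApiChange("p1", "Fix precision bug", "BigDecimal.<init>(double)", "BigDecimal.valueOf(double)"),
            new ApiChange("p2", "fix rounding", "BigDecimal.<init>(double)", "BigDecimal.valueOf(double)"),
            new ApiChange("p3", "Fix locale issue", "String.toLowerCase()", "String.toLowerCase(Locale)"),
            new ApiChange("p4", "Refactor", "String.toLowerCase()", "String.toLowerCase(Locale)"));
        Set<String> targets = Set.of("BigDecimal.<init>(double)", "String.toLowerCase()");
        for (ChangePattern p : mine(changes, targets, 2)) {
            System.out.println(p.beforeApi() + " -> " + p.afterApi() + " (support " + p.support() + ")");
        }
    }
}
```

Running `main` prints `BigDecimal.<init>(double) -> BigDecimal.valueOf(double) (support 2)`. The `toLowerCase` pair is dropped because only one of its two commits passes the keyword filter, which leaves it below the support threshold of 2. Counting distinct projects rather than raw commits is one way to keep a single project's repeated edits from inflating a pattern's support.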
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61802067).
About this article
Cite this article
Liu, W., Chen, B., Peng, X. et al. Identifying change patterns of API misuses from code changes. Sci. China Inf. Sci. 64, 132101 (2021). https://doi.org/10.1007/s11432-019-2745-5
DOI: https://doi.org/10.1007/s11432-019-2745-5