Abstract
Many software libraries, especially commercial ones, provide API documentation in natural language to describe correct API usage. However, developers may still write code that is inconsistent with the API documentation, partly because, as existing research shows, many developers are reluctant to read API documentation carefully. Because these inconsistencies may indicate defects, researchers have proposed various detection approaches, but these approaches require many known specifications. Since writing specifications manually for all APIs is tedious, various approaches have been proposed to mine specifications automatically. Most existing mining approaches rely on analyzing client code, so they fail to mine specifications when sufficient client code is not available. Instead of analyzing client code, we propose an approach, called Doc2Spec, that infers resource specifications from API documentation written in natural language. We evaluated our approach on the Javadocs of five libraries. The results show that our approach performs well on real-scale libraries and infers various specifications with relatively high precision, recall, and F-score. We further used the inferred specifications to detect defects in open source projects. The results show that specifications inferred by Doc2Spec are useful for detecting real defects in existing projects.
References
Acharya, M., Xie, T.: Mining API error-handling specifications from source code. In: Proc. Fundamental Approaches to Software Engineering, pp. 370–384 (2009)
Acharya, M., Xie, T., Pei, J., Xu, J.: Mining API patterns as partial orders from source code: From usage scenarios to specifications. In: Proc. 6th ESEC/FSE, pp. 25–34 (2007)
Alur, R., Černý, P., Madhusudan, P., Nam, W.: Synthesis of interface specifications for Java classes. In: Proc. 32nd POPL, pp. 98–109 (2005)
Ambriola, V., Gervasi, V.: Processing natural language requirements. In: Proc. 12th ASE, pp. 36–45. IEEE Computer Society, Los Alamitos (1997)
Ammons, G., Bodík, R., Larus, J.: Mining specifications. In: Proc. 29th POPL, pp. 4–16 (2002)
Anvik, J., Hiew, L., Murphy, G.: Who should fix this bug? In: Proc. 28th ICSE, pp. 361–370 (2006)
Arnout, K., Meyer, B.: Uncovering hidden contracts: The .NET example. Computer 36(11), 48–55 (2003)
Baum, L., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41(1), 164–171 (1970)
Buse, R., Weimer, W.: Automatic documentation inference for exceptions. In: Proc. ISSTA, pp. 273–282 (2008)
Buse, R., Weimer, W.: Automatically documenting program changes. In: Proc. 25th ASE, pp. 33–42 (2010)
Chinchor, N.: MUC-7 named entity task definition. In: Proc. 7th MUC (1997)
Cohen, W., Sarawagi, S.: Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods. In: Proc. 10th KDD, pp. 89–98 (2004)
Dag, J., Regnell, B., Gervasi, V., Brinkkemper, S.: A linguistic-engineering approach to large-scale requirements management. IEEE Softw. 3, 3 (2005)
Dagenais, B., Hendren, L.J.: Enabling static analysis for partial Java programs. In: Proc. 23rd OOPSLA, pp. 313–328 (2008)
Dekel, U., Herbsleb, J.D.: Reading the documentation of invoked API functions in program comprehension. In: Proc. 17th ICPC, pp. 168–177 (2009a)
Dekel, U., Herbsleb, J.D.: Improving API documentation usability with knowledge pushing. In: Proc. 31st ICSE, pp. 320–330 (2009b)
Engler, D., Chen, D., Chou, A.: Bugs as inconsistent behavior: A general approach to inferring errors in systems code. In: Proc. 18th SOSP, pp. 57–72 (2001)
Fantechi, A., Gnesi, S., Lami, G., Maccari, A.: Applications of linguistic techniques for use case analysis. Requir. Eng. 8(3), 161–170 (2003)
Fellbaum, C., et al.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Fry, Z., Shepherd, D., Hill, E., Pollock, L., Vijay-Shanker, K.: Analysing source code: looking for useful verb-direct object pairs in all the right places. IET Softw. 2(1), 27–36 (2008)
Gabel, M., Su, Z.: Symbolic mining of temporal specifications. In: Proc. 30th ICSE, pp. 51–60 (2008)
Gabel, M., Su, Z.: Online inference and enforcement of temporal properties. In: Proc. 32nd ICSE, pp. 15–24 (2010)
Gegick, M., Rotella, P., Xie, T.: Identifying security bug reports via text mining: An industrial case study. In: Proc. 7th MSR, pp. 11–20 (2010)
Gervasi, V., Zowghi, D.: Reasoning about inconsistencies in natural language requirements. ACM Trans. Softw. Eng. Methodol. 14(3), 277–330 (2005)
Goldin, L., Berry, D.: AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. Autom. Softw. Eng. 4(4), 375–412 (1997)
Gowri, M., Grothoff, C., Chandra, S.: Deriving object typestates in the presence of inter-object references. In: Proc. 20th OOPSLA, pp. 77–96 (2005)
Hayes, J., Dekhtyar, A., Sundaram, S.: Advancing candidate link generation for requirements tracing: The study of methods. IEEE Trans. Softw. Eng. 32(1), 4–19 (2006)
Henkel, J., Diwan, A.: A tool for writing and debugging algebraic specifications. In: Proc. 26th ICSE, pp. 449–458 (2004)
Hirschman, L.: MUC-7 coreference task definition. In: Proc. 7th MUC (1997)
Horie, M., Chiba, S.: Tool support for crosscutting concerns of API documentation. In: Proc. 9th AOSD, pp. 97–108 (2010)
Høst, E.W., Østvold, B.M.: Debugging method names. In: Proc. 23rd ECOOP, pp. 294–317 (2009)
Igarashi, A., Kobayashi, N.: Resource usage analysis. ACM Trans. Program. Lang. Syst. 27(2), 264–313 (2005)
Jeong, G., Kim, S., Zimmermann, T.: Improving bug triage with bug tossing graphs. In: Proc. 7th ESEC/FSE, pp. 111–120. ACM, New York (2009)
Kof, L.: Scenarios: Identifying missing objects and actions by means of computational linguistics. In: Proc. 15th RE, pp. 121–130 (2007)
Kremenek, T., Twohey, P., Back, G., Ng, A., Engler, D.: From uncertainty to belief: Inferring the specification within. In: Proc. 7th OSDI, pp. 259–272 (2006)
Lee, C., Chen, F., Rosu, G.: Mining parametric specifications. In: Proc. 33rd ICSE, pp. 591–600 (2011)
Li, Z., Zhou, Y.: PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code. In: Proc. ESEC/FSE, pp. 306–315 (2005)
Livshits, V., Zimmermann, T.: Dynamine: Finding common error patterns by mining software revision histories. In: Proc. ESEC/FSE, pp. 31–40 (2005)
Lo, D., Khoo, S.: SMArTIC: towards building an accurate, robust and scalable specification miner. In: Proc. 14th FSE, pp. 265–275 (2006)
Lo, D., Maoz, S.: Scenario-based and value-based specification mining: better together. In: Proc. 25th ASE, pp. 387–396 (2010)
Lu, S., Park, S., Seo, E., Zhou, Y.: Learning from mistakes – a comprehensive study on real world concurrency bug characteristics. In: Proc. 13th ASPLOS, pp. 329–339 (2008)
Meziane, F., Athanasakis, N., Ananiadou, S.: Generating natural language specifications from UML class diagrams. Requir. Eng. 13(1), 1–18 (2008)
Mikheev, A., Moens, M., Grover, C.: Named entity recognition without gazetteers. In: Proc. 9th EACL, pp. 1–8 (1999)
Novick, D., Ward, K.: Why don’t people read the manual. In: Proc. 24th SIGDOC, pp. 11–18 (2006)
Olson, D.: Advanced Data Mining Techniques. Springer, Berlin (2008)
Padioleau, Y., Tan, L., Zhou, Y.: Listening to programmers—Taxonomies and characteristics of comments in operating system code. In: Proc. 31st ICSE, pp. 331–341 (2009)
Perry, E., Sanko, M., Wright, B., Pfaeffle, T.: Oracle 9i JDBC developer’s guide and reference. Technical report, March 2002. http://www.oracle.com
Raman, A., Patrick, J.: The sk-strings method for inferring PFSA. In: Proc. Machine Learning Workshop Automata Induction, Grammatical Inference, and Language Acquisition (1997)
Ramanathan, M., Grama, A., Jagannathan, S.: Path-sensitive inference of function precedence protocols. In: Proc. 29th ICSE, pp. 240–250 (2007)
Rivest, R., Schapire, R.: Inference of finite automata using homing sequences. In: Machine Learning: From Theory to Applications, pp. 51–73 (1993)
Robillard, M.P., DeLine, R.: A field study of API learning obstacles. Empir. Softw. Eng. (2011). doi:10.1007/s10664-010-9150-8
Runeson, P., Alexandersson, M., Nyholm, O.: Detection of duplicate defect reports using natural language processing. In: Proc. 29th ICSE, pp. 499–510 (2007)
Sawyer, P., Rayson, P., Garside, R.: REVERE: Support for requirements synthesis from documents. Inf. Syst. Front. 4(3), 343–353 (2002)
Shepherd, D., Fry, Z., Hill, E., Pollock, L., Vijay-Shanker, K.: Using natural language program analysis to locate and understand action-oriented concerns. In: Proc. 6th AOSD, pp. 212–224 (2007)
Shi, L., Zhong, H., Xie, T., Li, M.: An empirical study on evolution of API documentation. In: Proc. FASE, pp. 416–431 (2011)
Sridhara, G., Hill, E., Muppaneni, D., Pollock, L.L., Vijay-Shanker, K.: Towards automatically generating summary comments for Java methods. In: Proc. 25th ASE, pp. 43–52 (2010)
Stylos, J., Faulring, A., Yang, Z., Myers, B.: Improving API documentation using API usage information. In: Proc. IVL/HCC, pp. 119–126 (2009)
Tan, L., Yuan, D., Krishna, G., Zhou, Y.: /* iComment: Bugs or Bad Comments?*/. In: Proc. 21st SOSP, pp. 145–158 (2007)
Thummalapenta, S., Xie, T.: SpotWeb: Detecting framework hotspots and coldspots via mining open source code on the web. In: Proc. 23rd ASE, pp. 327–336 (2008)
Thummalapenta, S., Xie, T.: Mining exception-handling rules as sequence association rules. In: Proc. 31st ICSE, pp. 496–506 (2009a)
Thummalapenta, S., Xie, T.: Alattin: Mining alternative patterns for detecting neglected conditions. In: Proc. 24th ASE, pp. 283–294 (2009b)
Viterbi, A.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)
Wang, X., Zhang, L., Xie, T., Anvik, J., Sun, J.: An approach to detecting duplicate bug reports using natural language and execution information. In: Proc. 30th ICSE, pp. 461–470 (2008)
Wasylkowski, A., Zeller, A., Lindig, C.: Detecting object usage anomalies. In: Proc. ESEC/FSE, pp. 35–44 (2007)
Weimer, W., Necula, G.: Mining temporal specifications for error detection. In: Proc. TACAS, pp. 461–476 (2005)
Whaley, J., Martin, M., Lam, M.: Automatic extraction of object-oriented component interfaces. In: Proc. ISSTA, pp. 218–228 (2002)
Williams, C., Hollingsworth, J.: Automatic mining of source code repositories to improve bug finding techniques. IEEE Trans. Softw. Eng. 31(6), 466–480 (2005)
Würsch, M., Ghezzi, G., Reif, G., Gall, H.: Supporting developers with natural language queries. In: Proc. 32nd ICSE, pp. 165–174 (2010)
Xu, G., Rountev, A.: Precise memory leak detection for Java software using container profiling. In: Proc. 30th ICSE, pp. 151–160 (2008)
Yang, J., Evans, D., Bhardwaj, D., Bhat, T., Das, M.: Perracotta: mining temporal API rules from imperfect traces. In: Proc. 28th ICSE, pp. 282–291 (2006)
Zhong, H., Zhang, L., Mei, H.: Early filtering of polluting method calls for mining temporal specifications. In: Proc. 15th APSEC, pp. 9–16 (2008a)
Zhong, H., Zhang, L., Mei, H.: Inferring specifications of object oriented APIs from API source code. In: Proc. 15th APSEC, pp. 221–228 (2008b)
Zhong, H., Xie, T., Zhang, L., Pei, J., Mei, H.: MAPO: Mining and recommending API usage patterns. In: Proc. 23rd ECOOP, pp. 318–343 (2009a)
Zhong, H., Zhang, L., Xie, T., Mei, H.: Inferring resource specifications from natural language API documentation. In: Proc. 24th ASE, pp. 307–318 (2009b)
Zhou, G., Su, J.: Named entity recognition using an HMM-based chunk tagger. In: Proc. 40th ACL, pp. 473–480 (2001)
Additional information
This paper is a revised, expanded version of a paper (Zhong et al. 2009b) presented at the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), which won the best paper award of the conference and the ACM SIGSOFT distinguished paper award. The work in this paper was done while Hao Zhong was a PhD student at Peking University under the supervision of Prof. Hong Mei; the revisions to the ASE 2009 paper (Zhong et al. 2009b) were made after Hao Zhong became an assistant professor at the Chinese Academy of Sciences in 2009.
About this article
Cite this article
Zhong, H., Zhang, L., Xie, T. et al. Inferring specifications for resources from natural language API documentation. Autom Softw Eng 18, 227–261 (2011). https://doi.org/10.1007/s10515-011-0082-3
DOI: https://doi.org/10.1007/s10515-011-0082-3