AutoQuery: automatic construction of dependency queries for code search

Wang, Shaowei; Lo, David; Jiang, Lingxiao

doi:10.1007/s10515-014-0170-2

Published: 25 September 2014

Automated Software Engineering, Volume 23, pages 393–425 (2016)

Abstract

Many code search techniques have been proposed to return relevant code for a user query expressed as a textual description. However, source code is not mere text: it contains dependency relations among its program elements. To leverage these dependencies for more accurate search results, techniques have been proposed that allow user queries to be expressed as control and data dependency relationships among program elements. Although such techniques have been shown to be effective for finding relevant code, it remains an open question whether average users can construct appropriate queries. In this work, we address this concern by proposing a technique, AutoQuery, that automatically constructs dependency queries from a set of code snippets. AutoQuery works in three major steps: first, code snippets (which are not necessarily compilable) are converted into program dependence graphs (PDGs); second, a new graph mining solution returns the common structures in the PDGs; third, the common structures are converted to dependency queries, which are used to retrieve results with a dependence-based code search technique. We have evaluated AutoQuery on real systems with 47 different code search tasks. The results show that the automatically constructed dependency queries retrieve relevant code with a precision, recall, and F-measure of 68.4%, 72.1%, and 70.2%, respectively. We have also performed a user study to compare the effectiveness of AutoQuery with that of human-generated queries. The results show that queries constructed by AutoQuery on average help to retrieve code fragments with F-measures comparable to those retrieved by human-constructed queries.
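
As a rough illustration of the second and third steps, the sketch below treats each PDG as a set of typed dependence edges and keeps only the edges shared by every snippet. This is a deliberate simplification and not the graph mining solution described in the article: real subgraph mining matches structure up to isomorphism, whereas this toy only intersects labels. The edge labels, the `common_edges` and `to_query` helpers, and the query syntax are all hypothetical. The `f_measure` helper assumes the standard harmonic-mean definition; the reported precision and recall are consistent with the reported F-measure under that definition, although the article may compute the average per task.

```python
from typing import Iterable, List, Set, Tuple

# One dependence edge: (source node, dependence kind, target node).
# The labels used below are illustrative only, not CodeSurfer node types.
Edge = Tuple[str, str, str]


def common_edges(pdgs: Iterable[Set[Edge]]) -> Set[Edge]:
    """Return the typed edges that occur in every toy PDG.

    Stand-in for graph mining: it intersects edge labels and does not
    perform subgraph isomorphism or frequency-based pattern growth.
    """
    graphs = list(pdgs)
    if not graphs:
        return set()
    shared = set(graphs[0])
    for graph in graphs[1:]:
        shared &= graph
    return shared


def to_query(edges: Set[Edge]) -> List[str]:
    """Render shared edges as dependency constraints (hypothetical syntax)."""
    return [f"{src} -[{kind}]-> {dst}" for src, kind, dst in sorted(edges)]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two toy snippets; the second lacks the logging call.
snippet_a = {
    ("call:lock", "control", "call:update"),
    ("call:update", "data", "call:unlock"),
    ("call:lock", "control", "call:log"),
}
snippet_b = {
    ("call:lock", "control", "call:update"),
    ("call:update", "data", "call:unlock"),
}

for constraint in to_query(common_edges([snippet_a, snippet_b])):
    print(constraint)

print(round(f_measure(68.4, 72.1), 1))
```

The edge that appears in only one snippet is dropped, so the resulting query captures just the lock, update, unlock pattern common to both examples.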

Notes

We use the node types defined by CodeSurfer. There are 33 different node types for C/C++, e.g., function call, expression, etc.
https://bitbucket.org/eliben/pycparser.
First argument of the function tag->check().
We simply name a type by using its corresponding variable name in uppercase letters.
http://www.liacs.nl/~snijssen/gaston/.

Author information

Authors and Affiliations

School of Information Systems, Singapore Management University, 80 Stamford Road, Singapore, Singapore
Shaowei Wang, David Lo & Lingxiao Jiang

Corresponding author

Correspondence to David Lo.

About this article

Cite this article

Wang, S., Lo, D. & Jiang, L. AutoQuery: automatic construction of dependency queries for code search. Autom Softw Eng 23, 393–425 (2016). https://doi.org/10.1007/s10515-014-0170-2

Received: 24 December 2013
Accepted: 10 September 2014
Published: 25 September 2014
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10515-014-0170-2
