Efficient storage and fast querying of source code

Information Systems Frontiers

Abstract

Enabling fast and detailed insight into large bodies of source code is an important task in a global development ecosystem. Numerous data structures have been developed to store source code and to support structural queries that help with navigation, evaluation, and analysis. Many of these data structures work with tree-based or graph-based representations of source code. The goal of this project is to develop a data store that enables efficient storage and fast querying of structural information. The naive adjacency list method is enhanced with recent data compression approaches for column-oriented databases to allow lossless yet compact storage of fine-grained structural data. Graph indexing enables the proposed data model to answer fine-grained structural queries quickly. This paper describes the basics of the proposed approach and illustrates its technical feasibility.
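The abstract does not spell out the data layout, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It stores a toy syntax tree as an adjacency list split into columns, applies two generic column-store compression techniques (dictionary encoding and run-length encoding), and adds a standard pre/post-order interval index as a stand-in for the graph index. The sample tree, the column names, and every function name are hypothetical.

```python
from itertools import groupby

# Hypothetical toy syntax tree. Row position is the node id; each row holds
# (parent_id, node_type). A parent_id of -1 marks the root.
ROWS = [
    (-1, "class"),   # node 0
    (0, "method"),   # node 1
    (1, "call"),     # node 2
    (1, "call"),     # node 3
    (0, "method"),   # node 4
    (4, "call"),     # node 5
]


def dictionary_encode(values):
    """Replace each value by a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(runs):
    """Inverse of run_length_encode; the round trip loses no information."""
    return [value for value, length in runs for _ in range(length)]


def build_columns(rows):
    """Split the adjacency list into one compressed column per attribute."""
    type_dict, type_codes = dictionary_encode([t for _, t in rows])
    return {
        "parent_rle": run_length_encode([p for p, _ in rows]),
        "type_dict": type_dict,
        "type_codes": type_codes,
    }


def children_of_type(columns, parent_id, node_type):
    """Structural query: ids of nodes of node_type directly under parent_id."""
    wanted = columns["type_dict"].index(node_type)
    parents = run_length_decode(columns["parent_rle"])
    return [
        node
        for node, (parent, code) in enumerate(zip(parents, columns["type_codes"]))
        if parent == parent_id and code == wanted
    ]


def pre_post_index(parents):
    """Label every node with (pre, post) numbers from one depth-first pass."""
    children, root = {}, None
    for node, parent in enumerate(parents):
        if parent < 0:
            root = node
        else:
            children.setdefault(parent, []).append(node)
    pre, post, counter = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            post[node] = counter
        else:
            pre[node] = counter
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
        counter += 1
    return pre, post


def is_descendant(pre, post, node, ancestor):
    """True if node lies strictly inside the interval of ancestor."""
    return pre[ancestor] < pre[node] and post[node] < post[ancestor]


columns = build_columns(ROWS)
pre, post = pre_post_index(run_length_decode(columns["parent_rle"]))
```

With these labels, a transitive query such as "is node 5 somewhere below node 4?" needs only two integer comparisons rather than a walk over the adjacency list, which is the general reason an index helps with fine-grained structural queries.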

Notes

  1. http://www.sdn.sap.com/irj/sdn/abap

  2. http://www.eclipse.org/jdt/

  3. http://www.eclipse.org

Acknowledgements

This project was carried out in cooperation with SAP AG. In particular, we would like to thank Jan Karstens, Heinz Ulrich Roggenkemper, Wolfgang Stephan, Cafer Tosun, and Xiwei Zhou.

Author information

Corresponding author

Correspondence to Oleksandr Panchenko.

About this article

Cite this article

Panchenko, O., Plattner, H. & Zeier, A.B. Efficient storage and fast querying of source code. Inf Syst Front 13, 349–357 (2011). https://doi.org/10.1007/s10796-010-9285-6

  • DOI: https://doi.org/10.1007/s10796-010-9285-6
