Efficient storage and fast querying of source code

Panchenko, Oleksandr; Plattner, Hasso; Zeier, Alexander B.

doi:10.1007/s10796-010-9285-6

Published: 16 November 2010

Volume 13, pages 349–357 (2011)

Information Systems Frontiers

Oleksandr Panchenko¹,
Hasso Plattner¹ &
Alexander B. Zeier¹

Abstract

Enabling fast and detailed insights into large portions of source code is an important task in a global development ecosystem. Numerous data structures have been developed to store source code and to support structural queries that help with navigation, evaluation, and analysis. Many of these data structures work with tree-based or graph-based representations of source code. The goal of this project is to develop a data store that enables efficient storage and fast querying of structural information. The naive adjacency list method has been enhanced with recent data compression approaches for column-oriented databases to allow lossless yet compact storage of fine-grained structural data. Graph indexing enables the proposed data model to answer fine-grained structural queries quickly. This paper describes the basics of the proposed approach and illustrates its technical feasibility.
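
The abstract does not show the data model itself, so the following is only a minimal sketch of the general idea it names: an adjacency list of a syntax tree stored column by column, with one column dictionary-encoded. The class `ColumnarTree`, the helper `dictionary_encode`, the column names, and the sample node types are all hypothetical and are not taken from the paper; the binary search over the sorted parent column is a simple stand-in for the graph indexing the abstract mentions, not the authors' index.

```python
from bisect import bisect_left, bisect_right


def dictionary_encode(values):
    """Map each distinct value to a small integer code (dictionary encoding)."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, code_of, [code_of[value] for value in values]


class ColumnarTree:
    """Hypothetical column-wise adjacency list for a syntax tree."""

    def __init__(self, rows):
        # rows: (node_id, parent_id, node_type); the root uses parent_id -1.
        # Sorting by parent keeps the children of one node contiguous.
        rows = sorted(rows, key=lambda row: (row[1], row[0]))
        self.node = [row[0] for row in rows]
        self.parent = [row[1] for row in rows]
        self.type_dict, self.type_code, self.type_ids = dictionary_encode(
            [row[2] for row in rows]
        )

    def children(self, parent_id, node_type=None):
        """Return the child node IDs of parent_id, optionally filtered by type."""
        lo = bisect_left(self.parent, parent_id)
        hi = bisect_right(self.parent, parent_id)
        if node_type is None:
            return self.node[lo:hi]
        wanted = self.type_code.get(node_type)
        if wanted is None:
            return []  # this type never occurs in the tree
        # Compare integer codes instead of strings while scanning the range.
        return [self.node[i] for i in range(lo, hi) if self.type_ids[i] == wanted]


# Illustrative data only: one class with two methods and three statements.
tree = ColumnarTree([
    (0, -1, "class"),
    (1, 0, "method"),
    (2, 0, "method"),
    (3, 1, "call"),
    (4, 2, "call"),
    (5, 2, "assignment"),
])
print(tree.children(0))
print(tree.children(2, "call"))
```

In this sketch the repeated node-type strings are stored once in `type_dict`, and the per-row column holds only integer codes, which is the kind of lossless compaction that column stores rely on. A structural query such as "all call nodes directly under node 2" then touches only one contiguous slice of the columns.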

Acknowledgements

This project was carried out in cooperation with SAP AG. In particular, we would like to thank Jan Karstens, Heinz Ulrich Roggenkemper, Wolfgang Stephan, Cafer Tosun, and Xiwei Zhou.

Author information

Authors and Affiliations

Hasso Plattner Institute for Software Systems Engineering, P.O. Box 900460, 14440, Potsdam, Germany
Oleksandr Panchenko, Hasso Plattner & Alexander B. Zeier

Corresponding author

Correspondence to Oleksandr Panchenko.

About this article

Cite this article

Panchenko, O., Plattner, H. & Zeier, A.B. Efficient storage and fast querying of source code. Inf Syst Front 13, 349–357 (2011). https://doi.org/10.1007/s10796-010-9285-6

Published: 16 November 2010
Issue Date: July 2011
DOI: https://doi.org/10.1007/s10796-010-9285-6
