Selma: A Semantic Local Code Search Platform

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14612)

Abstract

Searching for the right code snippet is a cumbersome, non-trivial task. Online platforms such as Github.com or searchcode.com provide code search tools, but they are limited to publicly available, internet-hosted code. During the development of research prototypes or confidential tools, however, it is often preferable to store source code locally, which makes external code search tools impractical. Here, we present Selma (Code and Videos: https://anreu.github.io/selma): a local code search platform that enables term-based and semantic retrieval of source code. Selma searches code and comments, annotates undocumented code to enable term-based search in natural language, and trains neural models for code retrieval.
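
As a rough illustration of the semantic-retrieval side, the sketch below embeds a small set of code snippets and a natural-language query with a bi-encoder and ranks the snippets by cosine similarity. The checkpoint, snippets, and query are illustrative assumptions, not Selma's actual models or data:

    from sentence_transformers import SentenceTransformer, util

    # Illustrative bi-encoder checkpoint; it stands in for the neural
    # retrieval models that Selma trains itself.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    # Toy stand-in for a locally indexed repository.
    snippets = [
        "def load_config(path):\n    import json\n    with open(path) as f:\n        return json.load(f)",
        "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)",
    ]
    query = "read a json configuration file"

    # Encode corpus and query into the same vector space, then rank
    # snippets by cosine similarity to the query.
    snippet_embeddings = model.encode(snippets, convert_to_tensor=True)
    query_embedding = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, snippet_embeddings)[0]
    for score, snippet in sorted(zip(scores.tolist(), snippets), reverse=True):
        print(f"{score:.3f}  {snippet.splitlines()[0]}")

The annotation of undocumented code can be read as a form of document expansion: a code-to-text model generates a natural-language description that is indexed alongside the code, so that term-based queries in natural language can match it. A minimal sketch, assuming a publicly available code-summarization checkpoint rather than Selma's own annotator:

    from transformers import pipeline

    # Illustrative code-to-text checkpoint; an assumption, not the
    # annotation model the platform ships.
    summarize = pipeline("summarization", model="Salesforce/codet5-base-multi-sum")

    code = "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)"

    # The generated description can be added to the term-based index
    # next to the otherwise undocumented snippet.
    annotation = summarize(code)[0]["summary_text"]
    print(annotation)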

Author information

Corresponding author

Correspondence to Anja Reusch.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Reusch, A., Lopes, G.C., Pertsch, W., Ueck, H., Gonsior, J., Lehner, W. (2024). Selma: A Semantic Local Code Search Platform. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14612. Springer, Cham. https://doi.org/10.1007/978-3-031-56069-9_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-56069-9_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56068-2

  • Online ISBN: 978-3-031-56069-9

  • eBook Packages: Computer Science, Computer Science (R0)
