
Systematic Evaluation of Different Approaches on Embedding Search

  • Conference paper
Advances in Information and Communication (FICC 2024)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 920)


Abstract

This paper presents a comparative analysis of embedding-search methods on insurance documents. The evaluation focuses on several SentenceTransformers models integrated within LangChain. We further assess the performance of text-embedding-ada-002, Vicuna-13B, and a fine-tuned variant of Vicuna-13B within the same pipeline. To broaden the evaluation, we also investigate a custom HuggingFace pipeline that compares the generated embeddings at the token level. Our findings show that the text-embedding-ada-002 model yields the most favorable results, while among open-source alternatives the SentenceTransformers model all-MiniLM-L12-v2 outperforms the other models. To our knowledge, there is currently no published research addressing embedding-based retrieval on German insurance documents, underscoring the unique relevance of this study in this niche domain.
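The retrieval step shared by the compared approaches can be sketched as: embed the query and the document chunks, then rank chunks by cosine similarity. Below is a minimal illustrative sketch using NumPy with toy vectors; in the paper's pipelines the vectors would instead come from an embedding model such as all-MiniLM-L12-v2 or text-embedding-ada-002, and the function names here are our own, not from the paper.

```python
import numpy as np

def cosine_sim(query_vec, chunk_vecs):
    # Cosine similarity between one query vector and each row of a matrix.
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return m @ q

def search(query_vec, chunk_vecs, top_k=3):
    # Rank document chunks by similarity to the query, highest first.
    scores = cosine_sim(query_vec, chunk_vecs)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Toy 4-dimensional embeddings standing in for model output.
chunks = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1],
    [0.8, 0.2, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
idx, scores = search(query, chunks, top_k=2)
```

The same ranking logic applies regardless of which model produces the vectors, which is what makes the paper's model-for-model comparison within a fixed pipeline possible.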



Author information

Corresponding author

Correspondence to Sigurd Schacht.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Aperdannier, R., Koeppel, M., Unger, T., Schacht, S., Barkur, S.K. (2024). Systematic Evaluation of Different Approaches on Embedding Search. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 920. Springer, Cham. https://doi.org/10.1007/978-3-031-53963-3_36

