Authors:
Rakia Saidi 1; Fethi Jarray 1,2 and Mohammed Alsuhaibani 3
Affiliations:
1 LIMTIC Laboratory, UTM University, Tunis, Tunisia
2 Higher Institute of Computer Science of Medenine, Gabes University, Medenine, Tunisia
3 Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
Keyword(s):
Semantic Textual Similarity, Siamese Networks, BERT, Soft Attention, Arabic BERT.
Abstract:
The assessment of semantic textual similarity (STS) is a challenging task in natural language processing. It is crucial for many applications, including question answering, plagiarism detection, machine translation, information retrieval, and word sense disambiguation. The STS task evaluates the similarity of pairs of texts. For high-resource languages (e.g. English), several approaches to STS have been proposed. In this paper, we are interested in measuring the semantic similarity of texts in Arabic, a low-resource language. A standard approach to STS embeds the input texts as vectors and applies a similarity metric in the embedding space. In this contribution, we propose a BERT-based Siamese network (SiameseBERT) and investigate the available Arabic BERT models for embedding the input sentences. We validate our approach on Arabic STS datasets. The AraBERT-based Siamese network model achieves a Pearson correlation of 0.925. The results obtained demonstrate the superiority of integrating BERT embeddings, the attention mechanism, and a Siamese neural network for the semantic textual similarity task.
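The Siamese scheme described in the abstract can be sketched as follows: one shared encoder embeds both sentences, the token vectors are pooled into a single sentence vector, and a similarity metric (here cosine) is applied in the embedding space. This is a minimal illustrative sketch, not the authors' implementation: the `ToyEncoder` below is a hypothetical stand-in for an Arabic BERT model such as AraBERT, and the pooling and metric choices are assumptions.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    # Collapse a [num_tokens, dim] matrix into one [dim] sentence vector.
    return token_embeddings.mean(axis=0)

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Similarity metric applied in the embedding space.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

class ToyEncoder:
    """Hypothetical stand-in for a BERT encoder: maps each token id to a
    fixed random vector. In the paper's setting this would be an Arabic
    BERT model producing contextual token embeddings."""
    def __init__(self, vocab_size: int = 100, dim: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.table = rng.standard_normal((vocab_size, dim))

    def encode(self, token_ids):
        return self.table[np.asarray(token_ids)]

def siamese_similarity(encoder, ids_a, ids_b) -> float:
    # Siamese property: the SAME encoder (shared weights) embeds both
    # inputs; similarity is then computed between the two embeddings.
    emb_a = mean_pool(encoder.encode(ids_a))
    emb_b = mean_pool(encoder.encode(ids_b))
    return cosine_similarity(emb_a, emb_b)

encoder = ToyEncoder()
same = siamese_similarity(encoder, [1, 2, 3], [1, 2, 3])
diff = siamese_similarity(encoder, [1, 2, 3], [7, 8, 9])
```

Identical inputs yield a cosine similarity of 1.0, while unrelated inputs land somewhere in [-1, 1]; in the paper's full model, the toy encoder would be replaced by a fine-tuned Arabic BERT with soft attention over the token embeddings.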