Effect of the training set on the word embeddings and similarity test set for Turkish | IEEE Conference Publication | IEEE Xplore

Abstract:

Word embedding, a technique used in the literature mostly for English, associates each word with a mathematical vector representation under which certain structural or semantic relations hold. There are some Turkish applications of this technique. Although it was designed with English in mind, it also performs satisfactorily for Turkish. In this study, the performance of Turkish word embeddings is analysed with respect to how well the training data suits the goal of the embedding. For this purpose, a new Turkish test set based on subject similarity is introduced and used to measure the performance of the word embeddings. This set will be made publicly available for academic purposes. A subject classifier for a labeled Turkish text corpus, which outperforms the state of the art, is also proposed.
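The abstract does not describe the format of the subject-similarity test set or the exact scoring procedure, so the following is only a minimal sketch of one common way such a set can be used, under stated assumptions. It assumes each test item is a hypothetical triple (query word, candidate words, expected candidate sharing the query's subject), scores words by cosine similarity between their vectors, and reports the fraction of items where the nearest candidate is the expected one. The words, vector values, and helper names (`cosine`, `most_similar`, `accuracy`) are all illustrative and are not taken from the paper.

```python
import math

# Toy 3-dimensional embeddings; the words and values are hypothetical,
# not vectors trained in the study.
EMBEDDINGS = {
    "futbol": [0.9, 0.1, 0.0],
    "basketbol": [0.8, 0.2, 0.1],
    "ekonomi": [0.1, 0.9, 0.2],
    "borsa": [0.0, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, candidates, emb):
    """Return the candidate whose vector is closest to `word` by cosine."""
    return max(candidates, key=lambda c: cosine(emb[word], emb[c]))


def accuracy(test_items, emb):
    """Fraction of (word, candidates, expected) items answered correctly."""
    correct = sum(
        most_similar(word, candidates, emb) == expected
        for word, candidates, expected in test_items
    )
    return correct / len(test_items)


# Hypothetical test items: the expected answer shares the query word's subject
# (sports for "futbol", economy for "ekonomi").
TEST_ITEMS = [
    ("futbol", ["basketbol", "borsa"], "basketbol"),
    ("ekonomi", ["futbol", "borsa"], "borsa"),
]

if __name__ == "__main__":
    print(accuracy(TEST_ITEMS, EMBEDDINGS))  # prints 1.0
```

With these toy vectors both items are answered correctly, so the accuracy is 1.0. Comparing embeddings trained on different corpora against the same test items is one way to observe the kind of training-set effect the study analyses.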
Date of Conference: 16-19 May 2016
Date Added to IEEE Xplore: 23 June 2016
Conference Location: Zonguldak, Turkey