Abstract:
Word vectors are real-valued vector representations that allow machine learning algorithms to extract semantic information about words when trained on natural language corpora. This paper explores word representation techniques together with evaluation criteria for measuring the quality of representations produced by deep learning models such as BERT. The performance of these word vectors can be evaluated using measures that fall broadly into two classes: intrinsic and extrinsic evaluation. Intrinsic evaluators directly assess syntactic or semantic relationships between words, independent of any language processing task, and therefore focus on subtasks; extrinsic evaluators instead use a complete natural language processing task, such as chunking or sentiment analysis, as the measure of performance. The experiments have been performed using the bag-of-words (BOW) model, Word2Vec, and the BERT language model. In this research work, a word-similarity task is used for intrinsic evaluation and a part-of-speech (POS) tagging task for extrinsic evaluation. The experiments were implemented in Python using the scikit-learn machine learning toolkit and the Keras deep learning framework, together with BERT, which has recently emerged as a prominent tool for natural language processing. The results obtained in this work show that word embedding representation techniques are more efficient and accurate than existing traditional models, and accuracy could be further improved on larger datasets.
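A minimal sketch of the intrinsic word-similarity evaluation described above, assuming gensim's Word2Vec implementation and a small hypothetical word-pair set with invented human scores; the paper does not specify its benchmark, and real evaluations typically use datasets such as WordSim-353.

```python
# Intrinsic evaluation sketch: correlate model similarities with
# (hypothetical) human similarity judgments using Spearman's rho.
from gensim.models import Word2Vec
from scipy.stats import spearmanr

# Toy corpus; in practice this would be a large natural language corpus.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "chased", "the", "mouse"],
]
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, seed=1)

# Hypothetical human similarity judgments for word pairs (scale 0-10),
# standing in for a benchmark dataset.
pairs = [("king", "queen", 8.5), ("cat", "dog", 7.0), ("king", "cat", 1.5)]

human_scores = [h for _, _, h in pairs]
model_scores = [model.wv.similarity(w1, w2) for w1, w2, _ in pairs]

# Spearman rank correlation between model and human scores is the
# standard quality measure for this intrinsic task.
rho, _ = spearmanr(human_scores, model_scores)
print(f"Spearman correlation: {rho:.3f}")
```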
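For the extrinsic side, a sketch of a Keras POS-tagging model of the kind the abstract implies, with one tag prediction per token; `vocab_size`, `max_len`, and `n_tags` are placeholder values, not figures from the paper, and the architecture is one plausible choice rather than the authors' exact setup.

```python
# Extrinsic evaluation sketch: a sequence-labeling model for POS tagging.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Embedding, Bidirectional,
                                     LSTM, TimeDistributed, Dense)

vocab_size, max_len, embed_dim, n_tags = 5000, 40, 100, 17  # placeholders

model = Sequential([
    Input(shape=(max_len,)),
    # Embedding weights could instead be initialised from Word2Vec- or
    # BERT-derived vectors to compare representation techniques.
    Embedding(input_dim=vocab_size, output_dim=embed_dim),
    Bidirectional(LSTM(64, return_sequences=True)),
    # Emit one tag distribution per token position.
    TimeDistributed(Dense(n_tags, activation="softmax")),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training would use padded token-id sequences X and tag-id sequences y:
# model.fit(X, y, batch_size=32, epochs=5)
```

Tagging accuracy on held-out data then serves as the extrinsic measure of embedding quality.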
Date of Conference: 14-16 October 2020
Date Added to IEEE Xplore: 09 December 2020