Weighted word2vec based on the distance of words | IEEE Conference Publication | IEEE Xplore



Abstract:

Word2vec is a novel technique for the study and application of natural language processing (NLP). It trains a neural network word embedding model on a large training corpus. After the model is trained, each word is represented by a vector in a vector space of the specified dimension. The resulting vectors possess many interesting and useful characteristics that implicitly encode properties of the original words. The idea behind word2vec is that words appearing near each other are related. These relations are exploited by considering various context windows when training the network model. However, word2vec does not consider the influence of the distance between words; it only considers whether or not two words appear in the same context window. We argue that word distances within the context carry semantic information that can be exploited to train the network model better. To formalize the influence of different distances in the context, a fuzzy concept is adopted. Various experiments show that the proposed improvement can yield better language models than word2vec.
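To make the idea of distance-aware context concrete, the sketch below shows one way a distance weight could enter skip-gram training. It is not the authors' method: the abstract does not specify the fuzzy membership function or where the weight is applied. The sketch assumes a hypothetical triangular membership function (weight 1.0 for adjacent words, decreasing linearly toward the window edge) and applies the weight by scaling the gradient of each (center, context) pair in skip-gram with negative sampling. The names `triangular_membership`, `weighted_pairs`, `sgns_update`, and `train` are illustrative, and negatives are drawn uniformly rather than from a unigram distribution for brevity.

```python
import math
import random


def triangular_membership(distance, window):
    """Hypothetical fuzzy membership: 1.0 at distance 1, linearly down to
    1/window at the window edge, and 0.0 outside the window."""
    if distance < 1 or distance > window:
        return 0.0
    return (window - distance + 1) / window


def weighted_pairs(tokens, window):
    """Yield (center, context, weight) skip-gram pairs, where the weight
    depends on how far the context word is from the center word."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            yield center, tokens[j], triangular_membership(abs(i - j), window)


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


def sgns_update(w_in, w_out, center, context, negatives, weight, lr):
    """One skip-gram negative-sampling step with the gradient scaled by `weight`."""
    v = w_in[center]
    grad_v = [0.0] * len(v)
    targets = [(context, 1.0)] + [(n, 0.0) for n in negatives]
    for target, label in targets:
        u = w_out[target]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        g = weight * lr * (label - score)  # the distance weight scales this pair's step
        for k in range(len(v)):
            grad_v[k] += g * u[k]  # accumulate input-vector gradient using the old u
            u[k] += g * v[k]       # update output vector; v is not changed yet
    for k in range(len(v)):
        v[k] += grad_v[k]


def train(tokens, dim=8, window=3, epochs=5, lr=0.05, n_neg=2, seed=0):
    """Train toy distance-weighted embeddings on a single token list."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    w_in = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    w_out = {w: [0.0] * dim for w in vocab}
    for _ in range(epochs):
        for center, context, weight in weighted_pairs(tokens, window):
            # Uniform negatives, skipping the true context word.
            negatives = [w for w in (rng.choice(vocab) for _ in range(n_neg)) if w != context]
            sgns_update(w_in, w_out, center, context, negatives, weight, lr)
    return w_in, w_out
```

Scaling the gradient is only one design option; another would be to keep each pair with probability equal to its weight, which gives the same expected update with fewer gradient steps. Plain word2vec corresponds to a weight of 1.0 for every word inside the window.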
Date of Conference: 09-12 July 2017
Date Added to IEEE Xplore: 16 November 2017
Electronic ISSN: 2160-1348
Conference Location: Ningbo, China
