Global-locality preserving projection for word embedding

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Pre-trained word embeddings have a significant impact on constructing representations for sentences, paragraphs and documents. However, existing word embedding methods typically learn representations in Euclidean space, where distributed word embeddings suffer from inaccurate semantic similarity and high computational cost. In this study, we propose global-locality preserving projection to refine word representations by re-embedding word vectors from the original embedding space into a manifold semantic space. Our method extracts the local features of word vectors while also preserving their global features. It can discover the local geometric structure, which also reflects the latent semantic structure, and obtain a compact word embedding subspace. The method is assessed on several lexical-level intrinsic tasks of semantic similarity and semantic relatedness, and the experimental results demonstrate its advantages over other word embedding-based methods.
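
The re-embedding step described above follows the general recipe of graph-based manifold projection: build a neighborhood graph over the pre-trained word vectors, then solve a generalized eigenproblem so that words that are neighbors in the original space stay close in the projected subspace. The sketch below illustrates that general idea with a plain locality preserving projection (LPP) step; it is not the paper's actual global-locality objective, and the function name, binary kNN weighting and parameter values are illustrative assumptions.

# A minimal sketch of the re-embedding idea, using a standard locality
# preserving projection (LPP) over pre-trained word vectors. This is an
# illustrative assumption, not the paper's global-locality objective.
import numpy as np
from scipy.linalg import eigh

def lpp_reembed(X, n_neighbors=10, n_components=100):
    """Re-embed word vectors X (n_words x dim) into a locality-preserving subspace."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances (fine for a vocabulary subset;
    # a full vocabulary would need an approximate nearest-neighbor index).
    sq = np.sum(X ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dists, np.inf)
    # Binary k-nearest-neighbor affinity graph, symmetrized.
    W = np.zeros((n, n))
    knn = np.argsort(dists, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, knn.ravel()] = 1.0
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # graph Laplacian
    # Locality-preserving directions: smallest eigenvectors of the
    # generalized problem (X^T L X) a = lambda (X^T D X) a.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
    _, vecs = eigh(A, B)
    P = vecs[:, :n_components]   # dim x n_components projection matrix
    return X @ P                 # re-embedded word vectors

# Usage (illustrative): refine, e.g., GloVe vectors held in a numpy array.
# refined = lpp_reembed(glove_vectors, n_neighbors=10, n_components=100)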

Notes

  1. http://liir.cs.kuleuven.be/software.php.

Acknowledgements

This work is supported by the National Key Research and Development Program of China (no. 2018YFC0830603).

Author information

Corresponding author

Correspondence to Yuanyuan Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, B., Sun, Y., Chu, Y. et al. Global-locality preserving projection for word embedding. Int. J. Mach. Learn. & Cyber. 13, 2943–2956 (2022). https://doi.org/10.1007/s13042-022-01574-y

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13042-022-01574-y
