Utility of Neural Embeddings in Semantic Similarity of Text Data

Hendre, Manik; Mukherjee, Prasenjit; Godse, Manish

doi:10.1007/978-981-15-5788-0_21

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1176)

Abstract

Semantic similarity plays an important role in understanding the context of text data. In this paper, we compute the semantic similarity between large texts using different neural embeddings and review the utility of these deep neural embeddings for text representation. Most earlier papers have studied the semantic similarity of text using individual word embeddings. We evaluate neural embedding techniques on large text data using the Essay Dataset. We use recent neural embedding methods such as Google Sentence Encoder, ELMo, and GloVe, along with traditional similarity metrics including TF-IDF and the Jaccard index, for experimental investigation. The experimental evaluation shows that Google Sentence Encoder and ELMo embeddings perform best on the semantic similarity task.
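For readers unfamiliar with the traditional baselines named in the abstract, the following is a minimal sketch of the Jaccard index and of TF-IDF cosine similarity in plain Python. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the toy sentences are all assumptions made for illustration, since the abstract does not specify preprocessing or weighting.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| over the sets of distinct tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document.

    Uses raw term counts and a smoothed IDF, log((1 + N) / (1 + df)) + 1.
    This is one common variant and an assumption, not the paper's stated choice.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy texts standing in for essays (hypothetical data, not the Essay Dataset).
essays = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
]
vectors = tfidf_vectors(essays)

print("Jaccard(0, 1):", jaccard_index(essays[0], essays[1]))
print("TF-IDF cosine(0, 1):", cosine_similarity(vectors[0], vectors[1]))
print("TF-IDF cosine(0, 2):", cosine_similarity(vectors[0], vectors[2]))
```

Both baselines depend only on surface token overlap, so two texts that share no words score zero even when they express the same idea. Embedding-based methods are typically compared with the same cosine measure, applied to dense vectors produced by the embedding model instead of sparse TF-IDF weights.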

Author information

Authors and Affiliations

Department of Analytics and IT, Pune Institute of Business Management, Pune, Maharashtra, India
Manik Hendre, Prasenjit Mukherjee & Manish Godse

Corresponding author

Correspondence to Manik Hendre.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan
Sheng-Lung Peng
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Informatics, University of Leicester, Leicester, UK
Yu-Dong Zhang

About this paper

Cite this paper

Hendre, M., Mukherjee, P., Godse, M. (2021). Utility of Neural Embeddings in Semantic Similarity of Text Data. In: Bhateja, V., Peng, SL., Satapathy, S.C., Zhang, YD. (eds) Evolution in Computational Intelligence. Advances in Intelligent Systems and Computing, vol 1176. Springer, Singapore. https://doi.org/10.1007/978-981-15-5788-0_21

DOI: https://doi.org/10.1007/978-981-15-5788-0_21
Published: 09 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5787-3
Online ISBN: 978-981-15-5788-0
eBook Packages: Intelligent Technologies and Robotics (R0)
