ABSTRACT
The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach for mapping words in a text corpus to low-dimensional real-valued vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. There are several high-performance implementations of the Word2Vec SGNS method. In this paper, we introduce a new optimization called context combining to further boost SGNS performance on multicore systems. For processing the One Billion Word benchmark dataset on a 16-core platform, we show that our approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation. We also show that the accuracy of our method on benchmark queries is comparable to that of state-of-the-art implementations.
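To make the computation being optimized concrete, the following is a minimal NumPy sketch of one SGNS gradient step for a single (target, context) pair with k negative samples. The function name `sgns_update` and its signature are illustrative, not the paper's implementation: the context word's output vector is pushed toward the target's input vector, and the negative samples' output vectors are pushed away from it.

```python
import numpy as np

def sgns_update(W_in, W_out, target, context, negatives, lr=0.025):
    """One SGNS gradient step for a (target, context) word-index pair.

    W_in, W_out: (vocab_size, dim) input and output embedding matrices.
    negatives:   indices of k words drawn from the noise distribution.
    """
    v = W_in[target]                         # input vector of the target word (view)
    samples = [context] + list(negatives)    # 1 positive + k negative samples
    labels = np.array([1.0] + [0.0] * len(negatives))
    u = W_out[samples]                       # (k+1, dim) output vectors (copy)
    scores = 1.0 / (1.0 + np.exp(-(u @ v)))  # sigmoid of the dot products
    g = lr * (labels - scores)               # per-sample gradient scale
    dv = g @ u                               # accumulated update for the input vector
    W_out[samples] += np.outer(g, v)         # update output vectors while v is unchanged
    W_in[target] += dv                       # then apply the input-vector update
```

In the original multithreaded word2vec code, many such updates are applied concurrently without locks (Hogwild-style); each update consists of low-arithmetic-intensity vector operations, which is the inner loop that optimizations such as context combining target.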
Index Terms
- Optimizing Word2Vec Performance on Multicore Systems