Abstract
Billions of people now use social networks such as Twitter. Twitter users create hashtags and add them to their tweets to classify them by topic or theme. Hashtags have evolved into a multifaceted instrument for tagging and tracking content, emphasising a standpoint, or galvanising communal support across posts published on social networks. However, because hashtags can be created freely, users often have considerable difficulty choosing suitable hashtags for their posts. In this paper, we introduce an approach to hashtag recommendation on Twitter based on tweet embeddings. We first use multiple techniques to compute embeddings of the tweets in the corpus. Next, we apply the k-means clustering algorithm to divide the heterogeneous tweets into clusters of similar tweets. We then compute the similarity between the embedding of the entered tweet and the centroid embedding of each resulting cluster in order to recommend the most appropriate hashtags to the user. Through a range of experiments, we present a detailed study of how the techniques used to compute tweet embeddings influence the final set of recommended hashtags.
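The pipeline described in the abstract (embed tweets, cluster them with k-means, compare a new tweet with the cluster centroids, recommend hashtags) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-dimensional word vectors, the mini-corpus, the averaging of word vectors as the tweet embedding, cosine similarity as the similarity measure, the ranking of hashtags by frequency within the chosen cluster, and the helper names `embed`, `kmeans`, `build_model` and `recommend`. It is not the authors' implementation.

```python
import math
from collections import Counter

# Toy 2-d word vectors, invented for this illustration only.
WORD_VECS = {
    "goal": (1.0, 0.1), "match": (0.9, 0.2), "team": (0.8, 0.0),
    "vote": (0.1, 1.0), "election": (0.0, 0.9), "poll": (0.2, 0.8),
}

# Hypothetical mini-corpus of (tweet text, hashtags) pairs.
CORPUS = [
    ("goal match", ["#football"]),
    ("vote election", ["#politics"]),
    ("team match goal", ["#football", "#sports"]),
    ("election poll", ["#politics", "#vote"]),
    ("team goal", ["#football"]),
]

def embed(text):
    """Average the vectors of known words (one simple embedding choice)."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vecs) for axis in zip(*vecs))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations, seeded with the first k points for determinism."""
    centroids = list(points[:k])
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, assign(points, centroids)

def build_model(corpus, k=2):
    """Cluster the corpus and count the hashtags seen in each cluster."""
    points = [embed(text) for text, _ in corpus]
    centroids, labels = kmeans(points, k)
    cluster_tags = [Counter() for _ in range(k)]
    for (_, tags), lab in zip(corpus, labels):
        cluster_tags[lab].update(tags)
    return centroids, cluster_tags

def recommend(tweet, centroids, cluster_tags, top_n=2):
    """Pick the centroid most similar to the tweet, return its top hashtags."""
    vec = embed(tweet)
    best = max(range(len(centroids)), key=lambda j: cosine(vec, centroids[j]))
    return [tag for tag, _ in cluster_tags[best].most_common(top_n)]

centroids, cluster_tags = build_model(CORPUS)
print(recommend("team match tonight", centroids, cluster_tags))
```

Swapping `embed` for a different embedding technique changes which cluster a tweet falls into and therefore which hashtags are returned, which is the effect the paper sets out to study.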
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ben-Lhachemi, N., Nfaoui, E.H. (2018). Hashtag Recommendation Using Word Sequences’ Embeddings. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_11
DOI: https://doi.org/10.1007/978-3-319-96292-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science (R0)