ABSTRACT
Recommendation algorithms that incorporate techniques from deep learning are becoming increasingly popular. Due to the structure of the data coming from recommendation domains (i.e., one-hot-encoded vectors of item preferences), these algorithms tend to have large input and output dimensionalities that dominate their overall size. This makes them difficult to train, due to the limited memory of graphics processing units, and difficult to deploy on mobile devices with limited hardware. To address these difficulties, we propose Bloom embeddings, a compression technique that can be applied to the input and output of neural network models dealing with sparse, high-dimensional, binary-coded instances. Bloom embeddings are computationally efficient and do not seriously compromise model accuracy at compression ratios of up to 1/5. In some cases, they even improve on the original accuracy, with relative increases of up to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4 alternative methods, obtaining favorable results. We also discuss a number of further advantages of Bloom embeddings, such as 'on-the-fly' constant-time operation, zero or marginal space requirements, training-time speedups, and the fact that they do not require any change to the core model architecture or training configuration.
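To make the idea concrete, below is a minimal sketch in Python of how a Bloom-filter-style encoding compresses a sparse binary vector, as the abstract describes it. The function name bloom_embed, the hash family built from Python's built-in hash, and the parameter choices (m, k) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# A minimal sketch of Bloom-style input compression (an assumption based
# on the abstract, not the paper's exact construction). A sparse binary
# vector of dimensionality d is compressed to m << d bits by hashing each
# active index with k hash functions, as in a Bloom filter.

def bloom_embed(active_indices, m, k, seed=0):
    """Map the set of active (one-hot) indices to an m-dimensional
    binary vector using k seeded hash functions."""
    out = np.zeros(m, dtype=np.float32)
    for idx in active_indices:
        for s in range(seed, seed + k):
            # One cheap hash family: hash the (index, seed) pair and
            # take it modulo the compressed dimensionality m.
            h = hash((idx, s)) % m
            out[h] = 1.0
    return out

# Example: compress item preferences over a 100,000-item catalog
# (d = 100000) into a 20,000-dimensional input (a 1/5 compression
# ratio) with k = 4 hash functions.
x = bloom_embed(active_indices=[7, 4321, 98765], m=20000, k=4)
print(int(x.sum()))  # at most 3 * 4 = 12 active bits; fewer on collisions
```

On the output side, a score for a candidate item would be recovered from the model's activations at that item's k hashed positions (e.g., by combining them); the sketch above covers only the encoding step.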