Abstract
Factorization Machines (FMs) enhance an underlying linear regression or classification model by capturing feature interactions. Intuitively, FMs warp the feature space to help capture the underlying non-linear structure of the machine learning task. In this paper, we propose novel Doubly-Warped Factorization Machines (or \(\mathtt{W2FM}\)s) that leverage multiple complementary space-warping strategies to improve the representational ability of FMs. Our approach abstracts the feature interactions in FMs as additional affine transformations (thus warping the space), which can be learned efficiently without introducing large numbers of model parameters. We also explore alternative \(\mathtt{W2FM}\)-based approaches and conduct extensive experiments on real-world data sets. These experiments show that \(\mathtt{W2FM}\) achieves better performance on collaborative filtering tasks not only relative to vanilla FMs, but also against other state-of-the-art competitors, such as Attentional FM (AFM), Holographic FM (HFM), and Neural FM (NFM).
This work is supported by NSF (#1610282, #1633381, #1909555, #2026860, #1827757, #1629888) and by EU H2020 Marie Sklodowska-Curie grant agreement #690817. Results were obtained using Chameleon Cloud resources supported by the NSF.
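For context, the pairwise FM model that \(\mathtt{W2FM}\) builds on (the standard formulation of [16], restated here rather than the authors' own \(\mathtt{W2FM}\) equations) scores an input \(x \in \mathbb{R}^m\) as
\[
\hat{y}(x) \;=\; w_0 \;+\; \sum_{i=1}^{m} w_i x_i \;+\; \sum_{i=1}^{m}\sum_{j=i+1}^{m} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j,
\]
where \(w_0\) and the \(w_i\) are the parameters of the underlying linear model and each feature \(i\) carries a \(k\)-dimensional factor vector \(\mathbf{v}_i\). The pairwise term \(\langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j\) is the feature-interaction component that, per the abstract, \(\mathtt{W2FM}\) abstracts as additional affine transformations of the feature space.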
Notes
1. Note that FMs can be generalized to higher degrees of feature interaction. In this paper, without loss of generality, we focus on pairwise FMs, which have been shown to be generally effective and are therefore the most commonly used form of FM; details can be found in [16]. A minimal sketch of the pairwise FM prediction is given after this list.
2. For other machine learning tasks, e.g., classification, log loss may be used (see the sketch after this list).
3. Note that for the very last column of \(V_1\), we have \(a_k = (k-1)\lfloor \frac{m}{k}\rfloor + 1\) to \(b_k = m\).
4. Here we use \({\mathtt{W2FM}}_{TR}\) for clarity; other \({\mathtt{W2FM}}\) variants can be seamlessly integrated.
5. Support Vector Machine (SVM) with a linear kernel [10]; Factorization Machine (FM, single-warping baseline) [16]; Attentional FM (attention factor: 256, activation function: ReLU, dropout rate: 0.5, valid dimension: 2, i.e., user id and item id) [24]; Neural FM (dropout rate for the bi-interaction layer: 0.5, one hidden layer with 64 neurons and dropout rate 0.8, activation function: ReLU) [11]; and Holographic FM [23].
6. Ciao (284K instances, 107K features, density 0.0003) [21]; Epinions (922K instances, 141K features, density 0.0003) [22]; MovieLens-100K (100K instances, 2273 features, density 0.041) [9]; and MovieLens-1M (1M instances, 9746 features, density 0.059) [9].
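As a point of reference for notes 1 and 2, below is a minimal NumPy sketch of the vanilla pairwise FM prediction of [16], together with the binary log loss mentioned in note 2. It is an illustrative sketch only, not the authors' \(\mathtt{W2FM}\) implementation, and all names in it are made up for this example.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Vanilla pairwise FM score (Rendle [16]); illustrative only, not W2FM.

    x  : (m,)   feature vector (typically sparse one-hot user/item encodings)
    w0 : scalar global bias
    w  : (m,)   linear weights
    V  : (m, k) factor matrix; row i is the k-dim factor vector of feature i
    """
    linear = w0 + w @ x
    # O(mk) identity:  sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    xv = V.T @ x
    interaction = 0.5 * np.sum(xv ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + interaction

def log_loss(y, score):
    """Binary log loss (note 2); y in {0, 1}, score is the raw FM output."""
    p = 1.0 / (1.0 + np.exp(-score))   # sigmoid turns the score into a probability
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy usage with m = 6 features and k = 2 factors.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])   # e.g. one user id and one item id active
w0, w, V = 0.1, rng.normal(size=6), rng.normal(scale=0.1, size=(6, 2))
score = fm_predict(x, w0, w, V)
print(score, log_loss(1, score))
```

The identity used above, \(\sum_{i<j} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\big[(\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2\big]\), is what keeps evaluation of the interaction term linear in the number of non-zero features.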
References
Binois, M., Ginsbourger, D., Roustant, O.: A warped kernel improving robustness in Bayesian optimization via random embeddings. In: International Conference on Learning and Intelligent Optimization (2015)
Blondel, M., Fujino, A., Ueda, N., Ishihata, M.: Higher-order factorization machines. In: NIPS (2016)
Blondel, M., Ishihata, M., Fujino, A., Ueda, N.: Polynomial networks and factorization machines: new insights and efficient training algorithms. In: ICML (2016)
Chen, T., Yin, H., Nguyen, Q.V.H., Peng, W., Li, X., Zhou, X.: Sequence-aware factorization machines for temporal predictive analytics. In: ICDE (2020)
Chen, X., Zheng, Y., Wang, J., Ma, W., Huang, J.: RaFM: rank-aware factorization machines. In: ICML (2019)
Cheng, H.T., et al.: Wide & deep learning for recommender systems. In: DLRS (2016)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Grčar, M., Mladenič, D., Fortuna, B., Grobelnik, M.: Data sparsity issues in the collaborative filtering framework. In: Advances in Web Mining and Web Usage Analysis (2006)
Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans. Interact. Intell. Syst. 5(4), 1–19 (2015)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2009)
He, X., Chua, T.S.: Neural factorization machines for sparse predictive analytics. In: SIGIR (2017)
Juan, Y., Zhuang, Y., Chin, W.S., Lin, C.J.: Field-aware factorization machines for CTR prediction. In: RecSys (2016)
Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: KDD (2008)
Livni, R., Shalev-Shwartz, S., Shamir, O.: On the computational efficiency of training neural networks. In: NIPS (2014)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press (2005)
Rendle, S.: Factorization machines. In: ICDM (2010)
Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: NIPS (2007)
Seal, H.L.: Studies in the history of probability and statistics. XV: the historical development of the Gauss linear model. Biometrika 54(1–2), 1–24 (1967)
Shan, Y., Hoens, T.R., Jiao, J., Wang, H., Yu, D., Mao, J.: Deep crossing: web-scale modeling without manually crafted combinatorial features. In: KDD (2016)
Snoek, J., Swersky, K., Zemel, R., Adams, R.P.: Input warping for Bayesian optimization of non-stationary functions. In: ICML (2014)
Tang, J., Gao, H., Liu, H., Sarma, A.D.: eTrust: Understanding trust evolution in an online world. In: KDD (2012)
Tang, J., Hu, X., Gao, H., Liu, H.: Exploiting local and global social context for recommendation. In: IJCAI (2013)
Tay, Y., Zhang, S., Luu, A.T., Hui, S.C., Yao, L., Vinh, T.D.Q.: Holographic factorization machines for recommendation. In: AAAI (2019)
Xiao, J., Ye, H., He, X., Zhang, H., Wu, F., Chua, T.S.: Attentional factorization machines: learning the weight of feature interactions via attention networks. In: IJCAI (2017)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, M.L., Candan, K.S. (2021). W2FM: The Doubly-Warped Factorization Machine. In: Karlapalem, K., et al. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol. 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_39
DOI: https://doi.org/10.1007/978-3-030-75765-6_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6