Abstract
In the real world, multiple parties sometimes hold different data about the same instances; e.g., a customer of a supermarket can also be a patient of a hospital. In other words, datasets are sometimes vertically partitioned across multiple parties. In such a situation, it is natural for those parties to collaborate to obtain more accurate prediction models; however, sharing their raw data should be prohibited from the point of view of privacy preservation. Federated learning has recently attracted the attention of machine learning researchers as a framework for efficient collaborative learning of predictive models among multiple parties with privacy preservation. In this paper, we propose a lossless vertical federated learning (VFL) method for higher-order factorization machines (HOFMs). HOFMs take feature combinations into account efficiently and effectively and have succeeded in many tasks, especially recommender systems, link prediction, and natural language processing. Although it intuitively seems difficult to evaluate and learn HOFMs without sharing raw feature vectors, our generalized recursion of ANOVA kernels makes this possible. We also propose a more efficient and robust VFL method for HOFMs based on anonymization by clustering. Experimental results on three real-world datasets show the effectiveness of the proposed method.
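HOFMs are built from ANOVA kernels: the order-m ANOVA kernel A^m(p, x) sums the products of p_j x_j over all size-m subsets of feature indices, which is the m-th elementary symmetric polynomial of z_j = p_j x_j and can be computed with a simple dynamic program. The sketch below illustrates why vertical partitioning need not block evaluation: if x = [x_A; x_B] and p = [p_A; p_B], then A^m(p, x) = sum over t from 0 to m of A^t(p_A, x_A) A^(m-t)(p_B, x_B), so each party only has to contribute its own low-order kernel values. This is a minimal illustration under our own assumptions, not the paper's exact recursion or protocol; the function names and the two-party split are hypothetical, it covers evaluation only (not learning), and exchanging these scalars is not by itself a privacy guarantee.

```python
from itertools import combinations
from math import prod


def anova_all_orders(p, x, m):
    """Return [A^0, A^1, ..., A^m] for the ANOVA kernel of p and x.

    A^t(p, x) is the t-th elementary symmetric polynomial of z_j = p_j * x_j.
    """
    a = [1.0] + [0.0] * m
    for pj, xj in zip(p, x):
        z = pj * xj
        # Iterate from high to low order so each feature is used at most once.
        for t in range(m, 0, -1):
            a[t] += z * a[t - 1]
    return a


def anova_brute_force(p, x, m):
    """Reference definition: explicit sum over all size-m index subsets."""
    return sum(
        prod(p[j] * x[j] for j in idx) for idx in combinations(range(len(p)), m)
    )


def combine_parties(a_parts, m):
    """Combine per-party kernels [A^0..A^m] into the joint order-m kernel.

    Uses A^s(joint) = sum_t A^t(left) * A^(s-t)(right), applied party by party
    (hypothetical helper for illustration).
    """
    joint = [1.0] + [0.0] * m
    for a in a_parts:
        joint = [sum(joint[t] * a[s - t] for t in range(s + 1)) for s in range(m + 1)]
    return joint[m]


# Usage: party A holds features 0-1, party B holds features 2-4 (hypothetical split).
p = [0.5, -1.0, 2.0, 0.3, 1.5]
x = [1.0, 2.0, 0.0, 3.0, -1.0]
m = 3
a_A = anova_all_orders(p[:2], x[:2], m)  # computed locally by party A
a_B = anova_all_orders(p[2:], x[2:], m)  # computed locally by party B
joint = combine_parties([a_A, a_B], m)
centralized = anova_all_orders(p, x, m)[m]
print(joint, centralized, anova_brute_force(p, x, m))
```

Because the combination is an exact polynomial identity, the joint value matches the centralized computation up to floating-point rounding, which is consistent with the "lossless" property claimed in the abstract; the same convolution extends to more than two parties.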
Acknowledgements
K.A. was supported by JSPS KAKENHI Grant Number JP20J13620 and by the Global Station for Big Data and Cybersecurity, a project of the Global Institution for Collaborative Research and Education at Hokkaido University.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Atarashi, K., Ishihata, M. (2021). Vertical Federated Learning for Higher-Order Factorization Machines. In: Karlapalem, K., et al. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol. 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_28
DOI: https://doi.org/10.1007/978-3-030-75765-6_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6
eBook Packages: Computer Science, Computer Science (R0)