FEAST: A Communication-efficient Federated Feature Selection Framework for Relational Data

Authors:

Meihui Zhang

Proceedings of the ACM on Management of Data, Volume 1, Issue 1

Article No.: 107, Pages 1 - 28

https://doi.org/10.1145/3588961

Published: 30 May 2023

Abstract

Vertical federated learning (VFL) is an emerging paradigm that lets cross-silo organizations build more accurate machine learning (ML) models. In this setting, multiple organizations (i.e., parties) hold the same set of samples but different features. However, the parties' features may be redundant or highly correlated with one another, which makes VFL model training inefficient and ineffective. Effective feature selection in VFL is therefore essential to mitigate this problem and to improve model effectiveness as well as computation and communication efficiency. To this end, we propose FEAST, a federated feature selection framework that leverages conditional mutual information (CMI) to select features that are highly informative yet have low redundancy. Furthermore, we design a communication-efficient method that reduces the information exchanged among the parties while protecting the parties' raw data. Extensive experiments on four real-world datasets demonstrate that the proposed framework achieves state-of-the-art performance in terms of accuracy, communication cost, and computation cost.
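
The abstract does not spell out FEAST's scoring rule or its communication protocol, so the following is only a centralized, single-machine sketch of how conditional mutual information can favor informative features while skipping redundant ones. It assumes discrete features, an empirical plug-in CMI estimate, and a CMIM-style greedy rule (maximize the minimum CMI given each already-selected feature); the function names `cmi` and `greedy_select` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def cmi(x, y, z):
    """Empirical conditional mutual information I(X; Y | Z) in bits.

    x, y, z are equal-length sequences of discrete (hashable) values.
    """
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xv, yv, zv), n_xyz in c_xyz.items():
        # p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts
        ratio = c_z[zv] * n_xyz / (c_xz[(xv, zv)] * c_yz[(yv, zv)])
        total += (n_xyz / n) * log2(ratio)
    return total


def greedy_select(features, label, k):
    """Pick k feature names greedily (assumed CMIM-style rule, not FEAST itself).

    The first pick maximizes plain mutual information with the label; each
    later pick maximizes the minimum of I(f; label | s) over selected s.
    """
    selected = []
    remaining = dict(features)
    const = [0] * len(label)  # conditioning on a constant yields plain MI

    def score(name):
        col = remaining[name]
        if not selected:
            return cmi(col, label, const)
        return min(cmi(col, label, features[s]) for s in selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy data: the label is determined jointly by `a` and `c`; `a_copy` duplicates `a`.
a = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
label = [2 * ai + ci for ai, ci in zip(a, c)]
features = {"a": a, "a_copy": list(a), "c": c}

chosen = greedy_select(features, label, 2)
print(chosen)
```

On this toy data the duplicate column carries no additional information about the label once `a` is known, so its conditional score drops to zero and the sketch picks `c` as the second feature. In a vertical federated setting the columns would sit at different parties, so the joint counts used here could not be computed this directly; how FEAST handles that is not described in the abstract.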

Index Terms

FEAST: A Communication-efficient Federated Feature Selection Framework for Relational Data
1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Cooperation and coordination
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning algorithms
      1. Feature selection
2. Mathematics of computing
  1. Information theory

Published In

Proceedings of the ACM on Management of Data Volume 1, Issue 1

PACMMOD

May 2023

2807 pages

EISSN:2836-6573

DOI:10.1145/3603164

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
