ABSTRACT
Feature selection is crucial in large-scale recommendation system, which can not only reduce the computational cost, but also improve the recommendation efficiency. Most existing works rank the features and then select the top-k ones as the final feature subset. However, they assess feature importance individually and ignore the interrelationship between features. Consequently, multiple features with high relevance may be selected simultaneously, resulting in sub-optimal result. In this work, we solve this problem by proposing an AutoML-based feature selection framework that can automatically search the optimal feature subset. Specifically, we first embed the search space into a weight-sharing Supernet. Then, a two-stage neural architecture search method is employed to evaluate the feature quality. In the first stage, a well-designed sampling method considering feature convergence fairness is applied to train the Supernet. In the second stage, a reinforcement learning method is used to search for the optimal feature subset efficiently. The Experimental results on two real datasets demonstrate the superior performance of new framework over other solutions. Our proposed method obtain significant improvement with a 20% reduction in the amount of features on the Criteo. More validation experiments demonstrate the ability and robustness of the framework.
Supplemental Material
- [1] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems. 7–10.Google ScholarDigital Library
- [2] Xiangxiang Chu, Bo Zhang, and Ruijun Xu. 2021. Fairnas: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 12239–12248.Google ScholarCross Ref
- [3] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems. 191–198.Google ScholarDigital Library
- [4] Naoual El Aboudi and Laila Benhlima. 2016. Review on wrapper feature selection approaches. In 2016 International Conference on Engineering & MIS (ICEMIS). IEEE, 1–5.Google ScholarCross Ref
- [5] Ali El Akadi, Abdeljalil El Ouardighi, and Driss Aboutajdine. 2008. A powerful feature selection approach based on mutual information. International Journal of Computer Science and Network Security 8, 4 (2008), 116.Google Scholar
- [6] Jerome H Friedman. 2001. Greedy function approximation: a gradient boosting machine. Annals of statistics (2001), 1189–1232.Google Scholar
- [7] Antonio A Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, and James Zou. 2021. Mixed dimension embeddings with application to memory-efficient recommendation systems. In 2021 IEEE International Symposium on Information Theory (ISIT). IEEE, 2786–2791.Google ScholarDigital Library
- [8] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).Google Scholar
- [9] Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, and Jian Sun. 2020. Single path one-shot neural architecture search with uniform sampling. In European conference on computer vision. Springer, 544–560.Google ScholarDigital Library
- [10] Isabelle Guyon and André Elisseeff. 2003. An introduction to variable and feature selection. Journal of machine learning research 3, Mar (2003), 1157–1182.Google ScholarDigital Library
- [11] Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7132–7141.Google ScholarCross Ref
- [12] Xin Jin, Anbang Xu, Rongfang Bie, and Ping Guo. 2006. Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles. In International workshop on data mining for biomedical applications. Springer, 106–115.Google ScholarDigital Library
- [13] Manas R Joglekar, Cong Li, Mei Chen, Taibai Xu, Xiaoming Wang, Jay K Adams, Pranav Khaitan, Jiahui Liu, and Quoc V Le. 2020. Neural input search for large scale recommendation models. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2387–2397.Google ScholarDigital Library
- [14] Bin Liu, Niannan Xue, Huifeng Guo, Ruiming Tang, Stefanos Zafeiriou, Xiuqiang He, and Zhenguo Li. 2020. AutoGroup: Automatic feature grouping for modelling explicit high-order feature interactions in CTR prediction. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 199–208.Google ScholarDigital Library
- [15] Bin Liu, Chenxu Zhu, Guilin Li, Weinan Zhang, Jincai Lai, Ruiming Tang, Xiuqiang He, Zhenguo Li, and Yong Yu. 2020. Autofis: Automatic feature interaction selection in factorization models for click-through rate prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2636–2645.Google ScholarDigital Library
- [16] Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2018. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055 (2018).Google Scholar
- [17] Siyi Liu, Chen Gao, Yihong Chen, Depeng Jin, and Yong Li. 2021. Learnable embedding sizes for recommender systems. arXiv preprint arXiv:2101.07577 (2021).Google Scholar
- [18] Yaqing Liu, Yong Mu, Keyu Chen, Yiming Li, and Jinghuan Guo. 2020. Daily activity feature selection in smart homes based on pearson correlation coefficient. Neural Processing Letters 51, 2 (2020), 1771–1787.Google ScholarDigital Library
- [19] Xu Ma, Pengjie Wang, Hui Zhao, Shaoguo Liu, Chuhan Zhao, Wei Lin, Kuang-Chih Lee, Jian Xu, and Bo Zheng. 2021. Towards a Better Tradeoff between Effectiveness and Efficiency in Pre-Ranking: A Learnable Feature Selection based Approach. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2036–2040.Google ScholarDigital Library
- [20] Andrew Y Ng. 2004. Feature selection, L 1 vs. L 2 regularization, and rotational invariance. In Proceedings of the twenty-first international conference on Machine learning. 78.Google ScholarDigital Library
- [21] Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean. 2018. Efficient neural architecture search via parameters sharing. In International conference on machine learning. PMLR, 4095–4104.Google Scholar
- [22] Liang Qu, Yonghong Ye, Ningzhi Tang, Lixin Zhang, Yuhui Shi, and Hongzhi Yin. 2022. Single-shot Embedding Dimension Search in Recommender System. arXiv preprint arXiv:2204.03281 (2022).Google Scholar
- [23] Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell system technical journal 27, 3 (1948), 379–423.Google Scholar
- [24] Qingquan Song, Dehua Cheng, Hanning Zhou, Jiyan Yang, Yuandong Tian, and Xia Hu. 2020. Towards automated neural interaction discovery for click-through rate prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 945–955.Google ScholarDigital Library
- [25] Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267–288.Google ScholarCross Ref
- [26] Dimitrios Ververidis and Constantine Kotropoulos. 2005. Sequential forward feature selection with low computational cost. In 2005 13th European Signal Processing Conference. IEEE, 1–4.Google Scholar
- [27] Yejing Wang, Xiangyu Zhao, Tong Xu, and Xian Wu. 2022. Autofield: Automating feature selection in deep recommender systems. In Proceedings of the ACM Web Conference 2022. 1977–1986.Google ScholarDigital Library
- [28] Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8, 3 (1992), 229–256.Google Scholar
- [29] Bencheng Yan, Pengjie Wang, Kai Zhang, Wei Lin, Kuang-Chih Lee, Jian Xu, and Bo Zheng. 2021. Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 3568–3572.Google ScholarDigital Library
- [30] Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. 2019. Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys (CSUR) 52, 1 (2019), 1–38.Google ScholarDigital Library
- [31] Pengyu Zhao, Kecheng Xiao, Yuanxing Zhang, Kaigui Bian, and Wei Yan. 2021. AMEIR: Automatic Behavior Modeling, Interaction Exploration and MLP Investigation in the Recommender System.. In IJCAI. 2104–2110.Google Scholar
- [32] Xiangyu Zhao, Haochen Liu, Hui Liu, Jiliang Tang, Weiwei Guo, Jun Shi, Sida Wang, Huiji Gao, and Bo Long. 2021. Autodim: Field-aware embedding dimension searchin recommender systems. In Proceedings of the Web Conference 2021. 3015–3022.Google ScholarDigital Library
- [33] Xiangyu Zhaok, Haochen Liu, Wenqi Fan, Hui Liu, Jiliang Tang, Chong Wang, Ming Chen, Xudong Zheng, Xiaobing Liu, and Xiwang Yang. 2021. Autoemb: Automated embedding dimensionality search in streaming recommendations. In 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 896–905.Google ScholarCross Ref
- [34] Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 1059–1068.Google ScholarDigital Library
- [35] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 8697–8710.Google ScholarCross Ref
Index Terms
- Automatic Feature Selection By One-Shot Neural Architecture Search In Recommendation Systems
Recommendations
AutoField: Automating Feature Selection in Deep Recommender Systems
WWW '22: Proceedings of the ACM Web Conference 2022Feature quality has an impactful effect on recommendation performance. Thereby, feature selection is a critical process in developing deep learning-based recommender systems. Most existing deep recommender systems, however, focus on designing ...
AdaFS: Adaptive Feature Selection in Deep Recommender System
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data MiningFeature selection plays an impactful role in deep recommender systems, which selects a subset of the most predictive features, so as to boost the recommendation performance and accelerate model optimization. The majority of existing feature selection ...
Item Cold-Start Recommendation with Personalized Feature Selection
AbstractThe problem of recommending new items to users (often referred to as item cold-start recommendation) remains a challenge due to the absence of users’ past preferences for these items. Item features from side information are typically leveraged to ...
Comments