ABSTRACT
Feature crossing captures interactions among categorical features and helps improve learning from tabular data in real-world businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its customers, ranging from banks and hospitals to Internet corporations. By performing beam search in a tree-structured space, AutoCross enables efficient generation of high-order cross features, which existing works have not yet explored. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness, while keeping the tool simple so that no machine learning expertise or tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computation, transmission, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross, and show that it can significantly enhance the performance of both linear and deep models.
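To make the two basic operations named in the abstract concrete, the following is a minimal Python sketch, not the AutoCross implementation. It assumes a cross feature is formed by joining the categorical values of each row, and that multi-granularity discretization means binning a numerical column at several bin counts and keeping every version. The function names, the `&` separator, the equal-width binning rule, and the granularities `(2, 4)` are all illustrative assumptions; the beam search and successive mini-batch gradient descent are not shown.

```python
from typing import Dict, List, Sequence


def cross_features(*columns: Sequence[str]) -> List[str]:
    """Cross categorical columns row by row by joining their values.

    Hypothetical helper: the '&' separator is an illustrative choice.
    """
    return ["&".join(values) for values in zip(*columns)]


def discretize(values: Sequence[float], num_bins: int) -> List[str]:
    """Equal-width binning of a numerical column into `num_bins` categories.

    Assumed binning rule for illustration only.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_bins or 1.0
    # Clamp the maximum value into the last bin.
    return [str(min(int((v - lo) / width), num_bins - 1)) for v in values]


def multi_granularity(
    values: Sequence[float], granularities: Sequence[int] = (2, 4)
) -> Dict[int, List[str]]:
    """Discretize one numerical column at several granularities."""
    return {g: discretize(values, g) for g in granularities}


if __name__ == "__main__":
    gender = ["m", "f", "f"]
    city = ["nyc", "sf", "nyc"]
    age = [20.0, 35.0, 50.0]

    # Second-order cross of two categorical features.
    print(cross_features(gender, city))

    # A numerical feature becomes categorical at each granularity,
    # after which it can be crossed like any other categorical feature.
    binned = multi_granularity(age)
    print(cross_features(gender, binned[4]))
```

Higher-order crosses follow from passing more columns to `cross_features`, which is why the number of candidate features grows quickly and a search strategy is needed to choose among them.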