DOI: 10.1145/3292500.3330679
research-article

AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications

Published: 25 July 2019

ABSTRACT

Feature crossing captures interactions among categorical features and helps improve learning from tabular data in real-world businesses. In this paper, we present AutoCross, an automatic feature crossing tool provided by 4Paradigm to its customers, ranging from banks and hospitals to Internet corporations. By performing beam search in a tree-structured space, AutoCross enables efficient generation of high-order cross features, which existing works have not yet explored. Additionally, we propose successive mini-batch gradient descent and multi-granularity discretization to further improve efficiency and effectiveness, while ensuring simplicity so that no machine learning expertise or tedious hyper-parameter tuning is required. Furthermore, the algorithms are designed to reduce the computational, transmission, and storage costs involved in distributed computing. Experimental results on both benchmark and real-world business datasets demonstrate the effectiveness and efficiency of AutoCross. The results show that AutoCross can significantly enhance the performance of both linear and deep models.
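To make the notion of a cross feature concrete, the following is a minimal sketch of second-order crossing of categorical columns. It is not the AutoCross algorithm: it does no beam search, discretization, or feature evaluation, and the function names, field names, and sample rows are hypothetical values chosen only for illustration.

```python
from itertools import combinations


def cross_value(row, fields):
    """Build one cross feature value by joining the values of the given fields."""
    return "&".join(f"{name}={row[name]}" for name in fields)


def add_second_order_crosses(rows, fields):
    """Return copies of the rows with every pairwise cross feature appended."""
    crossed = []
    for row in rows:
        new_row = dict(row)  # leave the input rows unmodified
        for a, b in combinations(fields, 2):
            new_row[f"{a}_x_{b}"] = cross_value(row, (a, b))
        crossed.append(new_row)
    return crossed


# Made-up categorical records, for illustration only.
rows = [
    {"job": "engineer", "city": "Beijing", "device": "mobile"},
    {"job": "teacher", "city": "Shanghai", "device": "desktop"},
]
crossed_rows = add_second_order_crosses(rows, ["job", "city", "device"])
```

With three original fields this enumerates three pairwise crosses per row. The number of candidate crosses grows combinatorially with the number of fields and the crossing order, which is why exhaustively enumerating high-order crosses is expensive and a guided search is needed instead.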


Published in: KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500

      Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '19 paper acceptance rate: 110 of 1,200 submissions (9%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)
