DOI: 10.1145/3534678.3539118

Design Domain Specific Neural Network via Symbolic Testing

Published: 14 August 2022

Abstract

Deep sequence networks such as multi-head self-attention networks provide a promising way to extract effective representations from raw sequence data in an end-to-end fashion and have shown great success in various domains such as natural language processing and computer vision. However, in domains such as financial risk management and anti-fraud, where expert-derived features are heavily relied on, deep sequence models struggle to dominate the game. In this paper, we introduce a simple framework called symbolic testing to verify the learnability of certain expert-derived features over sequence data. A systematic investigation over simulated data reveals that the self-attention architecture fails to learn some standard symbolic expressions, such as the count distinct operation. To overcome this deficiency, we propose a novel architecture named SHORING, which contains two components: an event network and a sequence network. The event network efficiently learns arbitrary high-order event-level conditional embeddings via a reparameterization trick, while the sequence network integrates domain-specific aggregations into the sequence-level representation, thereby providing richer inductive biases compared to standard sequence architectures such as self-attention. We conduct comprehensive experiments and ablation studies on synthetic datasets that mimic sequence data commonly seen in the anti-fraud domain, as well as on three real-world datasets. The results show that SHORING learns commonly used symbolic features well and experimentally outperforms state-of-the-art methods by a significant margin on real-world online transaction datasets. The symbolic testing framework and SHORING have been applied in anti-fraud model development at Alipay and have improved the performance of real-time fraud-detection models.
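To illustrate the symbolic-testing setup described above, the sketch below generates simulated event sequences whose label is a symbolic expression (here, count distinct) and would let one check whether a given sequence architecture can regress that label. This is a hypothetical toy reconstruction, not the authors' code; the function name and parameters are invented for illustration.

```python
import random

def make_count_distinct_dataset(n_seqs=1000, seq_len=20, vocab=50, seed=0):
    """Toy symbolic-testing task: each input is a sequence of categorical
    event types; the label is the number of distinct types it contains
    (the 'count distinct' aggregation the abstract reports self-attention
    fails to learn)."""
    rng = random.Random(seed)
    X = [[rng.randrange(vocab) for _ in range(seq_len)] for _ in range(n_seqs)]
    y = [len(set(seq)) for seq in X]
    return X, y

X, y = make_count_distinct_dataset()
```

An architecture passes this symbolic test if, after training on (X, y), its held-out error on the count-distinct label is near zero; the paper's investigation applies the same recipe to other expert-derived aggregations.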

Supplemental Material

MP4 File
Presentation video of the paper "Design Domain Specific Neural Network via Symbolic Testing"



Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN:9781450393850
DOI:10.1145/3534678

Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. anti-fraud
  2. conditional sequence model
  3. high-order interaction
  4. inductive bias
  5. neural networks
  6. symbolic learning

Qualifiers

  • Research-article

Conference

KDD '22

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)


