
Hierarchical attention and feature projection for click-through rate prediction

Applied Intelligence

Abstract

Click-through rate (CTR) prediction plays an important role in many industrial applications. Feature engineering directly influences CTR prediction performance because the features are typically multi-field. However, existing CTR prediction techniques either neglect the importance of individual features or treat all feature interactions equally during feature learning. In addition, an inner product or a Hadamard product is too simple to model feature interactions effectively. These limitations lead to suboptimal performance in existing models. In this paper, we propose a framework called the Hierarchical Attention and Feature Projection neural network (HAFP) for CTR prediction, which automatically learns more representative and efficient feature representations in an end-to-end manner. To this end, we employ a feature learning layer with a hierarchical attention mechanism to jointly extract more generalized and dominant features and feature interactions. In addition, a projective bilinear function is designed in a meaningful second-order interaction encoder to learn more fine-grained and comprehensive second-order feature interactions. By taking advantage of the hierarchical attention mechanism and the projective bilinear function, the proposed model not only learns features in a flexible fashion but also makes its prediction results interpretable. Experimental results on two real-world datasets demonstrate that HAFP outperforms state-of-the-art CTR prediction baselines in terms of Logloss and AUC. Further analysis verifies the importance of the proposed hierarchical attention mechanism and the projective bilinear function for modelling the feature representation, showing the rationality and effectiveness of HAFP.
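
To make the terms in the abstract concrete, the following is a minimal sketch of the two simple interaction operators it mentions (inner product and Hadamard product), a generic bilinear interaction, and the two evaluation metrics (Logloss and AUC). It is not the HAFP implementation: the abstract does not define the projective bilinear function, so `bilinear_interaction` is only a standard bilinear form with an assumed d x d weight matrix `W`, and all function names and toy values here are hypothetical.

```python
import math


def inner_product(vi, vj):
    """Scalar interaction between two field embeddings: sum_k vi[k] * vj[k]."""
    return sum(a * b for a, b in zip(vi, vj))


def hadamard_product(vi, vj):
    """Element-wise interaction between two field embeddings."""
    return [a * b for a, b in zip(vi, vj)]


def bilinear_interaction(vi, W, vj):
    """Generic bilinear interaction (vi W) * vj, element-wise.

    W is an assumed d x d matrix (learned in a real model). This is a
    standard bilinear form, NOT the paper's projective bilinear function.
    """
    d = len(vi)
    projected = [sum(vi[r] * W[r][c] for r in range(d)) for c in range(d)]
    return [p * b for p, b in zip(projected, vj)]


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This O(P * N) pairwise form is for illustration
    and assumes at least one positive and one negative label.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    vi, vj = [1.0, 2.0], [3.0, 4.0]
    swap = [[0.0, 1.0], [1.0, 0.0]]  # toy weight matrix
    print(inner_product(vi, vj))
    print(hadamard_product(vi, vj))
    print(bilinear_interaction(vi, swap, vj))
    print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

With an identity `W`, the bilinear form reduces to the Hadamard product, which illustrates why a learned matrix gives the interaction more capacity than either simple operator. Lower Logloss and higher AUC indicate better CTR predictions.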

Notes

  1. http://labs.criteo.com/2014/02/download-dataset/

  2. http://www.kaggle.com/c/avazu-ctr-prediction

Author information

Corresponding author

Correspondence to Jinjin Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, J., Zhong, C., Fan, S. et al. Hierarchical attention and feature projection for click-through rate prediction. Appl Intell 52, 8651–8663 (2022). https://doi.org/10.1007/s10489-021-02931-0

  • DOI: https://doi.org/10.1007/s10489-021-02931-0
