DOI: 10.1145/3477495.3531781

Hierarchical Task-aware Multi-Head Attention Network

Published: 07 July 2022

Abstract

Neural multi-task learning is gaining popularity as a way to learn multiple tasks jointly within a single model. While related research continues to break new ground, two major limitations remain: (i) poor generalization to scenarios in which tasks are only loosely correlated, and (ii) insufficient investigation of the global commonalities and local characteristics of tasks. We aim to bridge these gaps with a neural multi-task learning model coined the Hierarchical Task-aware Multi-Head Attention Network (HTMN). HTMN explicitly separates task-specific features from task-shared features to reduce the impact of weak correlation between tasks. The method comprises two components: a Multi-level Task-aware Experts Network, which identifies task-shared global features and task-specific local features, and a Hierarchical Multi-Head Attention Network, which hybridizes the global and local features into a more robust and adaptive representation for each task. Each task tower then receives its hybrid task-adaptive representation and performs task-specific predictions. Extensive experiments on two real-world datasets show that HTMN consistently outperforms competing methods on a variety of prediction tasks.
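Since only the abstract is available here, the following is a minimal, illustrative PyTorch sketch of the architecture it describes: task-shared ("global") experts alongside per-task ("local") experts, a per-task multi-head attention step that hybridizes the two, and per-task towers. All names, layer sizes, expert counts, and the choice to form the attention query from the mean of the local expert outputs are assumptions made for illustration, not the authors' implementation.

```python
# Minimal illustrative sketch of the HTMN idea from the abstract; NOT the
# authors' implementation. All names, sizes, and the attention-query choice
# (mean of the local expert outputs) are assumptions made for illustration.
import torch
import torch.nn as nn


class Expert(nn.Module):
    """A single feed-forward expert over the shared input features."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class HTMNSketch(nn.Module):
    """Task-shared ("global") experts plus per-task ("local") experts,
    hybridized per task with multi-head attention, then fed to task towers."""

    def __init__(self, in_dim, hidden_dim, num_tasks,
                 num_shared_experts=4, num_task_experts=2, num_heads=4):
        super().__init__()
        self.num_tasks = num_tasks
        self.shared_experts = nn.ModuleList(
            [Expert(in_dim, hidden_dim) for _ in range(num_shared_experts)])
        self.task_experts = nn.ModuleList(
            [nn.ModuleList([Expert(in_dim, hidden_dim)
                            for _ in range(num_task_experts)])
             for _ in range(num_tasks)])
        # One attention module per task: a task-local query attends over all
        # expert outputs, mixing global commonality with local characteristics.
        self.attention = nn.ModuleList(
            [nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
             for _ in range(num_tasks)])
        self.towers = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                           nn.Linear(hidden_dim, 1))
             for _ in range(num_tasks)])

    def forward(self, x: torch.Tensor) -> list:
        # shared: (batch, num_shared_experts, hidden_dim)
        shared = torch.stack([e(x) for e in self.shared_experts], dim=1)
        outputs = []
        for t in range(self.num_tasks):
            # local: (batch, num_task_experts, hidden_dim)
            local = torch.stack([e(x) for e in self.task_experts[t]], dim=1)
            experts = torch.cat([shared, local], dim=1)
            # Query with the mean local representation; keys/values are all
            # expert outputs, so attention hybridizes global + local features.
            query = local.mean(dim=1, keepdim=True)
            hybrid, _ = self.attention[t](query, experts, experts)
            outputs.append(self.towers[t](hybrid.squeeze(1)))
        return outputs


if __name__ == "__main__":
    model = HTMNSketch(in_dim=32, hidden_dim=64, num_tasks=2)
    preds = model(torch.randn(8, 32))
    print([tuple(p.shape) for p in preds])  # [(8, 1), (8, 1)]
```

Concatenating the shared and task-specific expert outputs as attention keys/values is one plausible reading of how the hybrid representation could be formed; the paper's hierarchical attention may differ in structure and depth.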

Supplementary Material

MP4 File (sigir22-sp1335.mp4)
Presentation video.


Cited By

  • (2025) Spatiotemporal-view member preference contrastive representation learning for group recommendation. Machine Learning, 114(3). DOI: 10.1007/s10994-024-06655-3. Online publication date: 11-Feb-2025.
  • (2024) Residual Multi-Task Learner for Applied Ranking. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4974-4985. DOI: 10.1145/3637528.3671523. Online publication date: 25-Aug-2024.
  • (2023) Dual Semantic Knowledge Composed Multimodal Dialog Systems. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1518-1527. DOI: 10.1145/3539618.3591673. Online publication date: 19-Jul-2023.



Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN:9781450387323
DOI:10.1145/3477495
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. hierarchical attention
  2. mixture of experts
  3. multi-task learning
  4. neural network

Qualifiers

  • Short-paper

Funding Sources

  • Chinese Scholarship Council

Conference

SIGIR '22

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)



Article Metrics

  • Downloads (last 12 months): 41
  • Downloads (last 6 weeks): 4
Reflects downloads up to 28 Feb 2025

