ABSTRACT
In talent management, resume assessment aims to analyze the quality of a job seeker's resume, which can help recruiters discover suitable candidates and, in return, help job seekers improve their resumes. Recent machine-learning methods trained on large-scale public resume datasets have made automatic assessment possible, reducing manual cost. However, most existing approaches remain content-dominated and ignore other valuable information. Inspired by practical resume evaluation, which considers both content and layout, we construct multiple modalities from resumes but face a new challenge: the performance of multi-modal fusion is sometimes even worse than that of the best uni-modality. In this paper, we experimentally show that this phenomenon is caused by cross-modal divergence, which raises the question: when is it appropriate to perform multi-modal fusion? To address this problem, we design an instance-aware fusion method, the Divergence-Orientated Multi-Modal Fusion Network (DOMFN), which adaptively fuses uni-modal and multi-modal predictions based on cross-modal divergence. Specifically, DOMFN computes a penalty score that measures the divergence between cross-modal predictions. The learned divergence then decides whether to conduct multi-modal fusion and is incorporated into an amended loss for reliable training. Consequently, DOMFN rejects the multi-modal prediction when the cross-modal divergence is too large, avoiding overall performance degradation and achieving better performance than any uni-modality. In experiments, comparisons with baselines on a real-world dataset demonstrate the superiority and explainability of the proposed DOMFN; for example, we observe that multi-modal fusion benefits the assessment of resumes for UI Designer and Enterprise Service positions, whereas it degrades the assessment of Technology and Product Operation positions.
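The gating idea described above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual DOMFN: it assumes the cross-modal divergence is measured as the mean pairwise squared distance between uni-modal prediction vectors, and that when the divergence exceeds a threshold the model falls back to the most confident uni-modal prediction instead of the fused one. The function names, the divergence measure, and the fallback rule are all illustrative assumptions.

```python
import numpy as np

def divergence(uni_preds):
    # Hypothetical divergence: mean pairwise squared distance between
    # uni-modal class-probability vectors (the paper's penalty score
    # may be defined differently).
    n = len(uni_preds)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(np.sum((uni_preds[i] - uni_preds[j]) ** 2))
    return total / (n * (n - 1) / 2)

def gated_fusion(uni_preds, fused_pred, threshold):
    # If the uni-modal predictions disagree too much, reject the fused
    # prediction and fall back to the most confident uni-modality.
    if divergence(uni_preds) > threshold:
        return max(uni_preds, key=lambda p: float(np.max(p)))
    return fused_pred

# Example: text and layout modalities strongly disagree, so the gate
# rejects the fused prediction and keeps a uni-modal one.
text = np.array([0.9, 0.1])
layout = np.array([0.1, 0.9])
result = gated_fusion([text, layout], np.array([0.5, 0.5]), threshold=0.5)
```

At training time, the paper additionally folds the divergence into an amended loss; the sketch above only covers the inference-time gating decision.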
DOMFN: A Divergence-Orientated Multi-Modal Fusion Network for Resume Assessment