ABSTRACT
In talent management, resume assessment aims to analyze the quality of a job seeker's resume, which can help recruiters discover suitable candidates and, in return, help job seekers improve their resumes. Recent machine-learning methods trained on large-scale public resume datasets have made automatic assessment possible, reducing manual cost. However, most existing approaches remain content-dominated and ignore other valuable information. Inspired by practical resume evaluation, which considers both content and layout, we construct multiple modalities from resumes but face a new challenge: the performance of multi-modal fusion is sometimes even worse than that of the best uni-modality. In this paper, we experimentally show that this phenomenon is caused by cross-modal divergence, which raises the question: when is it appropriate to perform multi-modal fusion? To address this problem, we design an instance-aware fusion method, the Divergence-Orientated Multi-Modal Fusion Network (DOMFN), which adaptively fuses uni-modal and multi-modal predictions based on cross-modal divergence. Specifically, DOMFN computes a penalty score that measures the divergence between cross-modal predictions. The learned divergence then decides whether to conduct multi-modal fusion and is incorporated into an amended loss for reliable training. Consequently, DOMFN rejects the multi-modal prediction when the cross-modal divergence is too large, avoiding overall performance degradation and achieving better performance than any uni-modality. In experiments, comparisons with baselines on a real-world dataset demonstrate the superiority and explainability of the proposed DOMFN; for example, we observe that multi-modal fusion benefits the assessment of resumes for UI Designer and Enterprise Service positions, whereas it degrades the assessment of Technology and Product Operation positions.
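The gating idea described above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual DOMFN: it assumes the cross-modal divergence is measured as the mean pairwise squared distance between uni-modal prediction vectors, and that when the divergence exceeds a threshold the model falls back to the most confident uni-modal prediction instead of the fused one. The function names, the divergence measure, and the fallback rule are all illustrative assumptions.

```python
import numpy as np

def divergence(uni_preds):
    # Hypothetical divergence: mean pairwise squared distance between
    # uni-modal class-probability vectors (the paper's penalty score
    # may be defined differently).
    n = len(uni_preds)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(np.sum((uni_preds[i] - uni_preds[j]) ** 2))
    return total / (n * (n - 1) / 2)

def gated_fusion(uni_preds, fused_pred, threshold):
    # If the uni-modal predictions disagree too much, reject the fused
    # prediction and fall back to the most confident uni-modality.
    if divergence(uni_preds) > threshold:
        return max(uni_preds, key=lambda p: float(np.max(p)))
    return fused_pred

# Example: text and layout modalities strongly disagree, so the gate
# rejects the fused prediction and keeps a uni-modal one.
text = np.array([0.9, 0.1])
layout = np.array([0.1, 0.9])
result = gated_fusion([text, layout], np.array([0.5, 0.5]), threshold=0.5)
```

At training time, the paper additionally folds the divergence into an amended loss; the sketch above only covers the inference-time gating decision.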
DOMFN: A Divergence-Orientated Multi-Modal Fusion Network for Resume Assessment