DOI: 10.1145/3581783.3612558
research-article

Data-Scarce Animal Face Alignment via Bi-Directional Cross-Species Knowledge Transfer

Published: 27 October 2023

Abstract

Animal face alignment is challenging due to large intra- and inter-species variations and the scarcity of labeled data. Existing studies circumvent this problem by directly finetuning a human face alignment model or by focusing on animal-specific face alignment (e.g., horse, sheep). In this paper, we propose Meta-CSKT, a Cross-Species Knowledge Transfer method for animal face alignment, which consists of a base network and an adaptation network. The two networks continuously complement each other through bi-directional cross-species knowledge transfer, a design motivated by the observation that knowledge is shared among animals. Meta-CSKT uses a circuit feedback mechanism to improve the base network with the cognitive differences of the adaptation network between few-shot labeled and large-scale unlabeled data. In addition, we propose a positive example mining method that identifies positives, semi-hard positives, and hard negatives in unlabeled data to mitigate the scarcity of labeled data and facilitate Meta-CSKT learning. Experiments show that Meta-CSKT outperforms state-of-the-art methods by a large margin on the horse facial keypoint dataset and the Japanese Macaque Species dataset, while achieving results comparable to state-of-the-art methods on the large-scale labeled AnimalWeb dataset (e.g., 18K), using only a few labeled images (e.g., 40).

Supplemental Material

MP4 File
Despite its promising applications and significant potential impact, animal face alignment remains largely unexplored. In this paper, we propose Meta-CSKT for data-scarce animal face alignment, motivated by two observations: i) knowledge is shared among animals, and ii) the landmark shifts between human and animal models reveal prediction accuracy. The first observation motivates bi-directional cross-species knowledge transfer between few-shot labeled and large-scale unlabeled animal data. The second observation motivates positive example mining to use the unlabeled data effectively. Extensive experiments on three datasets demonstrate the superiority of our method for animal face alignment with only a few labeled images.
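The second observation can be illustrated with a minimal sketch. This is not the paper's implementation: the page does not say how the landmark shift is measured or how it maps to the three categories. The sketch assumes the shift is the mean Euclidean distance between the landmarks predicted by a human model and an animal model, normalized by a face size, and that a smaller shift indicates a more reliable sample. The function names and the thresholds `low` and `high` are hypothetical.

```python
from math import hypot


def mean_landmark_shift(pred_a, pred_b, norm):
    """Mean Euclidean distance between two landmark sets, divided by `norm`.

    `norm` is an assumed normalizer such as the face box size in pixels.
    """
    if len(pred_a) != len(pred_b) or not pred_a:
        raise ValueError("landmark sets must be non-empty and equally sized")
    total = sum(
        hypot(xa - xb, ya - yb)
        for (xa, ya), (xb, yb) in zip(pred_a, pred_b)
    )
    return total / (len(pred_a) * norm)


def mine_examples(samples, low=0.05, high=0.15):
    """Bucket unlabeled samples by landmark shift.

    Each sample is (name, human_model_landmarks, animal_model_landmarks, norm).
    The thresholds and the shift-to-category mapping are illustrative
    assumptions, not values taken from the paper.
    """
    buckets = {"positive": [], "semi_hard_positive": [], "hard_negative": []}
    for name, human_pred, animal_pred, norm in samples:
        shift = mean_landmark_shift(human_pred, animal_pred, norm)
        if shift < low:
            key = "positive"
        elif shift < high:
            key = "semi_hard_positive"
        else:
            key = "hard_negative"
        buckets[key].append((name, shift))
    return buckets


# Toy data: two landmarks per face, face size 100 pixels.
human = [(0.0, 0.0), (10.0, 0.0)]
samples = [
    ("a", human, [(0.0, 0.0), (10.0, 0.0)], 100.0),     # models agree
    ("b", human, [(0.0, 10.0), (10.0, 10.0)], 100.0),   # moderate disagreement
    ("c", human, [(30.0, 40.0), (40.0, 40.0)], 100.0),  # large disagreement
]
buckets = mine_examples(samples)
```

Under these assumptions, sample `a` lands in the positive bucket, `b` in the semi-hard positive bucket, and `c` in the hard negative bucket. In practice the thresholds would need tuning per dataset.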

Cited By

  • (2024) Generating Stylized Features for Single-Source Cross-Dataset Palmprint Recognition With Unseen Target Dataset. IEEE Transactions on Image Processing 33, 4911-4922. DOI: 10.1109/TIP.2024.3451933. Online publication date: 5-Sep-2024

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. animal biometrics
  2. data scarce
  3. face alignment
  4. meta learning
  5. semi-supervised learning

Qualifiers

  • Research-article

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
