Abstract:
This article proposes semantic embedding for image transformers (SEiT) to explore semantic features of facial morphology in the action unit (AU) detection task. Conventional approaches typically rely on external information (e.g., facial landmarks) to obtain the locations of facial components, whereas the SEiT learns morphological features intrinsically from the face image. The pre-training task, namely semantic masked facial image modeling (SMFIM), aims to actively capture facial morphological information. Pixels of the input facial image are randomly erased with semantic masks (e.g., nose, eyes, eyebrows, mouth, and lip). The embedding model predicts the presence of facial components in the input image, learning semantic representations of the face simultaneously. The learned semantic embeddings are fed to transformer blocks, which enable global interaction between semantic elements. The SEiT thus integrates facial morphological information with global interaction characteristics, making it well suited to AU detection. Experiments are conducted on the Binghamton-Pittsburgh 4D (BP4D) dataset and the Denver intensity of spontaneous facial action (DISFA) dataset, and the results demonstrate the effectiveness of the proposed SEiT.
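The SMFIM pre-training step described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): it assumes per-component boolean region masks are available, randomly erases each component's pixels, and produces the multi-label component-presence target the embedding model is trained to predict.

```python
import numpy as np

# Semantic components named in the abstract; region masks are assumed inputs.
COMPONENTS = ["nose", "eyes", "eyebrows", "mouth", "lip"]

def smfim_mask(image, region_masks, erase_prob=0.5, rng=None):
    """Randomly erase semantic regions; return the masked image and
    a multi-label presence vector (1 = component still visible).

    image:        (H, W) or (H, W, C) float array
    region_masks: dict mapping component name -> (H, W) boolean mask
    """
    rng = rng or np.random.default_rng()
    masked = image.copy()
    presence = np.ones(len(COMPONENTS), dtype=np.float32)
    for i, name in enumerate(COMPONENTS):
        if rng.random() < erase_prob:
            masked[region_masks[name]] = 0.0  # erase this component's pixels
            presence[i] = 0.0                 # mark component as absent
    return masked, presence
```

During pre-training, the embedding model would receive the masked image and be supervised with the presence vector (e.g., via a multi-label binary cross-entropy loss), so that predicting which components survive forces it to learn where facial components lie.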
Published in: IEEE Transactions on Computational Social Systems ( Volume: 10, Issue: 3, June 2023)