Hierarchical Spatial–Temporal Window Transformer for Pose-Based Rodent Behavior Recognition | IEEE Journals & Magazine | IEEE Xplore

Abstract:

In the fields of neuroscience and pharmacology, understanding rodent behavior is of vital importance for studying the effects of genetic manipulations and pharmacological therapies. Conventional behavior recognition methods based on raw images often struggle with noise, such as changes in lighting conditions and image backgrounds. Pose-based approaches, on the other hand, have demonstrated robustness against these challenges. However, existing methods rely on manually engineered features, which are time-consuming to design and may not fully exploit the potential of the pose data. In this work, we propose the hierarchical spatial–temporal window transformer network (HSTWFormer), a novel approach that efficiently extracts multiscale and cross-spacetime features from rodent pose data. By adopting a pure Transformer structure, HSTWFormer not only avoids the need for a predefined skeletal topology but also enables adaptive recognition of interactive behaviors between multiple rodents. By merging the features of temporal neighbors, we construct a hierarchical structure with different receptive fields that retains essential information at all scales, enabling the extraction of semantic features from low level to high level. Furthermore, a spatial–temporal window attention (STWA) block is introduced to capture correlations between different key points across frames. The STWA blocks facilitate the extraction of both short-term and long-term cross-spacetime features by enabling interactions between windows through window shifting, enhancing the network's modeling performance. The effectiveness of the proposed HSTWFormer is demonstrated on two datasets, CRIM13 and CalMS21. We achieved accuracies of 79.3% and 69.8% for interactive and overall behaviors on the CRIM13 dataset, and 76.4% accuracy on the CalMS21 dataset.
Our method harnesses the wealth of information embedded in key points, showcasing robust modeling capabilities for accurate rodent behavior recognition, and provid...
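The two core ideas described above, attention restricted to spatial–temporal windows with half-window shifting, followed by merging the features of temporal neighbors to build the hierarchy, can be sketched in a toy NumPy form. Everything here is illustrative: the function names, window sizes, and the single-head, projection-free attention are assumptions for the sketch, not the paper's actual implementation, which would use learned query/key/value projections, multiple heads, and additional network components.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merge_temporal_neighbors(x):
    """Halve the temporal resolution by concatenating the features of
    adjacent frame pairs -- one hierarchical merging step."""
    T, V, C = x.shape
    return x.reshape(T // 2, 2, V, C).transpose(0, 2, 1, 3).reshape(T // 2, V, 2 * C)

def window_attention(x, win_t, win_v, shift=False):
    """Self-attention restricted to (win_t x win_v) spatial-temporal windows.

    x: (T, V, C) array -- T frames, V key points, C feature channels.
    With shift=True the grid is rolled by half a window before
    partitioning, so consecutive blocks exchange information across
    window borders (the window-shifting idea)."""
    T, V, C = x.shape
    if shift:
        x = np.roll(x, (-(win_t // 2), -(win_v // 2)), axis=(0, 1))
    out = np.empty_like(x)
    for t0 in range(0, T, win_t):
        for v0 in range(0, V, win_v):
            blk = x[t0:t0 + win_t, v0:v0 + win_v]
            tokens = blk.reshape(-1, C)
            # scaled dot-product attention among the tokens of one window
            attn = softmax(tokens @ tokens.T / np.sqrt(C))
            out[t0:t0 + win_t, v0:v0 + win_v] = (attn @ tokens).reshape(blk.shape)
    if shift:
        out = np.roll(out, (win_t // 2, win_v // 2), axis=(0, 1))
    return out

# One hierarchical stage: plain window attention, shifted window
# attention, then temporal merging to enlarge the receptive field.
feats = np.random.default_rng(0).normal(size=(16, 7, 8))  # 16 frames, 7 key points
feats = window_attention(feats, win_t=4, win_v=7)
feats = window_attention(feats, win_t=4, win_v=7, shift=True)
feats = merge_temporal_neighbors(feats)
print(feats.shape)  # (8, 7, 16)
```

Alternating unshifted and shifted blocks is what lets information cross window boundaries without paying the cost of global attention over all frames and key points at once.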
Article Sequence Number: 2512914
Date of Publication: 18 March 2024

