Abstract:
Memory networks have been used extensively in video anomaly detection to record prototypical normal patterns, which prevents the network from overgeneralizing and reconstructing anomalies. However, existing memory-based methods record only a lossy representation of normal item prototypes and do not record the rich relationships between them. In this work, we propose an Associative Memory with Spatio-Temporal Enhancement (AMSTE), which uses global motion context as a constraint to enhance appearance features and learns both the normal item prototypes and the relationships between them. Specifically, we use two encoders to extract spatio-temporal features, and a Spatio-Temporal Enhancement Module (STEM) enhances the appearance features with global motion constraints. The prototypical patterns of normal data and their relationships are then recorded in the item memory and the relational memory, respectively. Finally, we retrieve features from the memory pools and reconstruct the video frame through the decoder. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our approach.
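The memory retrieval step mentioned above can be illustrated with a minimal sketch. It shows a generic attention-style memory read of the kind commonly used in memory-augmented reconstruction models: a query feature is compared with each stored prototype, the similarities are turned into weights, and the retrieved feature is the weighted sum of the prototypes. This is not the AMSTE item memory or relational memory, whose details the abstract does not give. The names `read_memory`, `cosine`, and `softmax`, the use of cosine similarity, and the toy two-slot memory are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors (assumed similarity measure).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, memory):
    # Hypothetical helper: address each memory slot by similarity to the query,
    # then return the weighted sum of the slots and the addressing weights.
    weights = softmax([cosine(query, item) for item in memory])
    dim = len(query)
    retrieved = [
        sum(w * item[d] for w, item in zip(weights, memory))
        for d in range(dim)
    ]
    return retrieved, weights

# Toy memory with two prototype slots (illustrative values only).
memory = [[1.0, 0.0], [0.0, 1.0]]
retrieved, weights = read_memory([0.9, 0.1], memory)
```

Because the retrieved feature is built only from stored prototypes, a decoder that reconstructs from it is steered toward patterns already in memory, which is the intuition behind using memory to limit reconstruction of anomalies.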
Published in: IEEE Signal Processing Letters (Volume: 30)