Context-based abnormal object detection using the fully-connected conditional random fields
Introduction
Contextual information plays an important role in computer vision, especially in object detection and semantic segmentation [1], [2], [3], [4]. With the use of contextual information, object detection and semantic segmentation have improved substantially. This is because contextual information indicates whether a scene and an object are related to each other, through the co-occurrence relationships between objects within a scene or through the relative position and scale of objects. For example, a sofa is likely to appear in a living room in an indoor scene, whereas a car is not. Another advantage is that contextual information such as the co-occurrence, relative position, and relative scale relationships between objects helps to interpret a scene and to remove false positives [5].
However, difficulties remain, especially in detecting and recognizing abnormal objects, i.e., objects that are unexpected in a scene. Since most existing object detection models focus on detecting normal objects and improve detection performance by considering only the normal contextual information, they cannot detect abnormal objects. Because abnormal objects have small context scores, contextual information makes it possible to detect them in a single image.
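The idea that an abnormal object has a small context score can be illustrated with a toy example. The sketch below is not the scoring function of the proposed model: the class names, the co-occurrence values, and the function `context_scores` are hypothetical and chosen only for illustration. Each object is scored by its average co-occurrence with the other objects detected in the same image.

```python
import numpy as np

def context_scores(labels, cooccurrence):
    """Average co-occurrence of each object with the other objects in the image.

    labels:       class indices of the detected objects
    cooccurrence: (C, C) symmetric matrix of co-occurrence values in [0, 1]
    """
    idx = np.asarray(labels)
    n = len(idx)
    if n < 2:
        return np.ones(n)  # a single object has no context to violate
    pair = cooccurrence[np.ix_(idx, idx)].astype(float)
    np.fill_diagonal(pair, 0.0)  # ignore the object's relation to itself
    return pair.sum(axis=1) / (n - 1)

# Hypothetical classes and made-up co-occurrence values (illustration only).
classes = ["sofa", "table", "lamp", "car"]
cooc = np.array([
    [1.00, 0.80, 0.70, 0.05],
    [0.80, 1.00, 0.60, 0.05],
    [0.70, 0.60, 1.00, 0.05],
    [0.05, 0.05, 0.05, 1.00],
])

scores = context_scores([0, 1, 2, 3], cooc)
abnormal = classes[int(np.argmin(scores))]
print(abnormal, scores.round(3))
```

In this toy scene the car rarely co-occurs with the indoor objects, so it receives the smallest score and is the abnormal object candidate.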
Only a few methods for abnormal object detection using contextual information have been proposed [5], [6], [7], [8]. Choi et al. proposed a tree-based context model via latent co-occurrence and support tree structures [5], [6]. The tree structure is limited in expressing the fully-connected relationships between objects because it considers parent-child relationships only. Park et al. proposed a generative model to generate both normal and abnormal objects using the contextual information of the canonical scene [7], which represents the configuration of the normal objects such as the location distributions of objects in a scene. The drawback of the canonical scene is that it takes account of only the location distribution of each object itself in the scene, without considering the relationships between objects. These previous works therefore do not consider all the relationships between objects. Cao et al. proposed a high-order contextual descriptor that incorporates contextual information such as semantic, spatial, and scale contexts [8]. By finding the fully-connected pairwise links between the detected objects, or high-order interactions, this method captures dependencies among objects in an image and detects out-of-context objects. As in [5], they used Gaussian distributions to model the relative position using the vertical position and depth information. However, Gaussian distributions do not adequately describe the geometric relationships between objects. In contrast to Cao et al.'s method, we consider the geometric information using the support relationship as in [6].
Other abnormal object detection methods have also been proposed [9], [10]. Saleh et al. proposed object-centric anomaly detection using attributes of the objects such as part attributes [11], shape, and color [9]. They extended attribute-based reasoning to scene-centric, context-centric, and object-centric reasoning for abnormality [10]. They also proposed a taxonomy of abnormalities according to the reasons for abnormality in images, namely scene-centric, context-centric, and object-centric reasoning. However, the model proposed in this paper does not focus on detecting or classifying object-centric abnormal objects.
Note that this paper is inspired by the fully-connected conditional random fields (CRFs) [12], which have been used for object detection [13], [14] and semantic segmentation [15], [16], [17]. Nematollahi et al. proposed a new context-based fully-connected CRF model that adds a hidden node describing the overall context of an image in order to segment the image semantically [15]. Our goal is to label object candidates as normal or abnormal objects, rather than to label pixels as in semantic segmentation.
In this paper, we propose a new approach to abnormal object detection. Unlike existing models, the proposed model takes account of the object-object and object-scene relationships as fully-connected relationships for the co-occurrence and support relationships. The proposed model solves the abnormal object detection problem by inferring the labels of object candidates, such as normal or abnormal. To this end, the proposed model consists of two steps. First, we use an off-the-shelf detector such as a deformable part model (DPM) [1] to generate a pool of object candidates. Then, considering dependencies between objects and a scene, we build a fully-connected CRF for multi-class object labels [12], where nodes represent labels of the object candidates, and edges encode dependencies between object candidates. We also construct a fully-connected CRF to label abnormal objects.
To detect abnormal objects, we extend the context-based fully-connected CRF [15]. Since we focus on detecting context-violating abnormal objects, we need information on which objects violate the contextual information. Unlike the context-based fully-connected CRF with a single context node, the proposed model constructs two fully-connected CRFs with the same number of nodes. In the proposed fully-connected CRFs, the object class nodes play the role of the context node, while the abnormality nodes play the role of the object label nodes. Through variational inference, namely the mean field approximation, we predict the labels of the object candidates and abnormal object candidates.
Since the co-occurrence correlation and geometric information are represented statistically, it is difficult to directly use the Gaussian kernels that are employed to efficiently infer the fully-connected CRFs. Therefore, we use an embedding technique [18] to embed the co-occurrence and support relationships between objects into the Euclidean feature space, where the dependencies between objects can be expressed with Gaussian kernels.
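The mean field approximation for a fully-connected CRF with a Gaussian pairwise kernel can be sketched as follows. This is a minimal illustration under stated assumptions, not the inference procedure of the proposed model: the function names, the Potts compatibility, the unary energies, and the feature vectors are all hypothetical. The kernel matrix is computed explicitly, which is affordable for a small pool of object candidates; the efficient inference of [12] relies on filtering instead of this dense sum.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary, features, compat, weight=1.0, bandwidth=1.0, n_iters=10):
    """Naive mean-field inference on a fully-connected CRF.

    unary:    (N, L) unary energies, lower means more likely
    features: (N, D) Euclidean feature vector of each node
    compat:   (L, L) label compatibility penalties
    Returns the (N, L) approximate marginals Q.
    """
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = weight * np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(kernel, 0.0)  # a node sends no message to itself

    q = softmax(-unary)
    for _ in range(n_iters):
        message = kernel @ q           # sum over j of k(f_i, f_j) * Q_j(l')
        pairwise = message @ compat.T  # weight messages by label compatibility
        q = softmax(-unary - pairwise)
    return q

# Hypothetical example: three object candidates, labels 0 = normal, 1 = abnormal.
unary = np.array([[0.0, 3.0],   # confidently normal
                  [1.0, 0.8],   # ambiguous, slightly abnormal
                  [2.0, 0.0]])  # abnormal
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
potts = np.array([[0.0, 1.0], [1.0, 0.0]])  # penalize differing labels

q = mean_field(unary, features, potts, weight=2.0)
labels = q.argmax(axis=1)
print(labels)
```

In this example the second candidate lies close to the first in the feature space, so the pairwise term pulls it toward the normal label, while the distant third candidate keeps the label favored by its unary term.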
We use the SUN09 dataset [5] to learn the contextual information of the normal objects. We also evaluate the proposed abnormal object detection model on three public datasets: the out-of-context dataset [6], the abnormal object dataset [7], and the previous out-of-context dataset [5]. Experimental results show that the proposed model outperforms the state-of-the-art models on all three datasets.
The two main contributions of this paper are as follows:
1) To the best of our knowledge, we are the first to apply the fully-connected CRFs to abnormal object detection. The advantage of the fully-connected CRF is that it considers the fully-connected relationships between nodes and allows efficient inference on the fully-connected graph structures using a variational inference method, e.g., the mean field approximation.
2) We also use co-occurrence and support features in the Euclidean feature space. This means that we can use Gaussian kernels in the pairwise terms and utilize the efficient mean field inference algorithm.
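To illustrate why a Euclidean embedding makes Gaussian kernels applicable, the sketch below embeds object classes with classical multidimensional scaling (MDS). This is an assumption made for illustration: MDS stands in for the embedding technique of [18], whose details are not given here, and the class names, the co-occurrence values, and the dissimilarity `1 - cooc` are hypothetical.

```python
import numpy as np

def classical_mds(dissimilarity, dim=2):
    """Embed items in R^dim so that Euclidean distances approximate
    the given pairwise dissimilarities (classical MDS)."""
    d2 = np.asarray(dissimilarity, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering    # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]     # keep the largest `dim`
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical classes and made-up co-occurrence values (illustration only).
classes = ["sofa", "table", "lamp", "car"]
cooc = np.array([
    [1.00, 0.80, 0.70, 0.05],
    [0.80, 1.00, 0.60, 0.05],
    [0.70, 0.60, 1.00, 0.05],
    [0.05, 0.05, 0.05, 1.00],
])

# Assumed dissimilarity: frequently co-occurring classes should be close.
coords = classical_mds(1.0 - cooc, dim=2)

# Gaussian kernel on the embedded features, usable as a pairwise term.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
kernel = np.exp(-dist ** 2 / 2.0)
print(kernel.round(3))
```

After the embedding, classes that co-occur often (sofa and table) are close and receive a large kernel value, whereas classes that rarely co-occur (sofa and car) are far apart and receive a small one.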
This paper is organized as follows. Section 2 describes the proposed model. Experimental results and discussions are given in Section 3. Finally, Section 4 concludes the paper.
Section snippets
Generating object candidates
Fig. 1 shows an overview of the proposed model. Given an input image, the proposed model generates a pool of object candidates in the form of bounding boxes by applying a pre-trained detector, DPM. A pool of object candidates is denoted by where N is the number of object candidates. The ith object candidate is denoted by where ci, k, pi, and si represent the object class, the position vector of a bounding box, and the score of the ith object candidate,
Datasets and experimental set-up
To evaluate the performance of the proposed model, we conduct experiments with three publicly available datasets: (1) the out-of-context dataset [6], (2) the abnormal object dataset [7], and (3) the previous out-of-context dataset [5]. From the out-of-context dataset [6], which consists of 209 images, we choose 161 images, each of which includes at least one abnormal object that belongs to one of 107 classes. The abnormal object dataset is grouped by the type of abnormality: co-occurrence-violating, relative
Conclusions
This paper proposes a new approach to abnormal object detection using two fully-connected CRFs (a multi-class CRF and an abnormality CRF), in which the contextual information is incorporated. Furthermore, we formulate a feature embedding technique to find a geometry that reflects the statistical relationships between objects, such as the co-occurrence and support relationships. Because of this consideration, the proposed model provides useful contextual information, and can perform more
Acknowledgment
This work was supported in part by the Brain Korea 21 Plus.
References (23)
- et al. Context models and out-of-context objects. Pattern Recognit. Lett. (2012)
- et al. Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. (2010)
- et al. Discriminative models for multi-class object layout. Int. J. Comput. Vis. (2011)
- et al. The role of context for object detection and semantic segmentation in the wild
- et al. Describing the scene as a whole: joint object detection, scene classification and semantic segmentation
- et al. A tree-based context model for object recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2012)
- et al. Abnormal object detection by canonical scene-based contextual model
- et al. An object-level high-order contextual descriptor based on semantic, spatial, and scale cues. IEEE Trans. Cybern. (2015)
- et al. Object-centric anomaly detection by attribute-based reasoning
- et al. Toward a taxonomy and computational models of abnormalities in images
- Describing objects by their attributes