Context-based abnormal object detection using the fully-connected conditional random fields

doi:10.1016/j.patrec.2017.08.003

Pattern Recognition Letters

Volume 98, 15 October 2017, Pages 16-25

https://doi.org/10.1016/j.patrec.2017.08.003

Highlights

• We propose a new approach to abnormal object detection.
• We formulate abnormal object detection as a joint labeling problem.
• The statistical relationships between objects are embedded in a Euclidean space.
• The proposed model considers the fully-connected relationships between objects.
• The proposed model achieves state-of-the-art performance.

Abstract

Contextual information plays an important role in computer vision, particularly in object detection and scene understanding. Existing contextual models use only the relationships between normal objects and natural scenes, so detecting abnormal objects remains a difficult problem. This paper proposes an abnormal object detection model that uses fully-connected conditional random fields to integrate contextual information such as the co-occurrence and geometric relationships between objects. With this formulation, the proposed model combines co-occurrence, spatial interaction between objects, and scale information. To this end, we use a feature embedding technique to find a geometry that reflects the statistical relationships in the pairwise term. Abnormal object detection is then solved by probabilistic variational inference such as the mean field approximation. Experimental results show that the proposed model achieves significant improvement over state-of-the-art models on the out-of-context dataset and the abnormal object dataset.

Introduction

Contextual information plays an important role in computer vision, especially in object detection and semantic segmentation [1], [2], [3], [4]. With contextual information, object detection and semantic segmentation have improved substantially. This is because contextual information indicates whether a scene and an object are related to each other, through the co-occurrence relationships between objects within a scene or through the relative position and scale of objects. For example, a sofa is likely to appear in an indoor living room scene, whereas a car is not. Contextual information such as the co-occurrence, relative position, and relative scale relationships between objects also helps to interpret a scene and to remove false positives [5].

However, detecting and recognizing abnormal objects, that is, objects that are unexpected in a scene, remains difficult. Most existing object detection models focus on detecting normal objects and improve detection performance by considering only the normal contextual information, so they cannot detect abnormal objects. Because abnormal objects have small context scores, contextual information can instead be used to detect them in a single image.

Only a few abnormal object detection methods based on contextual information have been proposed [5], [6], [7], [8]. Choi et al. proposed a tree-based context model via latent co-occurrence and support tree structures [5], [6]. The tree structure is limited in expressing the fully-connected relationships between objects because it considers only parent-child relationships. Park et al. proposed a generative model that generates both normal and abnormal objects using the contextual information of the canonical scene [7], which represents the configuration of normal objects, such as the location distributions of objects in a scene. The drawback of the canonical scene is that it takes account of only the location distribution of each object in the scene, without considering the relationships between objects. Thus, neither approach considers all the relationships between objects. Cao et al. proposed a high-order contextual descriptor that incorporates contextual information such as semantic, spatial, and scale contexts [8]. By finding the fully-connected pairwise links between the detected objects, or high-order interactions, this method can capture dependencies among objects in an image and detect out-of-context objects. As in [5], they used Gaussian distributions to model relative position using the vertical position and depth information. However, Gaussian distributions do not adequately describe the geometric relationships between objects. In contrast to Cao et al.'s method, we consider the geometric information using the support relationship as in [6].

Other abnormal object detection methods have also been proposed [9], [10]. Saleh et al. proposed object-centric anomaly detection using object attributes such as part attributes [11], shape, and color [9]. They later extended attribute-based reasoning to scene-centric, context-centric, and object-centric reasoning about abnormality [10], and proposed a taxonomy of the reasons for abnormality in images along the same three categories. However, our proposed model does not focus on detecting or classifying object-centric abnormal objects.

This paper is inspired by the fully-connected conditional random fields (CRFs) [12], which have been used for object detection [13], [14] and semantic segmentation [15], [16], [17]. Nematollahi et al. proposed a context-based fully-connected CRF model that adds a hidden node describing the overall context of an image in order to segment the image semantically [15]. Our goal is to label object candidates as normal or abnormal, rather than to label pixels as in semantic segmentation.

In this paper, we propose a new approach to abnormal object detection. Unlike existing models, the proposed model treats the object-object and object-scene relationships as fully-connected relationships for the co-occurrence and support relationships. The proposed model solves the abnormal object detection problem by inferring the labels of object candidates, e.g., normal or abnormal. To this end, the proposed model consists of two steps. First, we use an off-the-shelf detector such as a deformable part model (DPM) [1] to generate a pool of object candidates. Then, considering the dependencies between objects and the scene, we build a fully-connected CRF for multi-class object labels [12], where nodes represent the labels of the object candidates and edges encode the dependencies between object candidates. We also construct a second fully-connected CRF to label abnormal objects, which extends the context-based fully-connected CRF [15]. Since we focus on detecting context-violating abnormal objects, we need information on which objects violate the contextual information. Unlike the context-based fully-connected CRF with a single context node, the proposed model constructs two fully-connected CRFs with the same number of nodes. In the proposed fully-connected CRFs, the object class nodes play the role of the context node, while the abnormality nodes play the role of the object label nodes. Through variational inference such as the mean field approximation, we predict the labels of the object candidates and the abnormal object candidates. Since the co-occurrence correlation and geometric information are represented statistically, it is difficult to directly use the Gaussian kernels that are employed to infer fully-connected CRFs efficiently. Therefore, we use an embedding technique [18] to embed the co-occurrence and support relationships between objects into a Euclidean feature space, where the dependencies between objects can be taken into account.
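As a rough illustration of the inference step, the following Python sketch runs naive mean field updates on a small fully-connected CRF whose pairwise term is a Gaussian kernel over embedded feature vectors. It is a generic dense-CRF update in the spirit of [12], not the two-CRF model proposed in this paper; the Potts compatibility matrix, the single kernel, the `weight` and `bandwidth` parameters, and the toy inputs are all assumptions made for the example.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gaussian_kernel(features, bandwidth=1.0):
    """Pairwise Gaussian kernel k(f_i, f_j); the diagonal is zeroed (no self-message)."""
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(kernel, 0.0)
    return kernel


def mean_field(unary, features, compat, weight=1.0, bandwidth=1.0, n_iters=10):
    """Naive O(N^2) mean field inference for a fully-connected CRF.

    unary:    (N, L) unary energies (lower means more likely).
    features: (N, D) feature vectors in a Euclidean space.
    compat:   (L, L) label compatibility penalties mu(l, l').
    Returns an (N, L) array of approximate marginals Q_i(l).
    """
    kernel = gaussian_kernel(features, bandwidth)
    q = softmax(-unary)
    for _ in range(n_iters):
        message = kernel @ q                    # sum_j k(f_i, f_j) Q_j(l')
        pairwise = weight * message @ compat.T  # sum_l' mu(l, l') * message
        q = softmax(-unary - pairwise)
    return q


# Toy example (assumed values): three nearby candidates and one distant one.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
unary = np.array([[0.0, 2.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]])
potts = 1.0 - np.eye(2)  # penalize neighbors that take different labels

marginals = mean_field(unary, features, potts)
labels = marginals.argmax(axis=1)
print(labels)
```

In this toy run, the third candidate has an uninformative unary term and is pulled toward the label of its two nearby neighbors in the feature space, while the distant fourth candidate is governed by its own unary term. Efficient implementations replace the explicit kernel matrix with approximate high-dimensional filtering, which this sketch does not attempt.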

We use the SUN09 dataset [5] to learn the contextual information of normal objects. We evaluate the proposed abnormal object detection model on three public datasets: the out-of-context dataset [6], the abnormal object dataset [7], and the previous out-of-context dataset [5]. Experimental results show that the proposed model outperforms state-of-the-art models on all three datasets.

The two main contributions of this paper are as follows:

1) To the best of our knowledge, we are the first to apply fully-connected CRFs to abnormal object detection. The advantage of the fully-connected CRF is that it considers the fully-connected relationships between nodes and supports efficient inference on fully-connected graph structures using a variational inference method, e.g., the mean field approximation.
2) We also represent the co-occurrence and support features in a Euclidean feature space. This means that we can use Gaussian kernels in the pairwise terms and utilize the efficient mean field inference algorithm.
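To make the second contribution more concrete, the sketch below shows one way that co-occurrence statistics could be turned into points in a Euclidean space so that a Gaussian kernel on point distances reflects how often classes appear together. It uses classical multidimensional scaling as a simple stand-in and is not the embedding technique of [18]; the normalization, the function names, the class names, and the toy counts are all hypothetical and are not taken from SUN09.

```python
import numpy as np


def cooccurrence_to_distance(counts, eps=1e-6):
    """Convert symmetric co-occurrence counts into dissimilarities.

    Assumes the diagonal of `counts` holds how often each class occurs.
    Classes that frequently co-occur get a small dissimilarity.
    """
    counts = np.asarray(counts, dtype=float)
    occurrences = np.diag(counts)
    normalized = counts / (np.sqrt(np.outer(occurrences, occurrences)) + eps)
    dist = 1.0 - np.clip(normalized, 0.0, 1.0)
    np.fill_diagonal(dist, 0.0)
    return dist


def classical_mds(dist, dim=2):
    """Embed points so that Euclidean distances approximate `dist` (classical MDS)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # drop negative eigenvalues
    return eigvecs[:, order] * scale


# Hypothetical toy counts for four classes: sofa, tv, car, road.
classes = ["sofa", "tv", "car", "road"]
counts = np.array([
    [50, 40, 1, 1],
    [40, 45, 1, 2],
    [1, 1, 60, 55],
    [1, 2, 55, 70],
])

embedding = classical_mds(cooccurrence_to_distance(counts), dim=2)


def embedded_distance(a, b):
    """Euclidean distance between two class embeddings."""
    i, j = classes.index(a), classes.index(b)
    return float(np.linalg.norm(embedding[i] - embedding[j]))
```

With these toy counts, `embedded_distance("sofa", "tv")` comes out smaller than `embedded_distance("sofa", "car")`, so a Gaussian kernel on the embedded points would give a stronger pairwise interaction to classes that usually appear together.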

This paper is organized as follows. Section 2 describes the proposed model. Experimental results and discussions are given in Section 3. Finally, Section 4 concludes the paper.

Section snippets

Generating object candidates

Fig. 1 shows an overview of the proposed model. Given an input image, the proposed model generates a pool of object candidates in the form of bounding boxes by applying a pre-trained detector, DPM. The pool of object candidates is denoted by $X = \{x_{1}, \dots, x_{N}\}$, where $N$ is the number of object candidates. The $i$th object candidate is denoted by $x_{i} = [c_{i,k}, p_{i}, s_{i}]^{T}$, where $c_{i,k}$, $p_{i}$, and $s_{i}$ represent the object class, the position vector of a bounding box, and the score of the $i$th object candidate,
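The candidate representation $x_{i} = [c_{i,k}, p_{i}, s_{i}]^{T}$ can be pictured with a small Python data structure. This is only an illustrative sketch: the `ObjectCandidate` name, the `(x_min, y_min, x_max, y_max)` box layout, and the example values are assumptions, since the text specifies only that $p_{i}$ is the position vector of a bounding box.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectCandidate:
    """One detector output x_i = [c_{i,k}, p_i, s_i]^T."""

    class_id: int                            # c_{i,k}: index k of the object class
    box: Tuple[float, float, float, float]   # p_i: assumed (x_min, y_min, x_max, y_max)
    score: float                             # s_i: detector score

    def to_vector(self) -> List[float]:
        """Flatten the candidate into a single feature vector."""
        return [float(self.class_id), *self.box, self.score]


# A hypothetical pool X = {x_1, ..., x_N} with N = 2 candidates.
pool = [
    ObjectCandidate(class_id=3, box=(10.0, 20.0, 110.0, 90.0), score=0.7),
    ObjectCandidate(class_id=8, box=(40.0, 5.0, 60.0, 25.0), score=-0.2),
]
num_candidates = len(pool)
```

Each candidate then corresponds to one node in the fully-connected CRFs described in the introduction.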

Datasets and experimental set-up

To evaluate the performance of the proposed model, we conduct experiments with three publicly available datasets: (1) the out-of-context dataset [6], (2) the abnormal object dataset [7], and (3) the previous out-of-context dataset [5]. From the out-of-context dataset [6], which consists of 209 images, we choose 161 images, each of which includes at least one abnormal object that belongs to one of 107 classes. The abnormal object dataset is grouped by the type of abnormality: co-occurrence-violating, relative

Conclusions

This paper proposes a new approach to abnormal object detection using two fully-connected CRFs (a multi-class CRF and an abnormality CRF), in which the contextual information is incorporated. Furthermore, we formulate a feature embedding technique to find a geometry that reflects the statistical relationships between objects, such as the co-occurrence and support relationships. Because of this consideration, the proposed model provides useful contextual information and can perform more

Acknowledgment

This work was supported in part by the Brain Korea 21 Plus.

References (23)

M.J. Choi et al., Context models and out-of-context objects, Pattern Recognit. Lett. (2012)
P. Felzenszwalb et al., Object detection with discriminatively trained part based models, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
C. Desai et al., Discriminative models for multi-class object layout, Int. J. Comput. Vis. (2011)
R. Mottaghi et al., The role of context for object detection and semantic segmentation in the wild
J. Yao et al., Describing the scene as a whole: joint object detection, scene classification and semantic segmentation
M.J. Choi et al., A tree-based context model for object recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
S. Park et al., Abnormal object detection by canonical scene-based contextual model
X. Cao et al., An object-level high-order contextual descriptor based on semantic, spatial, and scale cues, IEEE Trans. Cybern. (2015)
B. Saleh et al., Object-centric anomaly detection by attribute-based reasoning
B. Saleh et al., Toward a taxonomy and computational models of abnormalities in images
B. Saleh et al., Describing objects by their attributes
