
1 Introduction

Object detection is an important and challenging computer vision problem. State-of-the-art object detectors, such as Faster R-CNN, YOLO, SSD, and DeNet, rely on deep convolutional neural networks and show remarkable results in terms of both accuracy and speed. Fusing the outputs of several object detection methods is a common way to increase detection accuracy further. The companion paper [1] proposed a new late fusion algorithm for object detection called ALFA. ALFA relies on agglomerative clustering and achieves state-of-the-art results on the PASCAL VOC 2007 and 2012 object detection datasets.

We also implemented Dynamic Belief Fusion (DBF), a state-of-the-art late fusion algorithm for object detection proposed in [2], as our baseline, since the authors' implementation is not publicly available.

Here we describe our implementation of ALFA and DBF, providing pseudocode for the key functions of these methods. We also provide the hyperparameter values required to reproduce the results from [1] on the PASCAL VOC 2012 dataset. The results on PASCAL VOC 2007 are not exactly reproducible due to the randomness of the cross-validation procedure.

Our implementation is available at http://github.com/IuliiaSaveleva/ALFA. All the details required to successfully run the code are provided in README.md.

2 Implementation

Assume an object detection task with K classes and N trained object detectors \(D_1, D_2, ..., D_N\). Given an image I, detector \(D_i\) produces a set of predictions:

$$ D_i(I) = \{p_1, ..., p_{m_i}\}, \quad p = (r, c), $$

where \(m_i\) is the number of detected objects, r represents the four coordinates of an axis-aligned bounding box, and c is a class-scores tuple of size \((K + 1)\) that includes the “no object” score \(c^{(0)}\).
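As a concrete illustration of this representation (the variable names here are ours, not taken from the reference code), a single prediction can be stored as a box array plus a class-score array:

```python
import numpy as np

# One prediction: an axis-aligned box (x_min, y_min, x_max, y_max) and a
# class-score tuple of size K + 1, where index 0 is the "no object" score.
K = 20  # e.g. PASCAL VOC has 20 object classes

r = np.array([48.0, 32.0, 210.0, 180.0])  # bounding box coordinates
c = np.full(K + 1, 0.01)                  # small scores for most classes
c[0], c[7] = 0.10, 0.71                   # "no object" score and one strong class
c /= c.sum()                              # scores form a distribution over K + 1 entries
```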

2.1 ALFA Implementation

The steps of ALFA are given below.

2.1.1 Agglomerative Clustering of Base Detectors Predictions

We assume that a prediction with bounding box \(r_i\) and class scores \(c_i\) should be similar to another prediction with bounding box \(r_j\) and class scores \(c_j\) if the two correspond to the same object. Let \(C_i\) and \(C_j\) be two clusters and \(\sigma (p, \tilde{p})\) be the similarity score between predictions p and \(\tilde{p}\). We define the following similarity score for prediction clusters, merging clusters greedily while the maximal cluster similarity is at least the hyperparameter \(\tau \):

$$\begin{aligned} \sigma (C_i, C_j) = \min _{p \in C_i, \tilde{p} \in C_j} \sigma (p, \tilde{p}), \quad \text {while} \quad \max _{i, j} \sigma (C_i, C_j) \ge \tau . \end{aligned}$$
(1)

We propose the following measure of similarity between predictions:

$$\begin{aligned} \sigma (p_i, p_j) = IoU(r_i, r_j)^\gamma \cdot BC(\bar{c}_i, \bar{c}_j) ^{1 - \gamma }, \end{aligned}$$
(2)

where \(\gamma \in [0, 1]\) is a hyperparameter and BC is the Bhattacharyya coefficient, used as a measure of similarity between class scores (\(\bar{c}\) is obtained from the class-score tuple c by omitting the zeroth “no object” component and renormalizing):

$$\begin{aligned} BC(\bar{c}_i, \bar{c}_j) = \sum _{k = 1}^K \sqrt{\bar{c}_i^{(k)}\bar{c}_j^{(k)}}, \quad \bar{c}^{(k)} = \frac{c^{(k)}}{1 - c^{(0)}}, \quad k = 1, \ldots , K, \end{aligned}$$
(3)

and IoU is the intersection-over-union coefficient, widely used as a measure of similarity between bounding boxes:

$$\begin{aligned} IoU(r_i, r_j) = \frac{r_i \cap r_j}{r_i \cup r_j}. \end{aligned}$$
(4)
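The similarity measure of Eqs. (2)-(4) can be sketched directly from these definitions (a minimal sketch with our own function names; the reference code may organize this differently):

```python
import numpy as np

def iou(r_i, r_j):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max), Eq. (4)."""
    x1, y1 = max(r_i[0], r_j[0]), max(r_i[1], r_j[1])
    x2, y2 = min(r_i[2], r_j[2]), min(r_i[3], r_j[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(r_i) + area(r_j) - inter
    return inter / union if union > 0 else 0.0

def bc(c_i, c_j):
    """Bhattacharyya coefficient over renormalized class scores, Eq. (3)."""
    bar_i = c_i[1:] / (1.0 - c_i[0])  # drop the "no object" score and renormalize
    bar_j = c_j[1:] / (1.0 - c_j[0])
    return float(np.sum(np.sqrt(bar_i * bar_j)))

def similarity(p_i, p_j, gamma):
    """Prediction similarity of Eq. (2): IoU^gamma * BC^(1 - gamma)."""
    (r_i, c_i), (r_j, c_j) = p_i, p_j
    return iou(r_i, r_j) ** gamma * bc(c_i, c_j) ** (1.0 - gamma)
```

Two identical predictions get similarity 1, and spatially disjoint boxes get similarity 0 for any \(\gamma > 0\).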

See Algorithm 1.
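Since the pseudocode of Algorithm 1 is given as a figure, the clustering step of Eq. (1) can also be sketched as follows; this is a naive quadratic-pairs sketch of ours that takes the prediction similarity function as a parameter, not the optimized reference implementation:

```python
def agglomerative_clustering(predictions, sigma, tau):
    """Greedy agglomerative clustering per Eq. (1): cluster similarity is the
    minimum pairwise prediction similarity, and the most similar pair of
    clusters is merged while that similarity is at least tau."""
    clusters = [[p] for p in predictions]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = min(sigma(p, q) for p in clusters[i] for q in clusters[j])
                if s > best:
                    best, best_pair = s, (i, j)
        if best < tau:
            break                      # no pair reaches the threshold: stop
        i, j = best_pair
        clusters[i] += clusters.pop(j)  # merge the most similar pair
    return clusters
```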

2.1.2 Class Scores Aggregation

Assume that predictions from detectors \(D_{i_1}, D_{i_2}, ..., D_{i_s}\) were assigned to an object proposal \(\pi \). For every detector that missed the object, we assign an additional low-confidence class-scores tuple to this proposal:

$$\begin{aligned} c_{lc} = \left( 1 - \varepsilon , \frac{\varepsilon }{K}, \frac{\varepsilon }{K}, ..., \frac{\varepsilon }{K} \right) , \end{aligned}$$
(5)

where \(\varepsilon \) is a hyperparameter.

Each method uses one of two class scores aggregation strategies:

  • Averaging fusion:

    $$\begin{aligned} c_{\pi }^{(k)} = \frac{1}{N} \left( \sum _{d = 1}^s c_{i_d}^{(k)} + (N - s)\cdot c_{lc}^{(k)} \right) , k = 0, ..., K. \end{aligned}$$
    (6)
  • Multiplication fusion:

    $$\begin{aligned} c_{\pi }^{(k)} = \frac{\tilde{c}_{\pi }^{(k)}}{\sum _{i} \tilde{c}_{\pi }^{(i)}}, \quad \tilde{c}_{\pi }^{(k)} = \left( c_{lc}^{(k)} \right) ^{N - s} \prod _{d = 1}^s c_{i_d}^{(k)}, \quad k = 0, ..., K. \end{aligned}$$
    (7)
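Both strategies of Eqs. (5)-(7) fit in a few lines (a sketch with our own naming; `strategy` is a hypothetical switch, not a parameter of the reference code):

```python
import numpy as np

def aggregate_scores(cluster_scores, N, eps, strategy="avg"):
    """Fuse the class scores of one object proposal.

    cluster_scores: list of s score tuples (length K + 1) assigned to the
    proposal; N: total number of detectors; eps: low-confidence hyperparameter."""
    K = len(cluster_scores[0]) - 1
    s = len(cluster_scores)
    # Eq. (5): low-confidence scores stand in for the N - s missing detectors.
    c_lc = np.full(K + 1, eps / K)
    c_lc[0] = 1.0 - eps
    if strategy == "avg":                          # averaging fusion, Eq. (6)
        return (np.sum(cluster_scores, axis=0) + (N - s) * c_lc) / N
    elif strategy == "mul":                        # multiplication fusion, Eq. (7)
        c = np.prod(cluster_scores, axis=0) * c_lc ** (N - s)
        return c / c.sum()
    raise ValueError(strategy)
```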

2.1.3 Bounding Box Aggregation

All methods have the same bounding box aggregation strategy:

$$\begin{aligned} r_{\pi } = \frac{1}{\sum _{i \in \pi } c_{i}^{(l)}} \sum _{i \in \pi } c_{i}^{(l)} \cdot r_{i}, \quad \text {where} \quad l = \displaystyle \mathop {\text {argmax}}_{k \ge 1} c_{\pi }^{(k)}. \end{aligned}$$
(8)
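Equation (8) is a weighted average of the clustered boxes, with each prediction's score for the winning class l as its weight. A minimal sketch (our naming):

```python
import numpy as np

def aggregate_boxes(boxes, scores, fused_scores):
    """Fuse bounding boxes of one proposal per Eq. (8).

    boxes: (s, 4) array of boxes in the cluster; scores: (s, K + 1) array of
    the corresponding class scores; fused_scores: fused scores of the proposal."""
    l = 1 + int(np.argmax(fused_scores[1:]))  # winning object class, l >= 1
    w = scores[:, l]                          # each prediction's score for class l
    return (w[:, None] * boxes).sum(axis=0) / w.sum()
```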

The best ALFA parameters are provided in Table 1:

Table 1. Best ALFA parameters.

2.2 DBF Implementation

Our implementation of DBF consists of the following steps:

  1. Compute PR-curves \(PR^k_i\) for each class k and each detector \(D_i\), \(i = 1, ..., N\);

  2. Construct detection vectors for each \(p \in D_i(I)\), \(i = 1, ..., N\), and calculate the basic probabilities of the hypotheses according to the label l and \(PR^k_i\). See Algorithm 2;

  3. Combine the basic probabilities by the Dempster-Shafer combination rule:

     $$ m_f(A) = \frac{1}{Z}\sum _{X_1 \cap X_2 \cap \dots \cap X_N = A} \prod _{i = 1}^{N}m_i(X_i), $$

     where \(Z = \sum _{X_1 \cap X_2 \cap \dots \cap X_N \ne \varnothing } \prod _{i = 1}^{N}m_i(X_i)\) is the normalization constant, to determine the fused basic probabilities \(m_f(T)\) and \(m_f(\lnot {T})\);

  4. Obtain the fused score as \(\bar{s} = m_f(T) - m_f(\lnot {T})\);

  5. Apply NMS to the bounding boxes r and scores \(\bar{s}\). To help DBF at the NMS step, detections with equal \(\bar{s}\) values are additionally sorted by the precision taken from \(PR^k_i\) with k = l.
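The combination rule of step 3 can be sketched for basic probability assignments over the frame \(\{T, \lnot T\}\), with the ignorance hypothesis \(I = T \cup \lnot T\); this is our own illustrative sketch (dict-based BPAs, our naming), not the reference implementation:

```python
from functools import reduce

def combine_two(m1, m2):
    """Dempster's rule for two BPAs over hypotheses "T", "notT" and "I"."""
    def inter(a, b):
        if a == "I":
            return b
        if b == "I":
            return a
        return a if a == b else None        # None marks an empty intersection
    fused = {"T": 0.0, "notT": 0.0, "I": 0.0}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            x = inter(a, b)
            if x is None:
                conflict += wa * wb         # mass on conflicting hypotheses
            else:
                fused[x] += wa * wb
    z = 1.0 - conflict                      # normalization constant
    return {h: w / z for h, w in fused.items()}

def fuse(bpas):
    """Combine the detectors' BPAs and return s_bar = m_f(T) - m_f(notT)."""
    m = reduce(combine_two, bpas)
    return m["T"] - m["notT"]
```

Folding `combine_two` over the detectors' BPAs gives the fused masses; two agreeing detectors reinforce each other, pushing \(\bar{s}\) above either individual belief.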


The best DBF parameters are provided in Table 2:

Table 2. Best DBF parameters.

3 Conclusion

This paper presented the implementation details of the ALFA and DBF late fusion methods for object detection. We provide source code and hyperparameter values that allow one to reproduce the results from [1] on PASCAL VOC 2012.