
Label-Free Robustness Estimation of Object Detection CNNs for Autonomous Driving Applications

Published in: International Journal of Computer Vision

Abstract

The advent of Convolutional Neural Networks (CNNs) has led to their increased application in several domains. One noteworthy application is perception systems for autonomous driving, which rely on the predictions of CNNs. On the one hand, predicting the learned objects with maximum accuracy is of great importance. On the other hand, evaluating the reliability of CNN-based perception systems without ground truth information remains a challenge, and such evaluations are of particular significance for autonomous driving. One way to estimate reliability is to evaluate the robustness of the detections in the presence of artificial perturbations. However, several existing works on perturbation-based robustness quantification rely on ground truth labels, and acquiring such labels is a tedious, expensive and error-prone process. In this work we propose a novel label-free metric for quantifying the robustness of CNN object detectors. We quantify the robustness of the detections against a specific type of input perturbation based on the prediction confidences alone: in short, we check the sensitivity of the predicted confidences under increasing levels of artificial perturbation, thereby avoiding the need for ground truth annotations. We perform extensive evaluations on our traffic light detector from an autonomous driving application and on public object detection networks and datasets. The evaluations show that our label-free metric is comparable to ground-truth-aided robustness scoring.
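To make the general scheme concrete, the following is a minimal sketch of such a label-free robustness score for a single image. The detector interface (`predict_confidences`), the `perturb` function, and the histogram binning of the confidences are hypothetical stand-ins for illustration, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KLD(p || q)

def label_free_robustness(model, image, perturb, severities, n_bins=10):
    """Label-free robustness of a detector on one image (illustrative sketch).

    `model.predict_confidences` and `perturb` are hypothetical stand-ins;
    the histogram binning is likewise an assumption, not the paper's
    exact formulation.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)

    def confidence_histogram(img):
        conf = model.predict_confidences(img)  # detection confidences in [0, 1]
        hist, _ = np.histogram(conf, bins=bins)
        hist = hist.astype(float) + 1e-12      # avoid zero bins in the KLD
        return hist / hist.sum()

    p_ref = confidence_histogram(image)        # P(V): clean input
    divergences = [
        entropy(p_ref, confidence_histogram(perturb(image, s)))
        for s in severities                    # KLD(P(V) || P(V_s)) per level
    ]
    # Small divergence under increasing perturbation indicates robust
    # detections; the paper converts this into a normalised score.
    return float(np.mean(divergences))
```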

Notes

  1. For the sake of simplicity, we assume each image I has only one bounding box; in general, there can be multiple bounding boxes per image.

  2. This was done using a single randomly chosen image for each perturbation.

  3. For a KLD normalised to the range [0, 1].

  4. The details of the architecture and training parameters are provided in Appendix 2.

References

  • Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A., & Criminisi, A. (2016). Measuring neural net robustness with constraints. In Advances in neural information processing systems (pp. 2613–2621).

  • Brach, K., Sick, B., & Dürr, O. (2020). Single shot MC dropout approximation. arXiv preprint arXiv:2007.03293.

  • Carlini, N., Athalye, A., Papernot, N., Brendel, W., Rauber, J., Tsipras, D., Goodfellow, I., Madry, A., & Kurakin, A. (2019). On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705.

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3213–3223).

  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255). IEEE.

  • Dodge, S., & Karam, L. (2016). Understanding how image quality affects deep neural networks. In 2016 eighth international conference on quality of multimedia experience (QoMEX) (pp. 1–6). IEEE.

  • Dolatshah, M., Teoh, M., Wang, J., & Pei, J. (2018). Cleaning crowdsourced labels using oracles for statistical classification. Proceedings of the VLDB Endowment, 12(4), 376–389.

  • Feng, D., Rosenbaum, L., & Dietmayer, K. (2018). Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection. In 2018 21st international conference on intelligent transportation systems (ITSC) (pp. 3266–3273). IEEE.

  • Fregin, A., Müller, J., Kreßel, U., & Dietmayer, K. (2018). The DriveU traffic light dataset: Introduction and comparison with existing datasets. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 3376–3383). IEEE.

  • Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International conference on machine learning (pp. 1050–1059).

  • Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

  • Google Cloud data labeling service: Pricing. https://cloud.google.com/ai-platform/data-labeling/pricing. Accessed 2020-02-13.

  • Gopinath, D., Katz, G., Păsăreanu, C. S., & Barrett, C. (2018). Deepsafe: A data-driven approach for assessing robustness of neural networks. In International symposium on automated technology for verification and analysis (pp. 3–19). Springer.

  • Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17 (pp. 1321–1330). JMLR.org.

  • Hein, M., & Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in neural information processing systems (pp. 2266–2276).

  • Hendrycks, D., & Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261.

  • Hosseini, H., Xiao, B., & Poovendran, R. (2017). Google’s cloud vision API is not robust to noise. In 2017 16th IEEE international conference on machine learning and applications (ICMLA) (pp. 101–105). IEEE.

  • Huang, X., Kwiatkowska, M., Wang, S., & Wu, M. (2017). Safety verification of deep neural networks. In International conference on computer aided verification (pp. 3–29). Springer.

  • Johanson, M., Belenki, S., Jalminger, J., Fant, M., & Gjertz, M. (2014). Big automotive data: Leveraging large volumes of data for knowledge-driven product development. In 2014 IEEE international conference on big data (Big Data) (pp. 736–741).

  • Kapishnikov, A., Bolukbasi, T., Viégas, F. B., & Terry, M. (2019). Segment integrated gradients: Better attributions through regions. CoRR. arXiv:1906.02825.

  • Katz, G., Barrett, C., Dill, D. L., Julian, K., & Kochenderfer, M. J. (2017). Reluplex: An efficient smt solver for verifying deep neural networks. In International conference on computer aided verification (pp. 97–117). Springer.

  • Katz, G., Barrett, C., Dill, D. L., Julian, K., & Kochenderfer, M. J. (2017). Towards proving the adversarial robustness of deep neural networks. arXiv preprint arXiv:1709.02802.

  • Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.

  • Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in neural information processing systems (pp. 6402–6413).

  • Li, J., Deng, L., Gong, Y., & Haeb-Umbach, R. (2014). An overview of noise-robust automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(4), 745–777.

  • Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.

  • Lin, T.-Y., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Lawrence Zitnick, C. (2014). Microsoft COCO: Common objects in context. CoRR, abs/1405.0312.

  • Maglogiannis, I. G. (2007). Emerging artificial intelligence applications in computer engineering: Real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies (Vol. 160). IOS Press.

  • Maier-Hein, L., Mersmann, S., Kondermann, D., Stock, C., Kenngott, H. G., Sanchez, A., Wagner, M., Preukschas, A., Wekerle, A.-L., Helfert, S., et al. (2014). Crowdsourcing for reference correspondence generation in endoscopic images. In International conference on medical image computing and computer-assisted intervention (pp. 349–356). Springer.

  • Mangal, R., Nori, A. V., & Orso, A. (2019). Robustness of neural networks: A probabilistic and practical approach. In 2019 IEEE/ACM 41st international conference on software engineering: New ideas and emerging results (ICSE-NIER) (pp. 93–96). IEEE.

  • Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1765–1773).

  • Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 427–436).

  • Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security (pp. 506–519).

  • Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510–4520).

  • Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., & Madry, A. (2018). Adversarially robust generalization requires more data. In Advances in neural information processing systems (pp. 5014–5026).

  • Shahrokni, A., & Feldt, R. (2013). A systematic review of software robustness. Information and Software Technology, 55(1), 1–17.

  • Shekar, A. K., Bocklisch, T., Sánchez, P. I., Straehle, C. N., & Müller, E. (2017). Including multi-feature interactions and redundancy for feature ranking in mixed datasets. In Joint European conference on machine learning and knowledge discovery in databases (pp. 239–255). Springer.

  • Strong, A. I. (2016). Applications of artificial intelligence & associated technologies. Science [ETEBMS-2016], 5(6).

  • Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818–2826).

  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

  • Temel, D., Lee, J., & AlRegib, G. (2018). Cure-or: Challenging unreal and real environments for object recognition. In 2018 17th IEEE international conference on machine learning and applications (ICMLA) (pp. 137–144). IEEE.

  • TensorFlow DeepLab model zoo. https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md. Accessed 2020-28-23.

  • TensorFlow detection model zoo. https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md. Accessed 2020-02-20.

  • Tramèr, F., Behrmann, J., Carlini, N., Papernot, N., & Jacobsen, J.-H. (2020). Fundamental tradeoffs between invariance and sensitivity to adversarial perturbations. arXiv preprint arXiv:2002.04599.

  • Vijayanarasimhan, S., & Grauman, K. (2009). What’s it going to cost you? Predicting effort vs. informativeness for multi-label image annotations. In 2009 IEEE conference on computer vision and pattern recognition (pp. 2262–2269). IEEE.

  • Von Ahn, L., & Dabbish, L. (2004). Labeling images with a computer game. In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 319–326).

  • Yu, F., Qin, Z., Liu, C., Zhao, L., Wang, Y., & Chen, X. (2019). Interpreting and evaluating neural network robustness. arXiv preprint arXiv:1905.04270.

  • Zhang, H., Weng, T.-W., Chen, P.-Y., Hsieh, C.-J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions. In Advances in neural information processing systems (pp. 4939–4948).

  • Zhang, J. M., Harman, M., Ma, L., & Liu, Y. (2020). Machine learning testing: Survey, landscapes and horizons. CoRR. arXiv:1906.10742.

  • Zhao, Z.-Q., Zheng, P., Xu, S.-T., & Wu, X. (2019). Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3212–3232.

Author information

Correspondence to Arvind Kumar Shekar.

Additional information

Communicated by Konrad Schindler.

Appendix

1.1 KLD Symmetry Analysis

In Sect. 4 we instantiated the divergence function of the label-free robustness metric in Eq. 1 with the KL-divergence (KLD). This choice was experimental, and other divergence functions are equally applicable. For this reason, we reuse the experimental settings from Sect. 5 with different instantiations of the divergence function. First, since the KLD is non-symmetric, i.e., \(KLD(P(\mathcal {V}),P(\mathcal {V}_s))\ne KLD(P(\mathcal {V}_s),P(\mathcal {V}))\), we instantiate Eq. 1 with the reversed direction \(KLD(P(\mathcal {V}_s),P(\mathcal {V}))\) and tabulate the results in Table 7. \(KLD(P(\mathcal {V}_s),P(\mathcal {V}))\) estimates the relative entropy of \(P(\mathcal {V}_s)\) with respect to \(P(\mathcal {V})\) (Lin 1991). Because the reference distribution has lower entropy than the distorted distribution, \(KLD(P(\mathcal {V}),P(\mathcal {V}_s))\) tends to be smaller than \(KLD(P(\mathcal {V}_s),P(\mathcal {V}))\), which is why the values in Table 3 are slightly smaller than those in Table 7. Since we use the rob metric to compare multiple detectors rather than as an absolute score, this difference is not a major limitation.
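The asymmetry is easy to see numerically. The following sketch uses hypothetical confidence histograms (a peaked reference \(P(\mathcal {V})\) and a more spread-out distorted \(P(\mathcal {V}_s)\)); the values are illustrative, not drawn from our experiments.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KLD(p || q)

# Hypothetical confidence histograms: the reference distribution P(V) is
# sharply peaked at high confidence, while the distorted distribution
# P(V_s) is more spread out and therefore has higher entropy.
p_v  = np.array([0.02, 0.08, 0.90])   # P(V): clean predictions
p_vs = np.array([0.20, 0.30, 0.50])   # P(V_s): perturbed predictions

print(entropy(p_v, p_vs))   # KLD(P(V)  || P(V_s)) ~ 0.38
print(entropy(p_vs, p_v))   # KLD(P(V_s) || P(V))  ~ 0.56
```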

Secondly, to avoid the asymmetry altogether, we estimate the divergence using the Jensen-Shannon divergence (JSD) and tabulate the results in Table 8. From both evaluations we observe no significant discrepancy in which model is ranked as maximally robust, implying that the choice of divergence function does not strongly influence our robustness metric.
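A corresponding sketch for the symmetric case, reusing the same hypothetical histograms as above. SciPy's jensenshannon returns the JS distance (the square root of the JSD); with base=2 the squared value lies in [0, 1].

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

p_v  = np.array([0.02, 0.08, 0.90])   # P(V): reference
p_vs = np.array([0.20, 0.30, 0.50])   # P(V_s): distorted

# The JS distance is symmetric in its arguments by construction:
print(jensenshannon(p_v, p_vs, base=2))  # identical value
print(jensenshannon(p_vs, p_v, base=2))  # in both directions
```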

1.2 CNN Architecture for Training on the CIFAR-10 Dataset

Figure 14 shows the CNN architecture we employed for classification on the CIFAR-10 image dataset in Sect. 6.2. The dataset consists of 50,000 training images and 10,000 test images, each belonging to one of 10 classes. When using dropout as a regularization technique, i.e., only on the training data, we applied a dropout rate of 0.5. We trained for a maximum of 50 epochs, minimising the cross-entropy loss with early stopping, i.e., the training loop terminates if there is no reduction in the validation loss for more than 10 epochs. A minimal sketch of this training setup is given after the figure below.

Fig. 14: Convolutional network architecture for uncertainty estimation in Sect. 6.2
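The following is a minimal Keras sketch of this training setup. The layer stack is an illustrative small CNN, not the exact architecture of Fig. 14; only the dropout rate, loss, epoch limit, and early-stopping patience follow the description above.

```python
import tensorflow as tf

# CIFAR-10: 50,000 training and 10,000 test images, 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Illustrative small CNN (the actual architecture is given in Fig. 14).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # dropout active only during training
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # cross-entropy loss
              metrics=["accuracy"])

# Stop if the validation loss does not improve for 10 consecutive epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
model.fit(x_train, y_train, epochs=50, validation_split=0.1,
          callbacks=[early_stop])
```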

Cite this article

Shekar, A.K., Gou, L., Ren, L. et al. Label-Free Robustness Estimation of Object Detection CNNs for Autonomous Driving Applications. Int J Comput Vis 129, 1185–1201 (2021). https://doi.org/10.1007/s11263-020-01423-x
