
Designing effective power law-based loss function for faster and better bounding box regression

  • Original Paper
  • Published:
Machine Vision and Applications

Abstract

Effective bounding box regression is essential for running any real-time object detection algorithm with acceptable accuracy. Currently available loss functions are computationally expensive, and some suffer from a subtle plateau problem for non-overlapping bounding boxes, where the predicted boxes remain far from the ground truth. In the present investigation, we propose a loss function with a new power-law term on the normalized distance, which converges as fast as the Complete Intersection over Union (CIoU) loss yet is computationally much faster than the Intersection over Union (IoU) and Generalised IoU (GIoU) losses, and is simpler than CIoU. The incorporated power term has been optimized with respect to computational time and the sum of errors simulated over several million cases, the details of which are elaborated in the paper. The proposed Absolute IoU (AIoU) loss function has been implemented and tested with state-of-the-art object detection algorithms such as You Only Look Once (YOLO) and the Single Shot Multibox Detector (SSD), and achieves significant improvement on the well-known Average Precision (AP) metric, indicating the effectiveness of our approach.


Abbreviations

G : Ground truth box

P : Predicted box

\(\rho (p, g)\) : Distance between the centres p and g of the boxes P and G

c : Diagonal distance between the two opposite corners of the smallest box enclosing G and P

\(\mathfrak {R}_{DIoU}\) : Penalty term

\(\alpha \) : Prefactor

N : Power law index

AP : Average precision

BB : Bounding box

x, y : Centre coordinates of the bounding box

w, h : Width and height of the bounding box, respectively

IoU : Intersection over Union

DIoU : Distance IoU

CIoU : Complete IoU

AIoU : Absolute IoU
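The exact AIoU formula is not reproduced in this excerpt, so the following is only a hedged sketch using the symbols above: a DIoU-style loss in which the normalized centre distance \(\rho /c\) enters through a power-law penalty \(\alpha (\rho /c)^{N}\). The function name `aiou_loss` and the default values of `alpha` and `n` are illustrative assumptions, not the authors' optimized settings.

```python
import math

def iou(a, b):
    """Intersection over Union of axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def aiou_loss(p, g, alpha=1.0, n=2):
    """Sketch of a power-law distance-penalized IoU loss (assumed form):
    1 - IoU(P, G) + alpha * (rho / c)^n, where rho is the distance between
    the box centres and c is the diagonal of the smallest enclosing box.
    alpha and n here are placeholders, not the paper's optimized values."""
    pcx, pcy = (p[0] + p[2]) / 2, (p[1] + p[3]) / 2   # centre of P
    gcx, gcy = (g[0] + g[2]) / 2, (g[1] + g[3]) / 2   # centre of G
    rho = math.hypot(pcx - gcx, pcy - gcy)
    # smallest box enclosing both P and G, and its diagonal c
    ex1, ey1 = min(p[0], g[0]), min(p[1], g[1])
    ex2, ey2 = max(p[2], g[2]), max(p[3], g[3])
    c = math.hypot(ex2 - ex1, ey2 - ey1)
    penalty = alpha * (rho / c) ** n if c > 0 else 0.0
    return 1.0 - iou(p, g) + penalty
```

Note how this addresses the plateau issue described in the abstract: for non-overlapping boxes the IoU term is flat at zero, but the \((\rho /c)^{N}\) penalty still shrinks as the predicted box moves toward the ground truth, so the loss keeps providing a useful gradient.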


Acknowledgements

We are very thankful to the Central Computing Facility (CCF) of the Indian Institute of Information Technology, Allahabad, for providing the necessary GPU access.

Author information

Corresponding author

Correspondence to Priya Shukla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Aswal, D., Shukla, P. & Nandi, G.C. Designing effective power law-based loss function for faster and better bounding box regression. Machine Vision and Applications 32, 87 (2021). https://doi.org/10.1007/s00138-021-01206-5

