Cox Processes for Counting by Detection

Journal of Mathematical Imaging and Vision

Abstract

In this work, doubly stochastic Poisson (Cox) processes and convolutional neural network (CNN) classifiers are used to estimate the number of instances of an object in an image. Poisson processes are well suited to model events that occur randomly in space, such as the locations of objects in an image, and are therefore natural models for counting objects in a scene. The proposed algorithm selects a subset of bounding boxes in the image domain and queries them for the presence of the object of interest by running a pre-trained CNN classifier. The resulting observations are then aggregated, and a posterior distribution over the intensity of a Cox process is computed. This intensity function is summed, providing an estimator of the number of instances of the object over the entire image. Despite the flexibility and versatility of Cox processes, their application to large datasets is limited, as their computational complexity and storage requirements do not easily scale with image size, typically requiring \(O(n^3)\) computation time and \(O(n^2)\) storage, where n is the number of observations. To mitigate this problem, we employ Kronecker algebra, which takes advantage of direct product structures. As the likelihood is non-Gaussian, the Laplace approximation is used for inference, employing the conjugate gradient method and Newton's method. Our approach then achieves close-to-linear performance, requiring only \(O(n^{3/2})\) computation time and \(O(n)\) memory. Results are presented on simulated data and on images from the publicly available MS COCO dataset. We compare our counting results with the state-of-the-art detection method, Faster R-CNN, and demonstrate superior performance.
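
To illustrate the Kronecker structure exploited above, the following sketch (a minimal, hypothetical example with an assumed NumPy squared-exponential kernel and grid size, not the authors' implementation) shows that for a separable kernel on a pixel grid the covariance factors as a Kronecker product of one-dimensional kernels, so matrix-vector products never require forming the full n-by-n matrix:

import numpy as np

def se_kernel(x, sigma=1.0, ell=10.0):
    # Squared-exponential kernel matrix on a 1-D grid of coordinates x.
    d = x[:, None] - x[None, :]
    return sigma**2 * np.exp(-0.5 * d**2 / ell**2)

# Hypothetical 30 x 40 pixel grid: the full covariance is 1200 x 1200, but a
# separable (unit-variance) kernel factors as K = kron(K_rows, K_cols).
K1 = se_kernel(np.arange(30.0))    # kernel over row coordinates
K2 = se_kernel(np.arange(40.0))    # kernel over column coordinates
v = np.random.default_rng(0).normal(size=30 * 40)

naive = np.kron(K1, K2) @ v        # builds the full matrix: O(n^2) memory

# Kronecker identity (K1 (x) K2) vec(X) = vec(K2 X K1^T), with column-major vec:
# only the small factors are touched, which is what gives the O(n^{3/2}) matvec.
X = v.reshape(40, 30, order="F")
fast = (K2 @ X @ K1.T).flatten(order="F")

assert np.allclose(naive, fast)

Conjugate gradient iterations, as used for the Laplace approximation above, only need products of this form, which is consistent with the stated O(n) memory footprint.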

References

  1. Kaggle competition: NOAA Fisheries Steller Sea Lion Population Count. https://www.kaggle.com/c/noaa-fisheries-steller-sea-lion-population-count. Accessed 28 April 2017

  2. Arteta, C., Lempitsky, V., Noble, J.A., Zisserman, A.: Interactive object counting. In: European conference on computer vision, pp. 504–518. Springer (2014)

  3. Chan, A.B., Liang, Z.S.J., Vasconcelos, N.: Privacy preserving crowd monitoring: counting people without people models or tracking. In: IEEE conference on computer vision and pattern recognition, CVPR 2008. IEEE, pp. 1–7 (2008)

  4. Cho, S.Y., Chow, T.W., Leung, C.T.: A neural-based crowd estimation by hybrid global learning algorithm. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 29(4), 535–541 (1999)

  5. Cox, D.R.: Some statistical methods connected with series of events. J. R. Stat. Soc. Ser. B (Methodol.) 17, 129–164 (1955)

  6. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition, CVPR 2005. IEEE, vol. 1, pp. 886–893 (2005)

  7. Dassios, A., Jang, J.W.: Pricing of catastrophe reinsurance and derivatives using the Cox process with shot noise intensity. Financ. Stoch. 7(1), 73–95 (2003)

  8. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

  9. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)

  10. Fiaschi, L., Koethe, U., Nair, R., Hamprecht, F.A.: Learning to count with regression forest and structured labels. In: 2012 21st international conference on pattern recognition (ICPR). IEEE, pp. 2685–2688 (2012)

  11. Flaxman, S., Wilson, A.G., Neill, D.B., Nickisch, H., Smola, A.J.: Fast kronecker inference in Gaussian processes with non-Gaussian likelihoods. In: International conference on machine learning, vol. 2015 (2015)

  12. Ge, W., Collins, R.T.: Marked point processes for crowd counting. In: IEEE conference on computer vision and pattern recognition, CVPR 2009. IEEE, pp. 2913–2920 (2009)

  13. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448 (2015)

  14. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587 (2014)

  15. Han, W., Rajan, P., Frazier, P.I., Jedynak, B.M.: Bayesian group testing under sum observations: a parallelizable two-approximation for entropy loss. IEEE Trans. Inf. Theory 63(2), 915–933 (2017)

  16. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision, pp. 346–361. Springer (2014)

  17. Idrees, H., Saleemi, I., Seibert, C., Shah, M.: Multi-source multi-scale counting in extremely dense crowd images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2547–2554 (2013)

  18. Kingman, J.F.C.: Poisson Processes. Clarendon Press, Oxford (1993)

  19. Kong, D., Gray, D., Tao, H.: A viewpoint invariant approach for crowd counting. In: 18th international conference on pattern recognition, ICPR 2006. IEEE, vol. 3, pp. 1187–1190 (2006)

  20. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)

  21. Lafarge, F., Descombes, X.: Geometric feature extraction by a multimarked point process. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1597–1609 (2010)

  22. Lempitsky, V., Zisserman, A.: Learning to count objects in images. In: Advances in neural information processing systems, pp. 1324–1332 (2010)

  23. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: European conference on computer vision, pp. 740–755. Springer (2014)

  24. Liu, X., Tu, P.H., Rittscher, J., Perera, A., Krahnstoever, N.: Detecting and counting people in surveillance applications. In: IEEE conference on advanced video and signal based surveillance, AVSS 2005. IEEE, pp. 306–311 (2005)

  25. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  26. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)

  27. Marana, A., Velastin, S., Costa, L., Lotufo, R.: Estimation of crowd density using image processing. In: IEE colloquium on image processing for security applications (Digest No.: 1997/074). IET, pp. 11–1 (1997)

  28. Merchan-Perez, A., Rodriguez, J., Alonso-Nanclares, L., Schertel, A., DeFelipe, J.: Counting synapses using FIB/SEM microscopy: a true revolution for ultrastructural volume reconstruction. Front. Neuroanat. 3, 18 (2009)

  29. Moghaddam, B., Pentland, A.: Probabilistic visual learning for object representation. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 696–710 (1997)

  30. Onoro-Rubio, D., López-Sastre, R.J.: Towards perspective-free object counting with deep learning. In: European conference on computer vision, pp. 615–629. Springer (2016)

  31. Pham, T.T., Chin, T.J., Schindler, K., Suter, D.: Interacting geometric priors for robust multimodel fitting. IEEE Trans. Image Process. 23(10), 4601–4610 (2014)

  32. Pham, T.T., Hamid Rezatofighi, S., Reid, I., Chin, T.J.: Efficient point process inference for large-scale object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2837–2845 (2016)

  33. Rajan, P., Han, W., Sznitman, R., Frazier, P., Jedynak, B.: Bayesian multiple target localization. In: International conference on machine learning, pp. 1945–1953 (2015)

  34. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

  35. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788 (2016)

  36. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. arXiv preprint arXiv:1612.08242 (2016)

  37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp. 91–99 (2015)

  38. Ross, S.M.: Stochastic Processes. Wiley, New York (1996)

  39. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  40. Saatçi, Y.: Scalable inference for structured Gaussian process models. Ph.D. thesis, Citeseer (2012)

  41. Sam, D.B., Surya, S., Babu, R.V.: Switching convolutional neural network for crowd counting. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, p. 6 (2017)

  42. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  43. Sindagi, V.A., Patel, V.M.: CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: 14th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp. 1–6 (2017)

  44. Sindagi, V.A., Patel, V.M.: A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recognit. Lett. 107, 3–16 (2018)

  45. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013). https://doi.org/10.1007/s11263-013-0620-5

  46. Verdie, Y., Lafarge, F.: Detecting parametric objects in large scenes by monte carlo sampling. Int. J. Comput. Vis. 106(1), 57–75 (2014)

  47. Walach, E., Wolf, L.: Learning to count with CNN boosting. In: European conference on computer vision, pp. 660–676. Springer (2016)

  48. Wang, C., Zhang, H., Yang, L., Liu, S., Cao, X.: Deep people counting in extremely dense crowds. In: Proceedings of the 23rd ACM international conference on Multimedia. ACM, pp. 1299–1302 (2015)

  49. Warren, S., Hewett, P., Foltz, C.: The KX method for producing K-band flux-limited samples of quasars. Mon. Not. R. Astron. Soc. 312(4), 827–832 (2000)

Acknowledgements

This work was made possible in part by the Portland Institute for Computational Science and its resources, acquired using NSF Grant DMS 1624776 and ARO Grant W911NF-16-1-0307, and by the Department of Computer Science, Johns Hopkins University.

Author information

Corresponding author

Correspondence to Purnima Rajan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Method of Moments for Kernel Parameter Estimation

Suppose the image domain is \(\varOmega \) with size \(\left| \varOmega \right| =a\times b\), where a and b are, respectively, the number of rows and columns of pixels in the image. Let \(N(\varOmega )\) be the number of instances of the object within \(\varOmega \). According to the model, \(N(\varOmega )\mid \lambda \sim \mathrm{Poisson}\left( \int _\varOmega \lambda (s)\,\mathrm{{d}}s\right) \), where \(\lambda (s)=\alpha g^2(s)\) and g is a Gaussian process, \(g\sim \mathrm{{GP}}\left( 0,K\right) \), with

$$\begin{aligned} K\left( s_1,s_2\right) =\sigma ^2 \exp \left\{ -\frac{1}{2l^2}\left\| s_1-s_2\right\| ^2 \right\} . \end{aligned}$$

Assume that we have m samples \(N_1\left( \varOmega \right) ,\ldots ,N_m\left( \varOmega \right) \) over the same domain \(\varOmega \), or over domains of the same size \(\left| \varOmega \right| \). We can then use these samples, together with the method of moments, to estimate the parameters \(\sigma \) and l of the prior as follows. Note that

$$\begin{aligned} E\left[ N\left( \varOmega \right) \right]&=E\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] \\&=E\left[ \int _\varOmega \lambda (s)\,\mathrm{{d}}s \right] \\&=\int _\varOmega E[\lambda (s)]\mathrm{{d}}s\\&=\int _\varOmega E[\alpha g^2(s)]\mathrm{{d}}s\\&=\alpha \int _\varOmega K\left( s,s\right) \mathrm{{d}}s \\&=\alpha \int _\varOmega \sigma ^2 \mathrm{{d}}s \\&=\alpha \sigma ^2 \left| \varOmega \right| \end{aligned}$$
$$\begin{aligned}V\left[ N\left( \varOmega \right) \right]&=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] +E\left[ V\left[ N\left( \varOmega \right) |\lambda \right] \right] \\&=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] +E\left[ \int _\varOmega \lambda (s)\,\mathrm{{d}}s \right] \\&=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] +\alpha \sigma ^2 \left| \varOmega \right| . \end{aligned}$$

Further, let \(Z=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] \), then

$$\begin{aligned} Z&=V\left[ \int _\varOmega \lambda \left( s\right) \mathrm{{d}}s \right] \\&=E\left[ \left( \int _\varOmega \lambda \left( s\right) \mathrm{{d}}s \right) ^2 \right] -E^2 \left[ \int _\varOmega \lambda \left( s\right) \mathrm{{d}}s \right] \\&=E\left[ \left( \int _\varOmega \alpha g^2 \left( s\right) \mathrm{{d}}s \right) ^2 \right] -\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2 \\&=\alpha ^2 \int _\varOmega \int _\varOmega E\left[ g^2(s)g^2(t)\right] \mathrm{{d}}s\,\mathrm{{d}}t-\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2 \\&=\alpha ^2 \int _\varOmega \int _\varOmega \left[ 2\,\mathrm{cov}^2(g(s),g(t))+V[g(s)]\,V[g(t)]\right] \mathrm{{d}}s\,\mathrm{{d}}t\\&\quad -\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2 \\&=2\alpha ^2 \sigma ^4 \left( \int _{s_1=0}^{a} \int _{t_1=0}^{a} \exp \left\{ -\frac{1}{l^2}(s_1-t_1)^2 \right\} \mathrm{{d}}s_1\,\mathrm{{d}}t_1\right) \\&\quad \times \left( \int _{s_2=0}^{b} \int _{t_2=0}^{b} \exp \left\{ -\frac{1}{l^2}(s_2-t_2)^2 \right\} \mathrm{{d}}s_2\,\mathrm{{d}}t_2 \right) \\&\quad +\alpha ^2 \sigma ^4 \left| \varOmega \right| ^2-\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2\\&=2\alpha ^2\sigma ^4\left( \sqrt{\pi }\, al\sqrt{1-\exp \left\{ -\tfrac{a}{l^2}\right\} }\right) \left( \sqrt{\pi }\, bl\sqrt{1-\exp \left\{ -\tfrac{b}{l^2}\right\} }\right) . \end{aligned}$$

In the last equation above, if a and b are large and l is small, which is typically the case in practice, the terms \(\sqrt{1-\exp \{-\frac{a}{l^2}\}}\) and \(\sqrt{1-\exp \{-\frac{b}{l^2}\}}\) are approximately equal to 1, so \(Z\approx 2\alpha ^2 \sigma ^4 \pi abl^2 \). Therefore, \(V\left[ N\left( \varOmega \right) \right] \approx 2\alpha ^2 \sigma ^4 \pi abl^2 +\alpha \sigma ^2 \left| \varOmega \right| \). Based on the method of moments, we have the following two equations:

$$\begin{aligned} \bar{N}&=\frac{1}{m} \sum _{i=1}^m N_i (\varOmega )=\alpha \sigma ^2 \left| \varOmega \right| \\ S^2&=\frac{1}{m}\sum _{i=1}^m (N_i(\varOmega )-\bar{N})^2=2 \alpha ^2 \sigma ^4 \pi abl^2 +\alpha \sigma ^2 \left| \varOmega \right| . \end{aligned}$$

Solving for \(\sigma \) and l (setting \(\alpha =1\)) yields

$$\begin{aligned} \begin{aligned} \sigma&=\sqrt{\frac{\bar{N}}{ab}}\\ l&=\sqrt{\frac{ab\big (S^2-\bar{N}\big )}{2\pi \bar{N}^2}}. \end{aligned} \end{aligned}$$
(18)
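
As a concrete illustration, here is a minimal numerical sketch of the estimators in Eq. (18) (assuming \(\alpha =1\); the counts and the \(480\times 640\) image size below are made-up values, and the sample variance must exceed the sample mean for l to be real-valued):

import numpy as np

def estimate_kernel_params(counts, a, b):
    # Method-of-moments estimates of (sigma, l) from Eq. (18), with alpha = 1.
    # counts: object counts N_1(Omega), ..., N_m(Omega); a, b: rows and columns of pixels.
    counts = np.asarray(counts, dtype=float)
    n_bar = counts.mean()    # sample mean, equated to sigma^2 * a * b
    s2 = counts.var()        # sample variance, (1/m) * sum_i (N_i - N_bar)^2
    sigma = np.sqrt(n_bar / (a * b))
    l = np.sqrt(a * b * (s2 - n_bar) / (2.0 * np.pi * n_bar**2))  # requires s2 > n_bar
    return sigma, l

# Illustrative counts over five images of the same 480 x 640 domain.
sigma_hat, l_hat = estimate_kernel_params([5, 30, 12, 44, 9], a=480, b=640)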

1.2 Concavity of the Forward Model

Recall that \(\lambda _m=\alpha \phi (g_m)\), with \(\alpha > 0\). Choosing \(p=2\), we obtain for the model in Eq. (3):

$$\begin{aligned} {\nabla _{g_m} \ln p(y|g)}&=-4\beta \big (g_m-\sqrt{y_m}\big )^{3} \end{aligned}$$
(19)
$$\begin{aligned} {\nabla \nabla _{g_m} \ln p(y|g)}&=-12\beta \big (g_m-\sqrt{y_m}\big )^{2}. \end{aligned}$$
(20)

Note that the last quantity is continuous and non-positive, so that the matrix W, the negative Hessian of the log-likelihood, is positive semi-definite.
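
A small numerical sanity check of Eqs. (19) and (20) follows (the values of \(\beta \), g and y below are illustrative assumptions only, with \(\beta >0\)):

import numpy as np

beta = 0.5                          # assumed positive constant from the model
g = np.array([0.3, 1.2, -0.7])      # illustrative latent values g_m
y = np.array([1.0, 4.0, 0.0])       # illustrative observed counts y_m

grad = -4.0 * beta * (g - np.sqrt(y))**3         # Eq. (19)
hess_diag = -12.0 * beta * (g - np.sqrt(y))**2   # Eq. (20): continuous and <= 0

# Finite-difference check of Eq. (19) against the likelihood term -beta*(g - sqrt(y))^4.
eps = 1e-6
ll = lambda g_: -beta * (g_ - np.sqrt(y))**4
fd = (ll(g + eps) - ll(g - eps)) / (2 * eps)
assert np.allclose(grad, fd, atol=1e-5)

# W is the negative Hessian, so its diagonal is >= 0 and W is positive semi-definite.
W = np.diag(-hess_diag)
assert np.all(np.linalg.eigvalsh(W) >= 0)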

Cite this article

Rajan, P., Ma, Y. & Jedynak, B. Cox Processes for Counting by Detection. J Math Imaging Vis 61, 380–393 (2019). https://doi.org/10.1007/s10851-018-0838-5
