Cox Processes for Counting by Detection

Journal of Mathematical Imaging and Vision

Abstract

In this work, doubly stochastic Poisson (Cox) processes and convolutional neural network (CNN) classifiers are used to estimate the number of instances of an object in an image. Poisson processes are well suited to model events that occur randomly in space, such as the locations of objects in an image, and are therefore natural models for counting objects in a scene. The proposed algorithm selects a subset of bounding boxes in the image domain and queries them for the presence of the object of interest by running a pre-trained CNN classifier. The resulting observations are then aggregated, and a posterior distribution over the intensity of a Cox process is computed. This intensity function is summed, providing an estimator of the number of instances of the object over the entire image. Despite the flexibility and versatility of Cox processes, their application to large datasets is limited, as their computational complexity and storage requirements do not easily scale with image size, typically requiring \(O(n^3)\) computation time and \(O(n^2)\) storage, where n is the number of observations. To mitigate this problem, we employ Kronecker algebra, which takes advantage of direct product structures. As the likelihood is non-Gaussian, the Laplace approximation is used for inference, employing the conjugate gradient method and Newton's method. Our approach then achieves close-to-linear performance, requiring only \(O(n^{3/2})\) computation time and \(O(n)\) memory. Results are presented on simulated data and on images from the publicly available MS COCO dataset. We compare our counting results with the state-of-the-art detection method, Faster R-CNN, and demonstrate superior performance.
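
To illustrate the Kronecker structure exploited above, the following sketch (a minimal, hypothetical example with an assumed NumPy squared-exponential kernel and grid size, not the authors' implementation) shows that for a separable kernel on a pixel grid the covariance factors as a Kronecker product of one-dimensional kernels, so matrix-vector products never require forming the full n-by-n matrix:

import numpy as np

def se_kernel(x, sigma=1.0, ell=10.0):
    # Squared-exponential kernel matrix on a 1-D grid of coordinates x.
    d = x[:, None] - x[None, :]
    return sigma**2 * np.exp(-0.5 * d**2 / ell**2)

# Hypothetical 30 x 40 pixel grid: the full covariance is 1200 x 1200, but a
# separable (unit-variance) kernel factors as K = kron(K_rows, K_cols).
K1 = se_kernel(np.arange(30.0))    # kernel over row coordinates
K2 = se_kernel(np.arange(40.0))    # kernel over column coordinates
v = np.random.default_rng(0).normal(size=30 * 40)

naive = np.kron(K1, K2) @ v        # builds the full matrix: O(n^2) memory

# Kronecker identity (K1 (x) K2) vec(X) = vec(K2 X K1^T), with column-major vec:
# only the small factors are touched, which is what gives the O(n^{3/2}) matvec.
X = v.reshape(40, 30, order="F")
fast = (K2 @ X @ K1.T).flatten(order="F")

assert np.allclose(naive, fast)

Conjugate gradient iterations, as used for the Laplace approximation above, only need products of this form, which is consistent with the stated O(n) memory footprint.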

References

  1. Kaggle competition: NOAA Fisheries Steller Sea Lion Population Count. https://www.kaggle.com/c/noaa-fisheries-steller-sea-lion-population-count. Accessed 28 April 2017

  2. Arteta, C., Lempitsky, V., Noble, J.A., Zisserman, A.: Interactive object counting. In: European conference on computer vision, pp. 504–518. Springer (2014)

  3. Chan, A.B., Liang, Z.S.J., Vasconcelos, N.: Privacy preserving crowd monitoring: counting people without people models or tracking. In: IEEE conference on computer vision and pattern recognition, CVPR 2008. IEEE, pp. 1–7 (2008)

  4. Cho, S.Y., Chow, T.W., Leung, C.T.: A neural-based crowd estimation by hybrid global learning algorithm. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 29(4), 535–541 (1999)

  5. Cox, D.R.: Some statistical methods connected with series of events. J. R. Stat. Soc. Ser. B (Methodol.) 17, 129–164 (1955)

  6. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition, CVPR 2005. IEEE, vol. 1, pp. 886–893 (2005)

  7. Dassios, A., Jang, J.W.: Pricing of catastrophe reinsurance and derivatives using the Cox process with shot noise intensity. Financ. Stoch. 7(1), 73–95 (2003)

  8. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

  9. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)

  10. Fiaschi, L., Koethe, U., Nair, R., Hamprecht, F.A.: Learning to count with regression forest and structured labels. In: 2012 21st international conference on pattern recognition (ICPR). IEEE, pp. 2685–2688 (2012)

  11. Flaxman, S., Wilson, A.G., Neill, D.B., Nickisch, H., Smola, A.J.: Fast kronecker inference in Gaussian processes with non-Gaussian likelihoods. In: International conference on machine learning, vol. 2015 (2015)

  12. Ge, W., Collins, R.T.: Marked point processes for crowd counting. In: IEEE conference on computer vision and pattern recognition, CVPR 2009. IEEE, pp. 2913–2920 (2009)

  13. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448 (2015)

  14. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587 (2014)

  15. Han, W., Rajan, P., Frazier, P.I., Jedynak, B.M.: Bayesian group testing under sum observations: a parallelizable two-approximation for entropy loss. IEEE Trans. Inf. Theory 63(2), 915–933 (2017)

  16. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision, pp. 346–361. Springer (2014)

  17. Idrees, H., Saleemi, I., Seibert, C., Shah, M.: Multi-source multi-scale counting in extremely dense crowd images. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2547–2554 (2013)

  18. Kingman, J.F.C.: Poisson Processes. Clarendon Press, Oxford (1993)

  19. Kong, D., Gray, D., Tao, H.: A viewpoint invariant approach for crowd counting. In: 18th international conference on pattern recognition, ICPR 2006. IEEE, vol. 3, pp. 1187–1190 (2006)

  20. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)

  21. Lafarge, F., Descombes, X.: Geometric feature extraction by a multimarked point process. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1597–1609 (2010)

  22. Lempitsky, V., Zisserman, A.: Learning to count objects in images. In: Advances in neural information processing systems, pp. 1324–1332 (2010)

  23. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: European conference on computer vision, pp. 740–755. Springer (2014)

  24. Liu, X., Tu, P.H., Rittscher, J., Perera, A., Krahnstoever, N.: Detecting and counting people in surveillance applications. In: IEEE conference on advanced video and signal based surveillance, AVSS 2005. IEEE, pp. 306–311 (2005)

  25. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  26. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)

  27. Marana, A., Velastin, S., Costa, L., Lotufo, R.: Estimation of crowd density using image processing. In: IEE colloquium on image processing for security applications (Digest No.: 1997/074). IET, pp. 11–1 (1997)

  28. Merchan-Perez, A., Rodriguez, J., Alonso-Nanclares, L., Schertel, A., DeFelipe, J.: Counting synapses using FIB/SEM microscopy: a true revolution for ultrastructural volume reconstruction. Front. Neuroanat. 3, 18 (2009)

  29. Moghaddam, B., Pentland, A.: Probabilistic visual learning for object representation. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 696–710 (1997)

  30. Onoro-Rubio, D., López-Sastre, R.J.: Towards perspective-free object counting with deep learning. In: European conference on computer vision, pp. 615–629. Springer (2016)

  31. Pham, T.T., Chin, T.J., Schindler, K., Suter, D.: Interacting geometric priors for robust multimodel fitting. IEEE Trans. Image Process. 23(10), 4601–4610 (2014)

  32. Pham, T.T., Hamid Rezatofighi, S., Reid, I., Chin, T.J.: Efficient point process inference for large-scale object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2837–2845 (2016)

  33. Rajan, P., Han, W., Sznitman, R., Frazier, P., Jedynak, B.: Bayesian multiple target localization. In: International conference on machine learning, pp. 1945–1953 (2015)

  34. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

  35. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788 (2016)

  36. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. arXiv preprint arXiv:1612.08242 (2016)

  37. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp. 91–99 (2015)

  38. Ross, S.M.: Stochastic Processes. Wiley, New York (1996)

  39. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  40. Saatçi, Y.: Scalable inference for structured Gaussian process models. Ph.D. thesis, Citeseer (2012)

  41. Sam, D.B., Surya, S., Babu, R.V.: Switching convolutional neural network for crowd counting. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, p. 6 (2017)

  42. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  43. Sindagi, V.A., Patel, V.M.: CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: 14th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp. 1–6 (2017)

  44. Sindagi, V.A., Patel, V.M.: A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recognit. Lett. 107, 3–16 (2018)

  45. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013). https://doi.org/10.1007/s11263-013-0620-5

  46. Verdie, Y., Lafarge, F.: Detecting parametric objects in large scenes by monte carlo sampling. Int. J. Comput. Vis. 106(1), 57–75 (2014)

  47. Walach, E., Wolf, L.: Learning to count with CNN boosting. In: European conference on computer vision, pp. 660–676. Springer (2016)

  48. Wang, C., Zhang, H., Yang, L., Liu, S., Cao, X.: Deep people counting in extremely dense crowds. In: Proceedings of the 23rd ACM international conference on Multimedia. ACM, pp. 1299–1302 (2015)

  49. Warren, S., Hewett, P., Foltz, C.: The KX method for producing K-band flux-limited samples of quasars. Mon. Not. R. Astron. Soc. 312(4), 827–832 (2000)

Acknowledgements

This work was made possible in part by the Portland Institute for Computational Science and its resources, acquired using NSF Grant DMS 1624776 and ARO Grant W911NF-16-1-0307, and by the Department of Computer Science, Johns Hopkins University.

Author information

Corresponding author

Correspondence to Purnima Rajan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Method of Moments for Kernel Parameter Estimation

Suppose the image domain is \(\varOmega \) with size \(\left| \varOmega \right| =a\times b\), where a and b are, respectively, the number of rows and columns of pixels in the image. Let \(N(\varOmega )\) be the number of instances of the object within \(\varOmega \). According to the model, \(N(\varOmega )\mid \lambda \sim \mathrm{Poisson}\left( \int _\varOmega \lambda (s)\,\mathrm{{d}}s\right) \), where \(\lambda (s)=\alpha g^2(s)\) and g is a Gaussian process, \(g\sim \mathrm{{GP}}\left( 0,K\right) \), with

$$\begin{aligned} K\left( s_1,s_2\right) =\sigma ^2 \exp \left\{ -\frac{1}{2l^2}\left\| s_1-s_2\right\| ^2 \right\} . \end{aligned}$$

Assume that we have m samples \(N_1\left( \varOmega \right) ,\ldots ,N_m\left( \varOmega \right) \) over the same domain \(\varOmega \), or over domains of the same size \(\left| \varOmega \right| \). We can then use these samples, together with the method of moments, to estimate the parameters \(\sigma \) and l of the prior as follows. Note that

$$\begin{aligned} E\left[ N\left( \varOmega \right) \right]&=E\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] \\&=E\left[ \int _\varOmega \lambda (s)\,\mathrm{{d}}s \right] \\&=\int _\varOmega E[\lambda (s)]\mathrm{{d}}s\\&=\int _\varOmega E[\alpha g^2(s)]\mathrm{{d}}s\\&=\alpha \int _\varOmega K\left( s,s\right) \mathrm{{d}}s \\&=\alpha \int _\varOmega \sigma ^2 \mathrm{{d}}s \\&=\alpha \sigma ^2 \left| \varOmega \right| \end{aligned}$$
$$\begin{aligned}V\left[ N\left( \varOmega \right) \right]&=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] +E\left[ V\left[ N\left( \varOmega \right) |\lambda \right] \right] \\&=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] +E\left[ \int _\varOmega \lambda (s)\,\mathrm{{d}}s \right] \\&=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] +\alpha \sigma ^2 \left| \varOmega \right| . \end{aligned}$$

Further, let \(Z=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] \), then

$$\begin{aligned} Z&=V\left[ \int _\varOmega \lambda \left( s\right) \mathrm{{d}}s \right] \\&=E\left[ \left( \int _\varOmega \lambda \left( s\right) \mathrm{{d}}s \right) ^2 \right] -E^2 \left[ \int _\varOmega \lambda \left( s\right) \mathrm{{d}}s \right] \\&=E\left[ \left( \int _\varOmega \alpha g^2 \left( s\right) \mathrm{{d}}s \right) ^2 \right] -\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2 \\&=\alpha ^2 \int _\varOmega \int _\varOmega E\left[ g^2(s)g^2(t)\right] \mathrm{{d}}s\,\mathrm{{d}}t-\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2 \\&=\alpha ^2 \int _\varOmega \int _\varOmega \left[ 2\,\mathrm{cov}^2(g(s),g(t))+V[g(s)]\,V[g(t)]\right] \mathrm{{d}}s\,\mathrm{{d}}t\\&\quad -\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2 \\&=2\alpha ^2 \sigma ^4 \left( \int _{s_1=0}^{a} \int _{t_1=0}^{a} \exp \left\{ -\frac{1}{l^2}(s_1-t_1)^2 \right\} \mathrm{{d}}s_1\,\mathrm{{d}}t_1\right) \\&\quad \times \left( \int _{s_2=0}^{b} \int _{t_2=0}^{b} \exp \left\{ -\frac{1}{l^2}(s_2-t_2)^2 \right\} \mathrm{{d}}s_2\,\mathrm{{d}}t_2 \right) \\&\quad +\alpha ^2 \sigma ^4 \left| \varOmega \right| ^2-\left( \alpha \sigma ^2 \left| \varOmega \right| \right) ^2\\&=2\alpha ^2\sigma ^4\left( \sqrt{\pi }\, al\sqrt{1-\exp \left\{ -\tfrac{a}{l^2}\right\} }\right) \left( \sqrt{\pi }\, bl\sqrt{1-\exp \left\{ -\tfrac{b}{l^2}\right\} }\right) . \end{aligned}$$

In the last equation above, if a and b are large and l is small, which is typically the case in practice, the terms \(\sqrt{1-\exp \{-\frac{a}{l^2}\}}\) and \(\sqrt{1-\exp \{-\frac{b}{l^2}\}}\) are approximately equal to 1, so \(Z\approx 2\alpha ^2 \sigma ^4 \pi abl^2 \). Therefore, \(V\left[ N\left( \varOmega \right) \right] \approx 2\alpha ^2 \sigma ^4 \pi abl^2 +\alpha \sigma ^2 \left| \varOmega \right| \). Based on the method of moments, we have the following two equations:

$$\begin{aligned} \bar{N}&=\frac{1}{m} \sum _{i=1}^m N_i (\varOmega )=\alpha \sigma ^2 \left| \varOmega \right| \\ S^2&=\frac{1}{m}\sum _{i=1}^m (N_i(\varOmega )-\bar{N})^2=2 \alpha ^2 \sigma ^4 \pi abl^2 +\alpha \sigma ^2 \left| \varOmega \right| . \end{aligned}$$

Solving for \(\sigma \) and l (setting \(\alpha =1\)) yields

$$\begin{aligned} \begin{aligned} \sigma&=\sqrt{\frac{\bar{N}}{ab}}\\ l&=\sqrt{\frac{ab\big (S^2-\bar{N}\big )}{2\pi \bar{N}^2}}. \end{aligned} \end{aligned}$$
(18)
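
As a concrete illustration, here is a minimal numerical sketch of the estimators in Eq. (18) (assuming \(\alpha =1\); the counts and the \(480\times 640\) image size below are made-up values, and the sample variance must exceed the sample mean for l to be real-valued):

import numpy as np

def estimate_kernel_params(counts, a, b):
    # Method-of-moments estimates of (sigma, l) from Eq. (18), with alpha = 1.
    # counts: object counts N_1(Omega), ..., N_m(Omega); a, b: rows and columns of pixels.
    counts = np.asarray(counts, dtype=float)
    n_bar = counts.mean()    # sample mean, equated to sigma^2 * a * b
    s2 = counts.var()        # sample variance, (1/m) * sum_i (N_i - N_bar)^2
    sigma = np.sqrt(n_bar / (a * b))
    l = np.sqrt(a * b * (s2 - n_bar) / (2.0 * np.pi * n_bar**2))  # requires s2 > n_bar
    return sigma, l

# Illustrative counts over five images of the same 480 x 640 domain.
sigma_hat, l_hat = estimate_kernel_params([5, 30, 12, 44, 9], a=480, b=640)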

1.2 Concavity of the Forward Model

Recall that \(\lambda _m=\alpha \phi (g_m)\), with \(\alpha > 0\). Choosing \(p=2\), we obtain for the model in Eq. (3):

$$\begin{aligned} {\nabla _{g_m} \ln p(y|g)}&=-4\beta \big (g_m-\sqrt{y_m}\big )^{3} \end{aligned}$$
(19)
$$\begin{aligned} {\nabla \nabla _{g_m} \ln p(y|g)}&=-12\beta \big (g_m-\sqrt{y_m}\big )^{2}. \end{aligned}$$
(20)

Note that the last quantity is continuous and non-positive, so that the matrix W, the negative Hessian of the log-likelihood, is positive semi-definite.
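
A small numerical sanity check of Eqs. (19) and (20) follows (the values of \(\beta \), g and y below are illustrative assumptions only, with \(\beta >0\)):

import numpy as np

beta = 0.5                          # assumed positive constant from the model
g = np.array([0.3, 1.2, -0.7])      # illustrative latent values g_m
y = np.array([1.0, 4.0, 0.0])       # illustrative observed counts y_m

grad = -4.0 * beta * (g - np.sqrt(y))**3         # Eq. (19)
hess_diag = -12.0 * beta * (g - np.sqrt(y))**2   # Eq. (20): continuous and <= 0

# Finite-difference check of Eq. (19) against the likelihood term -beta*(g - sqrt(y))^4.
eps = 1e-6
ll = lambda g_: -beta * (g_ - np.sqrt(y))**4
fd = (ll(g + eps) - ll(g - eps)) / (2 * eps)
assert np.allclose(grad, fd, atol=1e-5)

# W is the negative Hessian, so its diagonal is >= 0 and W is positive semi-definite.
W = np.diag(-hess_diag)
assert np.all(np.linalg.eigvalsh(W) >= 0)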

Cite this article

Rajan, P., Ma, Y. & Jedynak, B. Cox Processes for Counting by Detection. J Math Imaging Vis 61, 380–393 (2019). https://doi.org/10.1007/s10851-018-0838-5
