Abstract
In this work, doubly stochastic Poisson (Cox) processes and convolutional neural network (CNN) classifiers are used to estimate the number of instances of an object in an image. Poisson processes are well suited to modeling events that occur randomly in space, such as the locations of objects in an image or the enumeration of objects in a scene. The proposed algorithm selects a subset of bounding boxes in the image domain, then queries them for the presence of the object of interest by running a pre-trained CNN classifier. The resulting observations are then aggregated, and a posterior distribution over the intensity of a Cox process is computed. This intensity function is summed over the image domain, providing an estimator of the number of instances of the object over the entire image. Despite the flexibility and versatility of Cox processes, their application to large datasets is limited because their computational complexity and storage requirements do not scale easily with image size, typically requiring \(O(n^3)\) computation time and \(O(n^2)\) storage, where \(n\) is the number of observations. To mitigate this problem, we employ Kronecker algebra, which takes advantage of direct product structures. As the likelihood is non-Gaussian, the Laplace approximation is used for inference, employing the conjugate gradient method and Newton’s method. Our approach then has close to linear performance, requiring only \(O(n^{3/2})\) computation time and \(O(n)\) memory. Results are presented on simulated data and on images from the publicly available MS COCO dataset. We compare our counting results with the state-of-the-art detection method, Faster R-CNN, and demonstrate superior performance.
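To illustrate why direct product structure helps, the following Python sketch multiplies a Kronecker-structured matrix by a vector without forming the full matrix. It is an illustrative example under stated assumptions, not the authors' implementation: the names `kron_matvec` and `se_kernel_1d`, the squared-exponential factors and the grid size are all chosen for the demonstration. It relies on the standard identity \((A\otimes B)\,\mathrm{vec}(X)=\mathrm{vec}(AXB^\top )\) for row-major vectorization.

```python
import numpy as np


def kron_matvec(A, B, x):
    """Return (A kron B) @ x without forming the Kronecker product."""
    p, q = A.shape[0], B.shape[0]
    X = x.reshape(p, q)                 # row-major "un-vectorization" of x
    return (A @ X @ B.T).reshape(-1)    # two small products instead of one huge one


def se_kernel_1d(size, length_scale):
    """Squared-exponential kernel on coordinates 0..size-1 (illustrative choice)."""
    t = np.arange(size, dtype=float)
    d2 = (t[:, None] - t[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale**2))


rng = np.random.default_rng(0)
K_rows = se_kernel_1d(6, 2.0)           # kernel factor along image rows
K_cols = se_kernel_1d(5, 2.0)           # kernel factor along image columns
x = rng.standard_normal(6 * 5)

fast = kron_matvec(K_rows, K_cols, x)
slow = np.kron(K_rows, K_cols) @ x      # dense reference, viable only for tiny grids
print(np.allclose(fast, slow))
```

For a square grid with \(n=p^2\) points, the two small products cost \(O(n^{3/2})\) operations and only \(O(n)\) numbers are stored, whereas the dense product needs \(O(n^2)\) of each.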






Acknowledgements
This work was made possible in part thanks to the Portland Institute for Computational Science and its resources acquired using NSF Grant DMS 1624776 and ARO Grant W911NF-16-1-0307, and the Department of Computer Science, Johns Hopkins University.
Appendix
1.1 Method of Moments for Kernel Parameter Estimation
Suppose the image domain is \(\varOmega \) with size \(\left| \varOmega \right| =a\times b\), where a and b are, respectively, the numbers of rows and columns of pixels in the image. Let \(N(\varOmega )\) be the number of instances of the object within \(\varOmega \). According to the model, \(N(\varOmega )\,|\,\lambda \sim \mathrm{Poisson}\left( \int _\varOmega \lambda (s)\,\mathrm{d}s\right) \), where \(\lambda (s)=\alpha g^2(s)\) and g is a Gaussian process, \(g\sim \mathrm{GP}\left( 0,K\right) \), with squared-exponential kernel
\[ K(s,s') = \sigma ^2 \exp \left\{ -\frac{\Vert s-s'\Vert ^2}{2l^2}\right\} . \]
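Assume that we have m samples \(N_1\left( \varOmega \right) ,\ldots ,N_m\left( \varOmega \right) \) over the same domain \(\varOmega \), or over domains of the same size \(\left| \varOmega \right| \). We can then use these samples together with the method of moments to estimate the parameters \(\sigma \) and l of the prior as follows. Note that, by the laws of total expectation and total variance, and since \(E\left[ g^2(s)\right] =K(s,s)=\sigma ^2\),
\[ E\left[ N\left( \varOmega \right) \right] = E\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] = \alpha \int _\varOmega E\left[ g^2(s)\right] \,\mathrm{d}s = \alpha \sigma ^2 \left| \varOmega \right| , \]
\[ V\left[ N\left( \varOmega \right) \right] = E\left[ V\left[ N\left( \varOmega \right) |\lambda \right] \right] + V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] = \alpha \sigma ^2 \left| \varOmega \right| + V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] . \]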
Further, let \(Z=V\left[ E\left[ N\left( \varOmega \right) |\lambda \right] \right] \), then
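In the last equation above, if a and b are large and l is small, as is generally the case in practice, then the terms \(\sqrt{1-\exp \{-\frac{a}{l^2}\}}\) and \(\sqrt{1-\exp \{-\frac{b}{l^2}\}}\) are approximately equal to 1, and we have \(Z\approx 2\alpha ^2 \sigma ^4 \pi abl^2 \). Therefore, \(V\left[ N\left( \varOmega \right) \right] \approx 2\alpha ^2 \sigma ^4 \pi abl^2 +\alpha \sigma ^2 \left| \varOmega \right| \). Writing \(\bar{N}\) and \(S^2\) for the sample mean and sample variance of \(N_1\left( \varOmega \right) ,\ldots ,N_m\left( \varOmega \right) \), and using \(E\left[ N\left( \varOmega \right) \right] =\alpha \sigma ^2 \left| \varOmega \right| \) with \(\left| \varOmega \right| =ab\), the method of moments gives the following two equations:
\[ \bar{N} = \alpha \sigma ^2 ab, \qquad S^2 = 2\alpha ^2 \sigma ^4 \pi abl^2 + \alpha \sigma ^2 ab. \]
Solving for \(\sigma \) and l (with \(\alpha =1\)),
\[ \hat{\sigma } = \sqrt{\frac{\bar{N}}{ab}}, \qquad \hat{l} = \sqrt{\frac{ab\left( S^2-\bar{N}\right) }{2\pi \bar{N}^2}}. \]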
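The moment-matching step can be sketched in Python as follows. This is a minimal sketch rather than the authors' code: it assumes the mean relation \(E\left[ N\left( \varOmega \right) \right] =\alpha \sigma ^2 ab\) together with the large-domain variance approximation \(V\left[ N\left( \varOmega \right) \right] \approx 2\alpha ^2 \sigma ^4 \pi abl^2 +\alpha \sigma ^2 ab\), and the function name `estimate_kernel_params`, the use of the unbiased sample variance and the example counts are illustrative choices.

```python
import math
from statistics import mean, variance


def estimate_kernel_params(counts, a, b, alpha=1.0):
    """Method-of-moments estimates (sigma, l) from per-image object counts.

    Assumes E[N] = alpha * sigma^2 * a * b and
    V[N] ~= 2 * alpha^2 * sigma^4 * pi * a * b * l^2 + alpha * sigma^2 * a * b.
    """
    area = a * b
    n_bar = mean(counts)        # sample mean of the counts
    s2 = variance(counts)       # unbiased sample variance of the counts
    if n_bar <= 0:
        raise ValueError("mean count must be positive")
    if s2 <= n_bar:
        # No overdispersion relative to a Poisson: l^2 would not be positive.
        raise ValueError("sample variance must exceed the sample mean")
    sigma2 = n_bar / (alpha * area)
    l2 = (s2 - n_bar) / (2.0 * alpha**2 * sigma2**2 * math.pi * area)
    return math.sqrt(sigma2), math.sqrt(l2)


# Hypothetical counts from eight images of 480 x 640 pixels.
sigma_hat, l_hat = estimate_kernel_params([3, 8, 1, 12, 6, 0, 9, 5], a=480, b=640)
print(sigma_hat, l_hat)
```

The explicit check that the sample variance exceeds the sample mean reflects the fact that the estimate of l is only defined when the counts are overdispersed relative to a homogeneous Poisson process.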
1.2 Concavity of the Forward Model
Recall that \(\lambda _m=\alpha \phi (g_m)\), with \(\alpha > 0\). Choosing \(p=2\), we obtain for the model in Eq. (3):
Note that the last quantity is continuous and negative so that the matrix W is positive semi-definite.
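As an illustration of this sign argument, consider the simpler case of a plain Poisson count likelihood with \(\phi (g)=g^2\). This is an assumed stand-in chosen for exposition and may differ from the observation model of Eq. (3). For a count \(y_m\ge 0\) with rate \(\lambda _m=\alpha g_m^2\) and \(g_m\ne 0\):

\[
\begin{aligned}
\log p(y_m \mid g_m) &= y_m \log \left( \alpha g_m^2\right) - \alpha g_m^2 - \log (y_m!), \\
\frac{\partial }{\partial g_m}\log p(y_m \mid g_m) &= \frac{2y_m}{g_m} - 2\alpha g_m, \\
\frac{\partial ^2}{\partial g_m^2}\log p(y_m \mid g_m) &= -\frac{2y_m}{g_m^2} - 2\alpha < 0.
\end{aligned}
\]

Taking W to be the negative Hessian of the log-likelihood, as is conventional for the Laplace approximation, its diagonal entries in this simplified case are \(W_{mm}=2y_m/g_m^2+2\alpha >0\), so W is diagonal with positive entries and hence positive semi-definite.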
Cite this article
Rajan, P., Ma, Y. & Jedynak, B. Cox Processes for Counting by Detection. J Math Imaging Vis 61, 380–393 (2019). https://doi.org/10.1007/s10851-018-0838-5
DOI: https://doi.org/10.1007/s10851-018-0838-5