1 Introduction

Detection, localization and identification of people are among the hardest problems in the video-surveillance research field, and they have attracted increasing interest in recent years. The use of physical and behavioral traits has recently been introduced as a recognition technique known as Soft Biometrics [1]. It can be used to filter large amounts of data or to identify and re-identify individuals.

This paper focuses on determining human height, a geometric feature that can be automatically estimated in video-surveillance scenarios. Unlike other biometric features, this information can be obtained from videos taken at long distances where people walk in any direction. Height estimation can be used both as a Soft Biometric and as a feature for person tracking. In the first case, it discards candidate subjects whose heights differ considerably from the target's, so that the search can focus on more distinctive identification features. In the second case, it can be used for temporal and spatial correspondence analysis during person tracking.

A significant number of works tackle the problem of object height estimation. Some of them rely on calibrated cameras and use the intrinsic and extrinsic parameters for the estimation [2, 3]. On the other hand, there are methods based on uncalibrated cameras [4,5,6,7,8]. In general, these methods perform camera auto-calibration based on scene geometry or object tracking [6, 9]. Their main drawback is that they require information about the extrinsic parameters of the camera or measurements in the 3D world as a reference for the estimation.

We propose a new method for estimating the real height of pedestrians using uncalibrated cameras. This work builds on the algorithm of Richardson et al. [10], who proposed a method to estimate the image horizon from multiple tracked objects together with an estimate of relative height. Our differences with respect to that work, and our main contributions, are: (1) we introduce an algorithm to evaluate the quality of human silhouettes within the horizon detection and height estimation processes, and (2) we propose a new method to estimate real human height that obtains the real height of pedestrians from uncalibrated cameras without using any prior knowledge about the scene geometry or the camera parameters.

2 Estimation of the Horizon and Relative Height

An important step for object height estimation is estimating the image horizon. For this task, Richardson et al. [10] established the condition that the camera X-axis must be parallel to the ground plane of the 3D world, which implies that the horizon can be uniquely defined by a single value on the image Y-axis. When this condition is not met, they proposed a method to find the angle between the camera's X-axis and the ground plane in order to rotate the video. A flat ground plane is also assumed. Under these conditions, when an object moves so that its position along the image Y-axis ascends, it becomes smaller. The image horizon is defined as the position where a moving object would become infinitesimally small.

Richardson et al. [10] showed that a linear relationship exists between an object's vertical image position (\(y\)) and its height (\(h\)) in the image (see Fig. 1b). In other words, for a moving object in the scene, the equation \(y = mh + n\) (height equation) holds. The parameters \(m\) and \(n\) of this equation can be estimated by a linear regression over a set of values \(\{(h_i, y_i)\}\) extracted from the track of a moving object (see Fig. 1a). In this equation, \(n\) is the vertical image position at which the object's height becomes zero; therefore, \(n\) represents the horizon according to the definition above (see Fig. 1b).
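
As an illustration, the parameters of the height equation can be obtained with an ordinary least-squares fit. The following minimal sketch (Python with NumPy) does this for a single track; the toy measurements are invented for illustration only.

```python
import numpy as np

def height_line_params(heights, y_positions):
    """Fit the height equation y = m*h + n for one tracked object.

    heights     -- object heights in pixels, one per tracked frame
    y_positions -- image Y coordinate of the object's lowest point (feet)
    Returns (m, n); n is the Y position where the height becomes zero,
    i.e. this object's estimate of the image horizon.
    """
    m, n = np.polyfit(np.asarray(heights, dtype=float),
                      np.asarray(y_positions, dtype=float), deg=1)
    return m, n

# Toy track: the higher the object appears in the image (smaller y),
# the smaller its height in pixels.
hs = [120, 110, 95, 80, 60]
ys = [400, 372, 330, 288, 232]
m, n = height_line_params(hs, ys)
print(f"slope m = {m:.2f}, horizon estimate n = {n:.2f}")
```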

Fig. 1. (a) Multiple instances of a tracked person in a video scene; for each instance, a line is plotted representing the size of the object and its position (the lower point of the line). (b) Plot of size vs. position showing the measurements extracted for the persons and their linear relationship.

Information from multiple tracked objects is necessary for horizon estimation, since the tracked points of a single object can be affected by occlusion, segmentation errors, identity switches and other problems. A robust voting procedure based on the Hough transform is presented in [10]: the value of \(n\) obtained from each object's height equation is considered a vote, and a histogram of these values is built along the image Y-axis. The resulting probability function for the horizon has a sharp peak at \(y_h\), the most voted value.
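
A possible implementation of this voting step is sketched below; the fixed bin width along the image Y-axis is our own simplification, not a detail taken from [10].

```python
import numpy as np

def vote_horizon(horizon_candidates, image_height, bin_size=5):
    """Hough-style voting: each tracked object contributes its n value
    (intercept of its height equation) as a vote; the horizon is taken
    as the centre of the most voted bin along the image Y axis.

    horizon_candidates -- list of n values, one per tracked object
    """
    bins = np.arange(0, image_height + bin_size, bin_size)
    hist, edges = np.histogram(horizon_candidates, bins=bins)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])

# Votes from several objects; isolated outliers from bad tracks barely matter.
votes = [62, 65, 64, 63, 61, 120, 66, 30, 64]
print("estimated horizon y_h =", vote_horizon(votes, image_height=500))
```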

Once the image horizon (\(y_h\)) is known, a single pair (\(y_i, h_i\)) measured from a person allows estimating that person's height in pixels at any position, using the line that joins the horizon point \((y_h, 0)\) to \((y_i, h_i)\) in the height-vs-position plane (the height line). Let A and B be two objects in the scene; a relationship between the objects' heights and the slopes of their height lines can be established: A is taller than B if and only if the slope of A's height line is greater than the slope of B's. Indeed, if A is taller than B (see Fig. 2b), then at a common image position \(y_1\) their heights satisfy \(h_A > h_B\), and since \(y_1 - y_h > 0\) for objects below the horizon, \(h_A/(y_1 - y_h) > h_B/(y_1 - y_h)\); the backward implication is analogous. The slope of an object's height line is therefore a good estimate of its relative height within the scene.
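
Assuming, as in Fig. 2b, that the slope is taken in the height-vs-position plane, the relative height and the projected pixel height at another position can be computed as in the following sketch (all numeric values are illustrative).

```python
def relative_height(y_h, y_i, h_i):
    """Slope of the height line through (y_h, 0) and (y_i, h_i):
    the object's height in pixels per unit of distance below the horizon."""
    return h_i / float(y_i - y_h)

def height_at(y_h, slope, y):
    """Pixel height the same object would have at image position y."""
    return slope * (y - y_h)

y_h = 62.5                                          # horizon from the voting step
slope_a = relative_height(y_h, y_i=400, h_i=120)    # person A
slope_b = relative_height(y_h, y_i=400, h_i=100)    # person B, shorter
print(slope_a > slope_b)                            # True: A is taller than B
print(height_at(y_h, slope_a, y=300))               # A's pixel height at y = 300
```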

Fig. 2. (a) Instances of two pedestrians of different sizes at the same position. (b) Plot of height vs. position showing the relationship between height and position for these two pedestrians.

3 Silhouette Quality Process

In the work of Richardson et al. [10], all the points extracted from every moving object in the video are used to estimate the horizon and the heights, and several types of objects are considered: cars, people, animals, among others. In our work, only people are of interest. Moreover, as mentioned before, the extracted points can be affected by segmentation and tracking problems, which cause significant variations in the line parameters. For these reasons, we propose to introduce a process that selects only silhouettes belonging to people. The process consists of a shape matching algorithm that compares each frame silhouette to a set of prototypes. Depending on the matching score, it is then possible to select person silhouettes of good quality; in our case, a good-quality silhouette is one that is well segmented (with no missing body parts). Only these good silhouettes are used for horizon and height estimation.

3.1 Shape Matching Algorithm

There are currently many methods for shape matching. The method selected here is known as Turning Angle [11]. It is efficient and gives good results, as shown in the survey conducted in [12]. In addition, since in our case the persons are always standing, the method can be made even faster by omitting the starting-point selection in the matching process. In this method, a description of the shape is created from its contour: the angle between the contour direction at a given point and the X-axis is computed for each contour point in order, and the shape representation is the resulting sequence of angles (Fig. 3). Two representations are compared using Dynamic Time Warping.
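
A minimal sketch of this matching step is given below, assuming each silhouette contour is available as an ordered list of boundary points; the absolute angle-difference cost and the quality threshold are our own illustrative choices, not values taken from the paper.

```python
import numpy as np

def turning_angles(contour):
    """Turning Angle descriptor: angle between the contour direction and
    the X axis at every contour point, taken in order along the contour.
    contour -- (N, 2) array of ordered (x, y) boundary points."""
    pts = np.asarray(contour, dtype=float)
    d = np.roll(pts, -1, axis=0) - pts          # vector to the next point
    return np.arctan2(d[:, 1], d[:, 0])         # one angle per contour point

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) Dynamic Time Warping between two angle
    sequences; lower values mean more similar shapes."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def is_good_silhouette(contour, prototypes, threshold=50.0):
    """Keep a silhouette if its best matching score against the prototype
    set is below a quality threshold (threshold value is illustrative)."""
    desc = turning_angles(contour)
    best = min(dtw_distance(desc, turning_angles(p)) for p in prototypes)
    return best <= threshold
```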

Fig. 3. Representation of the Turning Angle method.

Fig. 4. Prototype examples.

3.2 Prototype Selection

The silhouettes from the CASIA-B database [13] were used for prototype selection. This database contains people walking, recorded from 11 different viewing angles under controlled conditions. We selected 20 subjects, including both men and women. A clustering algorithm was then used to split the silhouette set into groups according to similarity, and the representative element of each group was taken as a prototype. We used k-medoids, since it is simple, fast and provides a representative object for each cluster. The distance between silhouettes for the k-medoids algorithm was the edit distance, computed over the Freeman chain codes extracted from the silhouette contours. We selected 50 prototypes (Fig. 4); this number was chosen as a trade-off between having enough samples and keeping processing times low.
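
The following sketch outlines this prototype-selection pipeline (Freeman chain codes, edit distance, k-medoids); the chain-code direction convention, the number of iterations and the random initialization are implementation assumptions of ours.

```python
import numpy as np

# One common 8-direction Freeman numbering for the step to the next contour point.
FREEMAN = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
           (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def freeman_chain(contour):
    """Freeman chain code of an 8-connected contour given as ordered (x, y) points."""
    pts = np.asarray(contour, dtype=int)
    steps = np.roll(pts, -1, axis=0) - pts
    return [FREEMAN[(int(dx), int(dy))] for dx, dy in steps]

def edit_distance(a, b):
    """Levenshtein distance between two chain codes."""
    n, m = len(a), len(b)
    d = np.zeros((n + 1, m + 1), dtype=int)
    d[:, 0] = np.arange(n + 1)
    d[0, :] = np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return d[n, m]

def k_medoids(dist, k, iters=20, seed=0):
    """Plain k-medoids over a precomputed distance matrix; the medoid of
    each cluster is returned and used as that cluster's prototype."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(dist[:, medoids], axis=1)     # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]  # most central member
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids

# Usage: given a list of chain codes, build the pairwise edit-distance matrix
# and select, e.g., 50 medoids as prototypes.
```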

4 Estimation of the Real Height

The real height estimation proposed in this work is based on sampling relative heights and comparing them with a known height distribution. For example, it is well known that the height of the Cuban population follows a normal distribution with \(\mu = 1.68\,\mathrm{m}\) for men and \(\mu = 1.56\,\mathrm{m}\) for women [14]; the overall mean is 1.61 m.

Once the image horizon is found, we propose to store a certain number of relative heights computed from good silhouettes, and then to estimate the parameters of their normal distribution. Under the assumption that relative heights follow the same distribution as real heights, we match the two distributions by their means. The real height of a person is then the value in the real-height distribution that corresponds to that person's relative height.
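
One plausible reading of matching the two distributions by their means is to scale each relative height by the ratio of the means, as in the sketch below (the sample values are illustrative).

```python
import numpy as np

MU_REAL = 1.61   # mean of the known real-height distribution, in metres [14]

def real_height(rel_height, rel_sample, mu_real=MU_REAL):
    """Map a relative height (height-line slope) to metres by matching the
    mean of the observed relative-height distribution to the mean of the
    known real-height distribution."""
    mu_rel = np.mean(rel_sample)
    return rel_height * (mu_real / mu_rel)

# rel_sample accumulates the slopes of the good silhouettes seen so far.
rel_sample = [0.350, 0.362, 0.341, 0.355, 0.348]
print(f"{real_height(0.362, rel_sample):.2f} m")    # ~1.66 m for this toy sample
```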

5 Experimental Results

There is no unified criterion in the literature for evaluating results on this topic. Moreover, there are no public datasets or real height data available for comparison with other works. Authors usually present results on their own databases, recorded under controlled conditions [5, 15].

Fig. 5. Results of the proposed method on a video from the PETS 2009 dataset [16]: (a) results without the silhouette quality step and (b) results with the silhouette quality step. The green line is the horizon estimate and the pink marks on the left of each image are the horizon votes. Bounding boxes are shown for each tracked object: blue if the blob silhouette is selected for estimation (good human silhouette) and pink otherwise. (Best viewed in color)

Assessing the accuracy of the horizon estimation is somewhat subjective, since the exact value depends on expert judgment. Nevertheless, in most of the analyzed videos the estimated value falls within a close neighborhood of the real apparent horizon. The influence of introducing the silhouette quality process can be observed in Fig. 5: the horizon estimate in Fig. 5b is better than that in Fig. 5a, and the vote distribution is more compact. Note that in Fig. 5b the bounding boxes corresponding to two overlapping persons, or to an incomplete person, were classified as bad and not used for estimation.

To test the height estimation we captured several videos in uncontrolled scenarios. The scene is a parking lot where people walk towards the camera (see Fig. 6), and the camera resolution is \(625 \times 500\) pixels. The first 40 people in the video were used to estimate the horizon and the relative-height distribution. The height estimates of another 30 people were then compared with their measured heights; people were measured wearing shoes. The results, split into men and women, are shown in Table 1.

Fig. 6. Video sequence from our dataset with our height estimations.

Table 1. Results of the estimation algorithm for (1) men and (2) women. E.H.: estimated height; M.H.: measured height; Diff: difference between them.

The automatic estimates for the 30 test subjects differ from the real values by at most 4 cm, which corresponds to \(100\%\) accuracy if an error of \({\pm}5\,\mathrm{cm}\) is allowed. Within a 1 cm difference there are 16 out of 30 correct measurements (\(53\%\)). The estimated heights ranged from 1.60 m to 1.82 m. The overall mean error was 1.53 cm; the mean error for men was 1.35 cm and for women 1.90 cm. In our model, 90% of the variability of the estimated values is explained by the linear relation with the observed values. The F statistic is 166.81 and is statistically significant with p = 0.0001, so there is evidence to reject the null hypothesis and a linear regression model can be established. The method was tested on a PC with an Intel Core i7 processor and 8 GB of RAM. Processing one frame takes 25 ms, so we can process 40 frames per second, achieving real-time performance.

Some further considerations help to interpret these results. First, human walking is a cyclic sequence of different poses: when the legs are apart the apparent height is lower than when they are together, so for estimation we used the values corresponding to the pose in which the person appears most upright. Second, the height of the blobs is affected by footwear and hairstyle, which explains why the error for women is slightly larger than for men: some of the women in the test set wore high-heeled shoes or had their hair up, and the real heights were measured on a different date than the test videos were captured. Finally, despite the quality step some errors can still occur: if an incomplete silhouette is accepted as good, it lowers the mean of the relative-height distribution and the method tends to overestimate heights.

The main limitation of the present work is that it needs a large number of people to obtain a good approximation of the population height distribution; if the number is too small, there is no guarantee that the sample mean will be close to the population mean. Nevertheless, the distribution parameters can be updated with the relative height of each new tracked person that appears in the scene. In addition, the viewing angle must be oblique enough for the silhouette height to change noticeably as its position changes.

6 Conclusions

In this work we presented a method to estimate the height of pedestrians using uncalibrated cameras. The method does not require information about the scene or the camera parameters. The height estimation builds on existing techniques, but this work incorporates a step for evaluating human silhouette quality, which helps to overcome segmentation problems that affect the results. The accuracy of the proposed method was shown experimentally in outdoor environments under uncontrolled lighting conditions, achieving a mean error of 1.53 cm for 30 test subjects from our database. We believe that real height estimation can be a very useful feature for people tracking, and in future work it could be used as a step towards camera calibration.