Abstract
Aggregated Channel Features (ACF), proposed by Dollár [3], provide a strong framework for pedestrian detection. In this paper we show that fine-tuning the parameters of the baseline ACF detector can achieve competitive performance without additional channels or filtering operations. We experimentally determined optimized values for four parameters of the ACF detector: (1) training dataset size, (2) sliding window stride, (3) sliding window size and (4) number of bootstrapping stages. With these settings, our optimized detector using pre-learned eigen filters achieved state-of-the-art performance compared with other variants of the ACF detector on the Caltech pedestrian dataset.
1 Introduction
Detection of pedestrians from images has received special interest due to its widespread applications in vision-based systems. Aggregated Channel Features (ACF), proposed by Dollár [3], provide a simple framework that uses HOG-based [2] channel features and produced the best result on the Caltech pedestrian dataset at the time. Vision researchers have kept improving the performance of the ACF detector either by adding more channels [6, 8] or by applying filters to the existing channels [5, 9], achieving miss rates almost 15% lower than that of the original ACF detector. At this point, adding more channels or more filtering operations on the channels is no longer able to improve performance further, and researchers are now coupling the best-performing ACF variant with deep networks [9].
Fine-tuning parameters such as sliding window size, sliding window stride, training dataset size and number of bootstrapping stages is also important for increasing the performance of a detector. In this paper we study the effect of these parameters on the detector's miss rate and propose an optimized parameter set. In fact, our ACF detector with the optimized parameter set outperformed many variants of the ACF detector that use either extra channels or additional filters on the existing channels. We further used decorrelated channel features (obtained by convolving the channels with decorrelation filters) along with our optimized parameter set and obtained the second best result on the Caltech pedestrian dataset.
2 Related Work
In this section we briefly describe the variants of the ACF detector. ACF-based detectors can be broadly classified into two groups: methods that add more channels to the basic ACF detector, and methods that apply appropriate filters to its existing channels. Paisitkriangkrai et al. [6] used spatially pooled covariance descriptors and local binary patterns as additional channels (259 channels in total), whereas Yang et al. [8] proposed Convolutional Channel Features (CCF) by adding channels extracted from images using a pre-trained convolutional neural network (CNN). Squares Channel Features, introduced by Benenson et al. [1], applied square-sized averaging filters on the channels. Locally Decorrelated Channel Features (LDCF), proposed by Nam et al. [5], used decorrelated channel features to train the classifier. CheckerBoards [9] also uses a set of filters, consisting of averaging filters, horizontal and vertical gradient filters, and all possible checkerboard pattern filters within the given model window. CheckerBoards and its faster version RotatedFilters [9] currently hold the state-of-the-art miss rate on the Caltech pedestrian dataset. Even though detectors based on deep networks have also provided state-of-the-art results on the Caltech pedestrian dataset [7], this comes at the cost of expensive hardware such as GPUs and complex computations. Nevertheless, variants of the ACF detector achieved comparable results with respect to deep-network-based methods despite requiring less computation and hardware support.
3 Optimized ACF Detector
In this section we learn the best parameter values for the ACF detector. In the baseline ACF detector, a given input image is represented as 10 lower-resolution channels (three LUV channels + gradient magnitude + six HOG channels). Table 1 shows the parameters used for fine-tuning the ACF detector and their values in the baseline detector. All experiments are done on the Caltech pedestrian dataset, and we used the new annotations provided by Zhang et al. [9] for training and testing. Figure 4 shows the number of pedestrian instances in two height ranges (between 50–80 pixels and greater than 80 pixels) that are missed by the baseline detector but detected by the four variants of the ACF detector obtained by changing the above-mentioned parameters.
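The 10-channel representation can be sketched in a few lines of numpy. This is a simplified illustration, not Dollár's toolbox code: the colour channels are used as-is instead of being converted to LUV, the orientation binning is a basic hard assignment, and the block aggregation (the "aggregated" step) is shown as a separate function.

```python
import numpy as np

def acf_channels(img, n_orients=6):
    """10 ACF-style channels: 3 colour channels (LUV in the paper, raw
    values here for simplicity), gradient magnitude, and n_orients
    gradient-orientation channels."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ori = np.mod(np.arctan2(gy, gx), np.pi)              # orientation in [0, pi)
    bins = np.minimum((ori / np.pi * n_orients).astype(int), n_orients - 1)
    hog = np.stack([mag * (bins == b) for b in range(n_orients)], axis=-1)
    return np.dstack([img, mag[..., None], hog])         # H x W x 10

def aggregate(channels, shrink=4):
    """Sum each channel over shrink x shrink blocks, giving the
    lower-resolution aggregated channels."""
    H, W, C = channels.shape
    H, W = H // shrink * shrink, W // shrink * shrink
    return (channels[:H, :W]
            .reshape(H // shrink, shrink, W // shrink, shrink, C)
            .sum(axis=(1, 3)))

img = np.random.rand(64, 32, 3)
feat = aggregate(acf_channels(img))
print(feat.shape)  # (16, 8, 10)
```

The aggregated feature vector at each candidate window is what the boosted decision-tree classifier consumes.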
3.1 Sliding Window Size
The sliding window size determines the number of features available for learning. The higher the number of features, the better the performance of the detector, and this can be achieved by increasing the window size. However, our experiments showed that changing the window size to 120\(\,\times \,\)60 increases the miss rate to 73.14\(\%\) (Fig. 1). This result is reasonable since a large number of pedestrian instances in the Caltech test set have a height less than 120 pixels. To obtain a reduced miss rate for the bigger window size, sliding window scanning has to be performed at upsampled scales of the input image in addition to the downsampled scales. When we attempted this for the window size 120\(\,\times \,\)60, the miss rate was reduced by \(\sim 5\%\). Figure 1 shows the miss rate for three different model sizes, with and without upsampling of the image. When upsampling is done for the window size 64\(\,\times \,\)32, the miss rate increases by \(5\%\) due to overfitting. From Fig. 4, when the window size is changed to 120\(\,\times \,\)60 with upsampling, a large number of small pedestrian instances (\(\sim \)60) are detected, which means pedestrians of very small resolution fit correctly into the new window size when the images are upsampled. Figure 2 shows the learned classifier representation for the two window sizes 120\(\,\times \,\)60 and 64\(\,\times \,\)32. It can be observed that the classifier representation for the bigger window size bears more similarity to a human silhouette than that of the smaller window size. Hence a window size of 120\(\,\times \,\)60 (with upsampling) is taken as the most effective sliding window size of the ACF detector for the Caltech dataset.
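To see why upsampling matters for a 120\(\,\times \,\)60 window, consider how many window positions a scale pyramid offers: without upsampled scales (scale > 1), a 120-pixel-tall window simply cannot cover pedestrians shorter than 120 pixels. A small sketch with illustrative pyramid settings (the octave spacing and image size here are assumptions, not the toolbox defaults):

```python
def pyramid_positions(img_h, img_w, win_h=120, win_w=60,
                      stride=4, per_octave=8, up_octaves=1):
    """Window positions per pyramid scale; scales > 1 correspond to
    upsampled images, where the window reaches small pedestrians."""
    counts = {}
    i = -per_octave * up_octaves          # start one octave above the image
    while True:
        scale = 2.0 ** (-i / per_octave)
        h, w = int(img_h * scale), int(img_w * scale)
        if h < win_h or w < win_w:        # window no longer fits: stop
            break
        counts[scale] = ((h - win_h) // stride + 1) * ((w - win_w) // stride + 1)
        i += 1
    return counts

counts = pyramid_positions(480, 640)
# at scale 2.0 the 120x60 window effectively covers pedestrians ~60 px tall
```

The upsampled scales add candidate windows exactly in the size range where Caltech pedestrians are concentrated.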
3.2 Training Dataset Size
Six sets of videos are available for training in the Caltech pedestrian dataset. The number of images available for training can be varied by changing the number of frames taken from the videos. From Fig. 3 it can be observed that the miss rate decreases with increasing training dataset size up to a certain extent, after which it either increases or remains almost constant. The training dataset obtained by taking every third frame provided the best result; beyond that the classifier became overfitted, hence the increase in miss rate. This indicates that growing the dataset beyond 42782 images is needless, and hence this is taken as the best dataset size. From Fig. 4 it is evident that the baseline detector detects more instances of both low and high resolution when using the dataset obtained by taking every third frame from the training video set.
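The sampling scheme itself is a one-liner; the total frame count below is back-computed from the paper's 42782-image dataset and is therefore an assumption, not a figure from the Caltech documentation.

```python
def sample_every(n_frames, step):
    """Indices of the frames kept when every step-th frame is used."""
    return range(0, n_frames, step)

# ~128,346 training frames assumed, so that step=3 yields 42,782 images
for step in (6, 3, 2):
    print(step, len(sample_every(128346, step)))
```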
3.3 Sliding Window Stride
The sliding window stride defines the number of pixels skipped between adjacent sliding window scans. Increasing the stride reduces computation, but the detector will also miss small pedestrian instances. Decreasing the stride has two advantages: during training it enriches the detector with more false positives, and during testing the detector does not miss small pedestrian instances. Table 2 shows the miss rate when the stride is changed in four different configurations. Decreasing the stride for both training and testing decreases the miss rate, but when we changed the stride for training only, the miss rate increases, which indicates that the baseline detector is overfitted when more false positives are added in the training stage. For the third case (stride changed for the testing stage only) the miss rate attained its lowest value, since there is no overfitting of the classifier and the detector detects more small pedestrian instances compared with the baseline detector; this is also depicted in Fig. 4. The optimum sliding window stride is taken as two.
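The computational trade-off is easy to quantify: the number of window positions at a single scale grows roughly as \(1/\text{stride}^2\). A quick check (image and window sizes here are illustrative):

```python
def n_windows(img_h, img_w, win_h, win_w, stride):
    """Number of sliding-window positions at a single scale."""
    return ((img_h - win_h) // stride + 1) * ((img_w - win_w) // stride + 1)

base = n_windows(480, 640, 120, 60, 4)   # baseline stride of 4
fine = n_windows(480, 640, 120, 60, 2)   # optimized stride of 2
# halving the stride roughly quadruples the windows scanned at test time
# (and the pool of false positives available for mining at training time)
print(base, fine)
```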
3.4 Number of Bootstrapping Stages
Bootstrapping stages feed the detector with hard examples and hence increase its discriminative power. In our experiments, we increased the number of bootstrapping stages from four to five. Table 3 shows the miss rate of the baseline detector with respect to the number of bootstrapping stages. The table shows that the miss rate is reduced with an increased number of bootstrapping rounds. Also, increasing the number of bootstrapping stages should always be supplemented with a larger training dataset. Hence we performed training with the optimum training dataset size obtained in Sect. 3.2, and the result is provided in Table 3. We can see that the miss rate is reduced by 8% from the baseline detector.
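The bootstrapping loop can be sketched generically: after each training stage, the current model scores a pool of negatives and the highest-scoring (hardest) ones are added to the training set. This is a minimal sketch with a toy mean-difference linear scorer, not the boosted-tree training of the ACF toolbox.

```python
import numpy as np

def bootstrap(train_fn, score_fn, pos, neg_pool, n_stages=5, n_hard=500):
    """Hard-negative mining: retrain n_stages times, each time appending
    the n_hard highest-scoring negatives from the remaining pool."""
    neg_set, pool = neg_pool[:n_hard], neg_pool[n_hard:]
    model = None
    for _ in range(n_stages):
        model = train_fn(pos, neg_set)
        if len(pool) == 0:
            break
        hard = np.argsort(score_fn(model, pool))[-n_hard:]  # hardest negatives
        neg_set = np.concatenate([neg_set, pool[hard]])
        pool = np.delete(pool, hard, axis=0)
    return model

# toy demo: "classifier" is the direction separating the class means
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, (200, 8))
negs = rng.normal(-1.0, 1.0, (4000, 8))
w = bootstrap(lambda p, n: p.mean(0) - n.mean(0), lambda m, x: x @ m, pos, negs)
```

Each extra stage only helps if the pool still contains informative hard negatives, which is why a larger training dataset must accompany more bootstrapping rounds.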
3.5 Final Detector
Combining all these best-performing parameters, we achieved state-of-the-art performance for the ACF detector. Table 4 shows the step-by-step reduction in miss rate achieved by the ACF detector as each optimized parameter from the above experiments is added. We further enhanced our optimized ACF detector by filtering the channels with the top four pre-learned eigenvectors, which produces locally decorrelated channels [5]. This optimized ACF detector + eigen filters provided the best result on the Caltech dataset among the variants of the ACF detector. Table 5 shows the miss rate of our optimized ACF detector and optimized ACF detector + eigen filters along with other state-of-the-art detectors on the Caltech dataset. Rows in grey shade represent detectors using deep networks; all other methods are variants of the ACF detector, including our two proposed methods. The table shows that our proposed Optimized ACF + eigen filters detector has a lower miss rate than CheckerBoards [9], the best method among the variants of the ACF detector. Table 6 compares different parameters of CheckerBoards and our proposed optimized ACF detector. From the table we can see that CheckerBoards has a stride of 6, which means reduced computation, but it uses 61 filters while we use only 4, so the detection speed of our method is much faster (nearly 25x) than that of CheckerBoards (Table 6). Hence our method is clearly more efficient than CheckerBoards.
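The eigen-filter construction can be sketched as follows: estimate the covariance of small channel patches, take its top eigenvectors as filters, and convolve each channel with them to obtain decorrelated channels, in the spirit of LDCF [5]. This is a simplified sketch (covariance estimated from one channel of one random image, and filters applied by correlation rather than true convolution), not the authors' pipeline.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def eigen_filters(channel, k=4, patch=5):
    """Top-k eigenvectors of the patch covariance, reshaped as filters."""
    patches = sliding_window_view(channel, (patch, patch)).reshape(-1, patch * patch)
    patches = patches - patches.mean(axis=0)
    cov = patches.T @ patches / len(patches)
    vals, vecs = np.linalg.eigh(cov)                 # eigenvalues ascending
    return vecs[:, -k:].T.reshape(k, patch, patch)   # k largest as k filters

def filter_channel(channel, filters):
    """Apply each filter (valid correlation) -> k decorrelated channels."""
    p = filters.shape[-1]
    windows = sliding_window_view(channel, (p, p))
    return np.stack([np.tensordot(windows, f, axes=((2, 3), (0, 1)))
                     for f in filters], axis=-1)

rng = np.random.default_rng(0)
ch = rng.random((40, 40))
filts = eigen_filters(ch)
out = filter_channel(ch, filts)
print(out.shape)  # (36, 36, 4)
```

With only 4 filters per channel versus the 61 of CheckerBoards, the filtering cost stays low while the boosted trees see locally decorrelated features.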
4 Conclusion
In this work, we fine-tuned four parameters of the ACF detector and found an optimized parameter set. Our ACF detector with the improved parameter set achieved superior performance compared with other variants of the ACF detector on the Caltech pedestrian dataset. Furthermore, when we applied decorrelated channels, obtained by filtering with the top four eigen filters, to our optimized ACF detector, we achieved a state-of-the-art result on the Caltech dataset. Our future work includes using the optimized parameters for deep-network-based detection.
References
1. Benenson, R., Mathias, M., Tuytelaars, T., Van Gool, L.: Seeking the strongest rigid detector. In: CVPR (2013)
2. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
3. Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1532–1545 (2014)
4. Hosang, J., Omran, M., Benenson, R., Schiele, B.: Taking a deeper look at pedestrians. In: CVPR, pp. 4073–4082 (2015)
5. Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved pedestrian detection. In: Advances in Neural Information Processing Systems, pp. 424–432 (2014)
6. Paisitkriangkrai, S., Shen, C., van den Hengel, A.: Strengthening the effectiveness of pedestrian detection with spatially pooled features. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 546–561. Springer, Cham (2014). doi:10.1007/978-3-319-10593-2_36
7. Tian, Y., Luo, P., Wang, X., Tang, X.: Pedestrian detection aided by deep learning semantic tasks. In: CVPR, pp. 5079–5087 (2015)
8. Yang, B., Yan, J., Lei, Z., Li, S.Z.: Convolutional channel features. In: ICCV, pp. 82–90 (2015)
9. Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: How far are we from solving pedestrian detection? In: CVPR (2016)
Acknowledgements
We gratefully acknowledge the research fellowship (3501/(NET-DEC.2014)) provided by the University Grants Commission (UGC), Govt. of India.
© 2017 Springer International Publishing AG
Bastian, B.T., Jiji, C.V. (2017). Aggregated Channel Features with Optimum Parameters for Pedestrian Detection. In: Shankar, B., Ghosh, K., Mandal, D., Ray, S., Zhang, D., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2017. Lecture Notes in Computer Science(), vol 10597. Springer, Cham. https://doi.org/10.1007/978-3-319-69900-4_20
Print ISBN: 978-3-319-69899-1
Online ISBN: 978-3-319-69900-4