Abstract
We propose a novel node splitting method for regression trees and incorporate it into the random regression forest framework. In traditional binary splitting, the splitting rule is selected by trial and error from a predefined set of binary splitting rules. In contrast, the proposed method first finds clusters in the training data that at least locally minimize the empirical loss, without considering the input space. Splitting rules that preserve the found clusters as much as possible are then determined by casting the problem as a classification problem. Consequently, our node splitting method enjoys more freedom in choosing the splitting rules, resulting in more efficient tree structures. In addition to the algorithm for the ordinary Euclidean target space, we present a variant that naturally handles a circular target space through the proper use of circular statistics. To deal with challenging, ambiguous image-based pose estimation problems, we also present a voting-based ensemble method using the mean shift algorithm. Furthermore, to address the data imbalance present in some of the datasets, we propose a bootstrap sampling method using a sample weighting technique. We apply the proposed random regression forest algorithm to head pose estimation, car direction estimation and pedestrian orientation estimation tasks, and demonstrate its competitive performance.
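The two-stage split described in the abstract can be illustrated with a minimal sketch. It assumes plain k-means for the target-space clustering and a least-squares linear classifier for the input-space rule; both are stand-ins chosen for brevity, not the paper's stated implementation. The function names `kmeans_targets`, `fit_linear_split` and `route` are hypothetical.

```python
import numpy as np

def kmeans_targets(t, k=2, iters=20, seed=0):
    """Stage 1: cluster the targets only; the inputs are ignored."""
    rng = np.random.default_rng(seed)
    centers = t[rng.choice(len(t), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every target to every center.
        d = ((t[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = t[labels == j].mean(axis=0)
    return labels

def fit_linear_split(X, labels, k):
    """Stage 2: fit a linear rule on the inputs that reproduces the clusters.

    Least squares on one-hot labels is an assumption made for this sketch.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    Y = np.eye(k)[labels]                      # one-hot cluster labels
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def route(X, W):
    """Send each sample to the child node with the highest linear score."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W).argmax(axis=1)

# Toy data: a one-dimensional input whose target jumps at x = 0.
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
t = np.where(x > 0, 5.0, 0.0) + 0.01 * np.sin(37.0 * x)

labels = kmeans_targets(t, k=2)        # clusters found in target space
W = fit_linear_split(x, labels, k=2)   # splitting rule in input space
agreement = (route(x, W) == labels).mean()
print("fraction of samples routed to their own cluster:", agreement)
```

Because the number of clusters is a free parameter here, the same sketch extends to splits with more than two children by changing `k`.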
Notes
In the earlier version (Hara and Chellappa 2014) and in Pelleg and Moore (2000), the variance is incorrectly estimated because q is missing from the denominator. The results of the AKRF on the Pointing’04 dataset have been updated; however, the difference is insignificant. The results on the EPFL Multi-view Car Dataset are unaffected as \(q=1\).
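For context, the role of q can be illustrated with the standard maximum-likelihood variance of an isotropic Gaussian cluster model. This is a sketch under assumptions not stated in this note: q is taken to be the dimensionality of the target space, N the number of samples, and \(\mu_{k(i)}\) the center of the cluster to which target \(t_i\) is assigned. The exact normalization used in the paper may differ, for example by a degrees-of-freedom correction.

\[
\hat{\sigma}^2 = \frac{1}{qN} \sum_{i=1}^{N} \lVert t_i - \mu_{k(i)} \rVert_2^2
\]

Under this form, dropping q from the denominator leaves the estimate unchanged only when \(q=1\).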
References
Andriluka, M., Roth, S., & Schiele, B. (2010). Monocular 3D pose estimation and tracking by detection. In CVPR 2010: IEEE conference on computer vision and pattern recognition.
Bailly, K., Milgram, M., & Phothisane, P. (2009). Head pose estimation by a stepwise nonlinear regression. In International conference on computer analysis of images and patterns.
Baltieri, D., Vezzani, R., & Cucchiara, R. (2012). People orientation recognition by mixtures of wrapped distributions on random trees. In European conference on computer vision. Heidelberg: Springer.
Berzal, F., Cubero, J. C., Marín, N., & Sánchez, D. (2004). Building multi-way decision trees with numerical attributes. Information Sciences, 165(1–2), 73–90.
Bissacco, A., Yang, M. H., & Soatto, S. (2007). Fast human pose estimation using appearance and motion via multi-dimensional boosting regression. In 2007 IEEE conference on computer vision and pattern recognition.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. London: Chapman and Hall/CRC.
Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face alignment by explicit shape regression. In IEEE conference on computer vision and pattern recognition (CVPR).
Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), 27.
Chang-Chien, S. J., Hung, W. L., & Yang, M. S. (2012). On mean shift-based clustering for circular data. Soft Computing, 16(6), 1043–1060.
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.
Chen, C., Liaw, A., & Breiman, L. (2004). Using random forest to learn imbalanced data. UC Berkeley: Technical report, Department of Statistics.
Chen, C., Heili, A., & Odobez, J. M. (2011). Combined estimation of location and body pose in surveillance video. In International conference on advanced video and signal based surveillance (AVSS).
Cheng, Y. (1995). Mean shift, mode seeking, and clustering. PAMI, 17(8), 790–799.
Chou, P. A. (1991). Optimal partitioning for classification and regression trees. PAMI, 13(4), 340–354.
Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. PAMI, 24(5), 603–619.
Criminisi, A., & Shotton, J. (2013). Decision forests for computer vision and medical image analysis. New York: Springer.
Criminisi, A., Shotton, J., Robertson, D., & Konukoglu, E. (2010). Regression forests for efficient anatomy detection and localization in CT studies. In Medical computer vision. Recognition techniques and applications in medical imaging (Vol. 6533, pp. 106–117).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05).
Dantone, M., Gall, J., Fanelli, G., & Gool, L. V. (2012). Real-time facial feature detection using conditional regression forests. In 2012 IEEE conference on computer vision and pattern recognition (CVPR).
Dobra, A., & Gehrke, J. (2002). Secret: A scalable linear regression tree algorithm. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining.
Dollár, P., Welinder, P., & Perona, P. (2010). Cascaded pose regression. In 2010 IEEE conference on computer vision and pattern recognition (CVPR).
Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the 5th ACM SIGKDD international conference on Knowledge discovery and data mining.
Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A., & Vapnik, V. (1996). Support vector regression machines. In Advances in neural information processing systems (NIPS).
Drummond, C., & Holte, R. C. (2003). C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In ICML workshop on learning from imbalanced datasets II.
Duin, R. P. W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers, C-25(11), 1175–1179.
Enzweiler, M., & Gavrila, D. M. (2010). Integrated pedestrian classification and orientation estimation. In CVPR 2010: IEEE conference on computer vision and pattern recognition.
Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.
Fanelli, G., Gall, J., & Gool, L. V. (2011). Real time head pose estimation with random regression forests. In 2011 IEEE conference on computer vision and pattern recognition (CVPR).
Fayyad, U. M., & Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the international joint conference on uncertainty in AI.
Fenzi, M., & Ostermann, J. (2014). Embedding geometry in generative models for pose estimation of object categories. In British machine vision conference.
Fenzi, M., Leal-Taixé, L., Rosenhahn, B., & Ostermann, J. (2013). Class generative models based on feature regression for pose estimation of object categories. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Fenzi, M., Leal-Taixé, L., Ostermann, J., & Tuytelaars, T. (2015). Continuous pose estimation with a spatial ensemble of Fisher regressors. In Proceedings of the IEEE international conference on computer vision (ICCV).
Fisher, N. I. (1996). Statistical analysis of circular data. Cambridge: Cambridge University Press.
Fukunaga, K., & Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1), 32–40.
Gaile, G. L., & Burt, J. E. (1980). Directional statistics (concepts and techniques in modern geography). Norwich: Geo Abstracts Ltd.
Gall, J., & Lempitsky, V. (2009). Class-specific Hough forests for object detection. In IEEE conference on computer vision and pattern recognition (CVPR).
Gandhi, T., & Trivedi, M. M. (2008). Image based estimation of pedestrian orientation for improving path prediction. In Intelligent vehicles symposium.
Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.
Girshick, R., Shotton, J., Kohli, P., Criminisi, A., & Fitzgibbon, A. (2011). Efficient regression of general-activity human poses from depth images. In 2011 IEEE international conference on computer vision (ICCV).
Goto, K., Kidono, K., Kimura, Y., & Naito, T. (2011). Pedestrian detection and direction estimation by cascade detector with multi-classifiers utilizing feature interaction descriptor. In IEEE intelligent vehicles symposium (IV).
Gourier, N., Hall, D., & Crowley, J. L. (2004). Estimating face orientation from robust detection of salient facial structures. In ICPR international workshop on visual observation of deictic gestures.
Habbema, J. D. F., & Hermans, J. (1977). Selection of variables in discriminant analysis by F-statistic and error rate. Technometrics, 19(4), 487–493.
Haj, M. A., Gonzalez, J., & Davis, L. S. (2012). On partial least squares in head pose estimation: How to simultaneously deal with misalignment. In 2012 IEEE conference on computer vision and pattern recognition (CVPR).
Hara, K., & Chellappa, R. (2013). Computationally efficient regression on a dependency graph for human pose estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Hara, K., & Chellappa, R. (2014). Growing regression forests by classification: Applications to object pose estimation. In The European conference on computer vision (ECCV).
He, K., Sigal, L., & Sclaroff, S. (2014). Parameterizing object detectors in the continuous pose space. In The European conference on computer vision (ECCV).
Herdtweck, C., & Curio, C. (2013). Monocular car viewpoint estimation with circular regression forests. In Intelligent vehicles symposium (IVS).
Ho, H. T., & Chellappa, R. (2012). Automatic head pose estimation using randomly projected dense SIFT descriptors. In 2012 19th IEEE international conference on image processing.
Huang, C., Ding, X., & Fang, C. (2010). Head pose estimation based on random forests for multiclass classification. In 2010 20th International conference on pattern recognition (ICPR).
Kafai, M., Miao, Y., & Okada, K. (2010). Directional mean shift and its application for topology classification of local 3D structures. In CVPR workshop.
Kashyap, R. L. (1977). A Bayesian comparison of different classes of dynamic models using empirical data. IEEE Transactions on Automatic Control, 22(5), 715–727.
Kobayashi, T., & Otsu, N. (2010). Von Mises-Fisher mean shift for clustering on a hypersphere. In 2010 20th International conference on pattern recognition (ICPR).
Kubat, M., Holte, R., & Matwin, S. (1997). Learning when negative examples abound. In Proceedings of ECML-97, 10th European conference on machine learning.
Loh, W. Y., & Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis. Journal of the American Statistical Association, 83(403), 715–725.
Mardia, K. V., & Jupp, P. (2000). Directional statistics (2nd ed.). New York: Wiley.
Nakajima, C., Pontil, M., Heisele, B., & Poggio, T. (2003). Full-body person recognition system. Pattern Recognition, 36(9), 1997–2006.
Orozco, J., Gong, S., & Xiang, T. (2009). Head pose classification in crowded scenes. In Proceedings of the British machine vision conference (BMVC 2009).
Ozuysal, M., Lepetit, V., & Fua, P. (2009). Pose estimation for category specific multiview object localization. In 2009 IEEE conference on computer vision and pattern recognition (CVPR).
Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., & Brunk, C. (1994). Reducing misclassification costs. In Proceedings of the 11th international conference on machine learning.
Pelleg, D., & Moore, A. (2000). X-means: Extending K-means with efficient estimation of the number of clusters. In Proceedings of the 17th international conference on machine learning.
Redondo-Cabrera, C., Lopez-Sastre, R., & Tuytelaars, T. (2014). All together now: Simultaneous object detection and continuous pose estimation using a Hough forest with probabilistic locally enhanced voting. In 25th British machine vision conference (BMVC).
Rosipal, R., & Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. JMLR, 2, 97–123.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
Shimizu, H., & Poggio, T. (2004). Direction estimation of pedestrian from multiple still images. In Intelligent vehicles symposium (IVS).
Sun, M., Kohli, P., & Shotton, J. (2012). Conditional regression forests for human pose estimation. In 2012 IEEE conference on computer vision and pattern recognition (CVPR).
Tao, J., & Klette, R. (2013). Integrated pedestrian and direction classification using a random decision forest. In ICCV Workshop.
Torgo, L., & Gama, J. (1996). Regression by classification. In Brazilian symposium on artificial intelligence.
Torgo, L., Ribeiro, R. P., Pfahringer, B., & Branco, P. (2013). SMOTE for regression. In Portuguese conference on artificial intelligence.
Torki, M., & Elgammal, A. (2011). Regression from local features for viewpoint and pose estimation. In 2011 International conference on computer vision.
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
Weiss, S. M., & Indurkhya, N. (1995). Rule-based machine learning methods for functional prediction. Journal of Artificial Intelligence Research, 3, 383–403.
Wu, K. L., & Yang, M. S. (2007). Mean shift-based clustering. Pattern Recognition, 40(11), 3035–3052.
Yan, Y., Ricci, E., Subramanian, R., Lanz, O., & Sebe, N. (2013). No matter where you are: Flexible graph-guided multi-task learning for multi-view head pose classification under target motion. In Proceedings of the IEEE international conference on computer vision.
Yang, L., Liu, J., & Tang, X. (2014). Object detection and viewpoint estimation with auto-masking neural network. In European conference on computer vision.
Zhang, H., El-Gaaly, T., Elgammal, A., & Jiang, Z. (2013). Joint object and pose recognition using homeomorphic manifold analysis. In Association for the advancement of artificial intelligence (AAAI).
Zhao, G., Takafumi, M., Shoji, K., & Kenji, M. (2012). Video based estimation of pedestrian walking direction for pedestrian protection system. Journal of Electronics (China), 29(1–2), 72–81.
Zhen, X., Wang, Z., Yu, M., & Li, S. (2015). Supervised descriptor learning for multi-output regression. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Acknowledgments
This research was supported by a MURI Grant from the US Office of Naval Research under N00014-10-1-0934.
Additional information
Communicated by Hiroshi Ishikawa, Takeshi Masuda, Yasuyo Kita and Katsushi Ikeuchi.
About this article
Cite this article
Hara, K., Chellappa, R. Growing Regression Tree Forests by Classification for Continuous Object Pose Estimation. Int J Comput Vis 122, 292–312 (2017). https://doi.org/10.1007/s11263-016-0942-1
DOI: https://doi.org/10.1007/s11263-016-0942-1