Local structured representation for generic object detection

Zhang, Junge; Huang, Kaiqi; Tan, Tieniu; Zhang, Zhaoxiang

doi:10.1007/s11704-016-5530-6

Research Article
Published: 02 March 2017

Frontiers of Computer Science, Volume 11, pages 632–648 (2017)

Junge Zhang^1,3, Kaiqi Huang^1,2,3, Tieniu Tan^1,2,3 & Zhaoxiang Zhang^2,3

Abstract

Structure information plays an important role in both object recognition and detection. This paper studies what visual structure is and addresses the problem of structure modeling and representation from two aspects: visual features and topology models. First, at the feature level, we propose the Local Structured Descriptor to capture an object’s local structure effectively, and develop descriptors from shape and texture information, respectively. Second, at the topology level, we present a local structured model with a boosted feature selection and fusion scheme. All experiments are conducted on the challenging PASCAL Visual Object Classes (VOC) datasets from VOC2007 to VOC2010. Experimental results show that our method achieves very competitive performance.

Acknowledgements

This work was funded by the National Basic Research Program of China (2012CB316302), the National Natural Science Foundation of China (Grant Nos. 61403387, 61322209 and 61175007), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06040102).

Author information

Authors and Affiliations

Center for Research on Intelligent Perception and Computing, Chinese Academy of Sciences, Beijing, 100190, China
Junge Zhang, Kaiqi Huang & Tieniu Tan
Research Center for Brain-inspired Intelligence, Chinese Academy of Sciences, Beijing, 100190, China
Kaiqi Huang, Tieniu Tan & Zhaoxiang Zhang
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Junge Zhang, Kaiqi Huang, Tieniu Tan & Zhaoxiang Zhang

Corresponding author

Correspondence to Junge Zhang.

Additional information

Junge Zhang received his PhD in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2013. In July 2013, he joined the Center for Research on Intelligent Perception and Computing (CRIPAC), China, as an assistant professor. His major research interests include computer vision and pattern recognition. He has served as publicity chair and technical program committee member for several conferences, and as a peer reviewer for over 10 international journals and conferences. In 2010 and 2011, he and his group members won the PASCAL VOC challenge on object detection and ranked second on object classification.

Kaiqi Huang received his MS in electrical engineering from Nanjing University of Science and Technology, China, and his PhD in signal and information processing from Southeast University, China. After receiving his PhD, he became a postdoctoral researcher with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China, where he is currently a professor. He has published over 100 papers in journals and conferences including TPAMI, TIP, TCSVT, TSMCB, CVIU, Pattern Recognition, CVPR and ECCV. He is a senior member of the IEEE and the Deputy Secretary General of the IEEE Beijing Section. His interests include visual surveillance, image and video analysis, human vision and cognition, and computer vision.

Tieniu Tan received his BS in electronic engineering from Xi’an Jiaotong University, China in 1984, and his MS and PhD in electronic engineering from Imperial College London, UK in 1986 and 1989, respectively. In January 1998, he returned to China to join the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of the Chinese Academy of Sciences (CAS), China as a full professor. He is currently the director of the Center for Research on Intelligent Perception and Computing at the Institute of Automation, China, and also serves as the vice president of CAS. His current research interests include biometrics, image and video understanding, and information forensics and security.

Zhaoxiang Zhang received his BS in circuits and systems from the University of Science and Technology of China, China in 2004. He was then a PhD candidate under the supervision of Professor Tieniu Tan in the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CAS), China, where he received his PhD in 2009. In October 2009, he joined the School of Computer Science and Engineering, Beihang University, China, where he served as an assistant professor (2009–2011), an associate professor (2012–2015) and the vice-director of the Department of Computer Application Technology (2014–2015). In July 2015, he returned to the Institute of Automation, CAS. He is now a professor in the Research Center for Brain-inspired Intelligence.

Electronic supplementary material

Supplementary material, approximately 584 KB.

About this article

Cite this article

Zhang, J., Huang, K., Tan, T. et al. Local structured representation for generic object detection. Front. Comput. Sci. 11, 632–648 (2017). https://doi.org/10.1007/s11704-016-5530-6

Received: 08 December 2015
Accepted: 18 March 2016
Published: 02 March 2017
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11704-016-5530-6
