Wavelet, active basis, and shape script: a tour in the sparse land

Published: 29 March 2010

Abstract

Sparse coding is a key principle that underlies wavelet representation of natural images. In this paper, we explain that the effort of seeking a common wavelet sparse coding of images from the same object category leads to an active basis model, where the images share the same set of selected wavelet elements, which form a linear basis for representing the images. The selected wavelet elements are allowed to perturb their locations and orientations to account for shape deformations, so that the basis becomes active, and the active basis serves as a mathematical representation of a deformable template. We show that a recursive application of the strategy underlying the active basis model leads to a shape script model, which is a composition of shape motifs such as ellipsoids, parallel bars, angles, etc. These shape motifs are allowed to change their locations, orientations, scales and aspect ratios, and the shape motifs themselves are modeled by active bases. Compared to the active basis model, the shape script model is a sparser representation and therefore has stronger generalization power. It can also be considered another layer of sparse coding of the selected wavelet elements that themselves provide sparse coding of the image intensities.
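The shared selection with local perturbation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes 1D signals instead of images, a single Gabor-like atom shape, and location shifts only (no orientation). The names `make_atom`, `squared_responses`, `local_max`, `shared_selection`, and the parameter `radius` are hypothetical. Overlapping responses are simply zeroed after each pick, which is a crude stand-in for whatever inhibition or residual update the paper uses.

```python
import numpy as np

def make_atom(width=9, freq=0.25):
    """Unit-norm 1D Gabor-like atom (hypothetical stand-in for a 2D wavelet element)."""
    t = np.arange(width) - width // 2
    atom = np.exp(-0.5 * (t / (width / 4.0)) ** 2) * np.cos(2 * np.pi * freq * t)
    atom -= atom.mean()
    return atom / np.linalg.norm(atom)

def squared_responses(signal, atom):
    """Squared inner product of the atom with the signal at every valid start position."""
    return np.correlate(signal, atom, mode="valid") ** 2

def local_max(resp, radius):
    """For each nominal position, the best response within +/- radius and its location."""
    n = len(resp)
    best = np.empty(n)
    arg = np.empty(n, dtype=int)
    for x in range(n):
        lo, hi = max(0, x - radius), min(n, x + radius + 1)
        k = lo + int(np.argmax(resp[lo:hi]))
        best[x], arg[x] = resp[k], k
    return best, arg

def shared_selection(signals, atom, n_elements=2, radius=2):
    """Greedily pick nominal positions shared by all signals.

    Each signal may shift every shared element by up to `radius` samples,
    which is the 'active' part of the basis in this toy setting.
    """
    resp = [squared_responses(s, atom) for s in signals]
    width = len(atom)
    selected = []
    for _ in range(n_elements):
        pooled = [local_max(r, radius) for r in resp]
        total = sum(best for best, _ in pooled)   # pooled evidence across signals
        x = int(np.argmax(total))                 # shared nominal position
        shifts = [int(arg[x]) - x for _, arg in pooled]
        selected.append((x, shifts))
        # Assumption: zero out responses that overlap each signal's chosen element.
        for r, (_, arg) in zip(resp, pooled):
            k = int(arg[x])
            r[max(0, k - width + 1): k + width] = 0.0
    return selected

# Toy data: three signals sharing two elements at slightly different positions.
atom = make_atom()
true_positions = [(9, 41), (10, 40), (11, 39)]
signals = []
for p, q in true_positions:
    s = np.zeros(64)
    s[p:p + len(atom)] += 2.0 * atom
    s[q:q + len(atom)] += 1.0 * atom
    signals.append(s)

for x, shifts in shared_selection(signals, atom):
    print("nominal position", x, "per-signal shifts", shifts)
```

Adding each nominal position to its per-signal shift gives the location where that signal actually places the shared element. The selected set is common to all signals while the perturbations differ per signal.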

Cited By

  • (2014) Sparse Coding with a Coupled Dictionary Learning Approach for Textual Image Super-resolution. Proceedings of the 2014 22nd International Conference on Pattern Recognition, pp. 4459-4464. DOI: 10.1109/ICPR.2014.763. Online publication date: 24-Aug-2014
  • (2010) Batch Mode Sparse Active Learning. Proceedings of the 2010 IEEE International Conference on Data Mining Workshops, pp. 875-882. DOI: 10.1109/ICDMW.2010.175. Online publication date: 13-Dec-2010

Published In

MIR '10: Proceedings of the international conference on Multimedia information retrieval
March 2010
600 pages
ISBN: 9781605588155
DOI: 10.1145/1743384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deformable template
  2. generative model
  3. sparse coding

Qualifiers

  • Research-article

Conference

MIR '10: International Conference on Multimedia Information Retrieval
March 29 - 31, 2010
Philadelphia, Pennsylvania, USA
