Abstract
The scattering representation, which is invariant to translation, rotation, scale and linear transformations of an image, describes signal classes with a deformable structure well. However, natural images contain general object classes with far more complex sources of variability, including occlusion, clutter, and complex changes of shape and/or texture. The variability due to physical transformations such as translation or rotation is universal and does not need to be learnt, but for complex data, learning becomes important in order to address more complex sources of variability. This paper proposes a novel framework that combines the scattering transform with a deep learning architecture to address the limitations of ScatNet. Wavelet scattering networks can provide the first two layers of a general deep architecture, so that the deep network does not have to learn a large number of low-level filters. Experiments show the proposed method to be efficient.
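To make the idea of fixed scattering layers feeding a learned stage concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small bank of Gabor-like filters defined in the Fourier domain, global averaging in place of a local low-pass filter, and all second-order paths kept (a full scattering transform usually keeps only paths with increasing scale). The names `gabor_bank` and `scatter`, the filter parameters, and the untrained dense layer standing in for the learned network are all hypothetical.

```python
import numpy as np

def gabor_bank(size, scales=(1, 2), n_orient=4):
    """Small bank of Gabor-like band-pass filters in the Fourier domain (assumed parameters)."""
    fy = np.fft.fftfreq(size)[:, None]   # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(size)[None, :]   # horizontal frequencies, cycles/pixel
    bank = []
    for j in scales:
        xi = 0.375 / 2 ** j              # centre frequency halves at each scale
        sigma = xi / 2.0                 # bandwidth proportional to centre frequency
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            cx, cy = xi * np.cos(theta), xi * np.sin(theta)
            psi = np.exp(-((fx - cx) ** 2 + (fy - cy) ** 2) / (2 * sigma ** 2))
            psi[0, 0] = 0.0              # force zero mean, as a wavelet requires
            bank.append(psi)
    return bank

def scatter(image, bank):
    """Globally averaged order-0, order-1 and order-2 coefficients (no learning involved)."""
    def modulus_responses(x):
        X = np.fft.fft2(x)
        # Circular convolution with each filter, followed by the complex modulus.
        return [np.abs(np.fft.ifft2(X * psi)) for psi in bank]

    feats = [image.mean()]                       # order 0: plain average
    for u1 in modulus_responses(image):
        feats.append(u1.mean())                  # order 1
        feats.extend(u2.mean() for u2 in modulus_responses(u1))  # order 2
    return np.array(feats)

rng = np.random.default_rng(0)
image = rng.random((32, 32))                     # stand-in for a grayscale image
bank = gabor_bank(32)
features = scatter(image, bank)                  # fixed, non-learned "first two layers"

# Placeholder for the learned part: one untrained dense layer with a softmax.
# In a real system these weights would be trained on the scattering features.
W = rng.normal(scale=0.01, size=(10, features.size))
logits = W @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

Because the filters are fixed and the coefficients are averaged over the whole image, the features do not change under a circular shift of the input; only the weights `W` of the later stage would need to be learned.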
Acknowledgments
This work is supported by the Science and Technology Planning Project of Guangdong Province of China (Nos. 2014B010111003, 2014B010111006, 2016B010108008), the Guangzhou Key Lab of Body Data Science under Grant 201605030011, and the National Natural Science Foundation of China under Grant 61401163.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, W., Zou, X., Qiao, R. (2018). Scattering Wavelet Based Deep Network for Image Classification. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_45
DOI: https://doi.org/10.1007/978-981-10-8530-7_45
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8529-1
Online ISBN: 978-981-10-8530-7
eBook Packages: Computer Science, Computer Science (R0)