Abstract
Tile2Vec has proven to be an effective representation learning model for remote sensing. Its success depends on l2-norm regularization, but the l2-norm has a major drawback that weakens its regularizing effect. We propose replacing the l2-norm regularization with the predicting noise framework, and we develop an algorithm that integrates this framework into Tile2Vec. We evaluate the resulting model as a feature extractor on a land cover classification task. The results show that the proposed model outperforms all baseline models.
Appendices
A Model’s Architecture
The model adopts the ResNet-18 architecture with slight modifications. Except for the first row, each row of the table describes a residual block with the given number of kernels. All blocks use a padding of 1.
Encoder | Note |
---|---|
Conv(kernels=64, size=3, stride=1), Batch Norm, ReLU | 1 Block |
Conv(kernels=64, size=3, stride=1), Batch Norm, ReLU | 2 Blocks |
Conv(kernels=64, size=3, stride=1), Batch Norm, ReLU | |
Conv(kernels=128, size=3, stride=2), Batch Norm, ReLU | 2 Blocks |
Conv(kernels=128, size=3, stride=1), Batch Norm, ReLU | |
Conv(kernels=256, size=3, stride=2), Batch Norm, ReLU | 2 Blocks |
Conv(kernels=256, size=3, stride=1), Batch Norm, ReLU | |
Conv(kernels=512, size=3, stride=2), Batch Norm, ReLU | 2 Blocks |
Conv(kernels=512, size=3, stride=1), Batch Norm, ReLU | |
Conv(kernels=z, size=3, stride=2), Batch Norm, ReLU | 2 Blocks |
Conv(kernels=z, size=3, stride=1), Batch Norm, ReLU | |
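To make the effect of the strides in the table concrete, the following is a minimal sketch that traces the feature-map shape row by row. It rests on assumptions not stated in the appendix: a hypothetical 64×64 input, `z` taken to equal the representation dimension of 512 from Appendix B, and each row treated as a single 3×3 convolution for shape purposes (the skip connection of a residual block does not change the output shape). The function names are illustrative only.

```python
# Shape trace for the encoder table (illustrative sketch, not the authors' code).
# Each entry is (number of kernels, stride) for one table row, top to bottom.
def encoder_rows(z):
    return [
        (64, 1),
        (64, 1), (64, 1),
        (128, 2), (128, 1),
        (256, 2), (256, 1),
        (512, 2), (512, 1),
        (z, 2), (z, 1),
    ]


def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size, z):
    """Return (channels, height, width) after each row for a square input."""
    shapes = []
    size = input_size
    for channels, stride in encoder_rows(z):
        size = conv_output_size(size, stride=stride)
        shapes.append((channels, size, size))
    return shapes


if __name__ == "__main__":
    # input_size=64 and z=512 are assumptions made for this example.
    for row, shape in enumerate(trace_shapes(64, 512), start=1):
        print(row, shape)
```

Under these assumptions, each stride-2 row halves the spatial size (64, 32, 16, 8, 4), so the trace ends at a 512×4×4 feature map. How that map is reduced to the final representation vector is not specified in the table.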
B Hyperparameters
Parameter | Value |
---|---|
Learning rate | 0.02 |
\(\alpha\) | 1.0 |
Representation dimension | 512 |
\(m\) | 0.1 |
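As an illustration of how a margin of this size behaves, the sketch below evaluates a triplet hinge loss of the kind used in the original Tile2Vec objective. It assumes that \(m\) in the table denotes that triplet margin, which this appendix does not state explicitly. The function names and the toy vectors are hypothetical.

```python
import math

# Assumption: m from the hyperparameter table is the triplet margin.
MARGIN = 0.1


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, neighbor, distant, margin=MARGIN):
    """Hinge on d(anchor, neighbor) - d(anchor, distant) + margin."""
    gap = l2_distance(anchor, neighbor) - l2_distance(anchor, distant) + margin
    return max(0.0, gap)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    near = [0.0, 0.0]
    far = [3.0, 4.0]
    # Well-separated triplet: the hinge is inactive.
    print(triplet_margin_loss(anchor, near, far))
    # Swapped roles: the loss penalizes the violation plus the margin.
    print(triplet_margin_loss(anchor, far, near))
```

The loss is zero once the distant tile is at least the margin farther from the anchor than the neighbor tile is, so a small margin such as 0.1 only requires a modest separation in embedding space.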
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sinaga, M.A., Ali, F.M., Arymurthy, A.M. (2021). Tile2Vec with Predicting Noise for Land Cover Classification. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_8
DOI: https://doi.org/10.1007/978-3-030-92273-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92272-6
Online ISBN: 978-3-030-92273-3
eBook Packages: Computer Science, Computer Science (R0)