Abstract
In conventional machine learning applications, each data attribute is assumed to be orthogonal to the others. Namely, every pair of dimensions is orthogonal, and hence no distinction is made among the relations between dimensions. However, this is certainly not the case for real-world signals, which naturally originate from a spatio-temporal configuration. As a result, the conventional vectorization process disrupts all spatio-temporal information about the order/place of data, whether it is 1D, 2D, 3D, or 4D. In this paper, the problem of orthogonality is first investigated through conventional k-means clustering of images, where images are processed as vectors. As a solution, shift-invariant k-means is proposed in a novel framework with the help of sparse representations. Convolutional dictionary learning, a generalization of shift-invariant k-means, is then utilized as an unsupervised feature extraction method for classification. Experiments suggest that Gabor feature extraction, as a simulation of shallow convolutional neural networks, provides slightly better performance than convolutional dictionary learning. Other alternatives of convolutional logic are also discussed for spatio-temporal information preservation, including a spatio-temporal hypercomplex encoding scheme.
References
Sandor, J. (1996). On the arithmetical functions \(d_k(n)\) and \(d_k^*(n)\). Portugaliæ Mathematica, 53, 107–116.
Jafari, M., & Molaei, H. (2014). Spherical linear interpolation and Bezier curves. General Science Research, 2, 13–17.
Oktar, Y., & Turkan, M. (2018). A review of sparsity-based clustering methods. Signal Processing, 148, 20–30.
Oktar, Y., & Turkan, M. (2019). K-polytopes: A superproblem of k-means. Signal, Image and Video Processing, 13, 1207–1214.
Oktar, Y., & Turkan, M. (2020). Evolutionary simplicial learning as a generative and compact sparse framework for classification. Signal Processing, 174.
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666.
Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar Conference on Signals, Systems, and Computers (pp. 40–44).
Engan, K., Aase, S. O., & Husoy, J. H. (1999). Method of optimal directions for frame design. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 2443–2446, Vol. 5).
Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54, 4311–4322.
Tang, W., Panahi, A., Krim, H., & Dai, L. (2019). Analysis dictionary learning: An efficient and discriminative solution. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 3682–3686).
Zhang, Z., Jiang, W., Qin, J., Zhang, L., Li, F., Zhang, M., & Yan, S. (2017). Jointly learning structured analysis discriminative dictionary and analysis multiclass classifier. IEEE Transactions on Neural Networks and Learning Systems, 29, 3798–3814.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672–2680).
Zhang, Z., Sun, Y., Wang, Y., Zhang, Z., Zhang, H., Liu, G., & Wang, M. (2020). Twin-incoherent self-expressive locality-adaptive latent dictionary pair learning for classification. IEEE Transactions on Neural Networks and Learning Systems.
Garcia-Cardona, C., & Wohlberg, B. (2018). Convolutional dictionary learning: A comparative review and new algorithms. IEEE Transactions on Computational Imaging, 4, 366–381.
Pu, Y., Yuan, W., Stevens, A., Li, C., & Carin, L. (2016). A deep generative deconvolutional image model. In Artificial Intelligence and Statistics (pp. 741–750).
Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. (2010). Deconvolutional networks. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2528–2535).
LeCun, Y., Cortes, C., & Burges, C. J. C. (2010). MNIST Handwritten Digit Database.
Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: Spectral clustering and normalized cuts. In ACM International Conference on Knowledge Discovery and Data Mining (pp. 551–556).
Iam-on, N., & Garrett, S. (2010). LinkCluE: A MATLAB package for link-based cluster ensembles. Journal of Statistical Software, 36, 1–36.
Fard, M. M., Thonet, T., & Gaussier, E. (2020). Deep k-means: Jointly clustering with k-means and learning representations. Pattern Recognition Letters.
Hull, J. J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 550–554.
Wohlberg, B. (2017). SPORCO: A Python package for standard and convolutional sparse representations. In Python in Science Conference (pp. 1–8).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 886–893, Vol. 1).
Ojala, T., Pietikainen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29, 51–59.
Haghighat, M., Zonouz, S., & Abdel-Mottaleb, M. (2015). CloudID: Trustworthy cloud-based and cross-enterprise biometric identification. Expert Systems with Applications, 42, 7905–7916.
Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20, 45–50.
Kachuee, M., Fazeli, S., & Sarrafzadeh, M. (2018). ECG heartbeat classification: A deep transferable representation. In IEEE International Conference on Healthcare Informatics (pp. 443–444).
Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., & Batista, G. (2015). The UCR Time Series Classification Archive.
Barthelemy, Q., Larue, A., Mayoue, A., Mercier, D., & Mars, J. I. (2012). Shift & 2D rotation invariant sparse coding for multivariate signals. IEEE Transactions on Signal Processing, 60, 1597–1611.
Bar, L., & Sapiro, G. (2010). Hierarchical dictionary learning for invariant classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3578–3581).
Eysenck, M. W., & Keane, M. T. (2005). Cognitive psychology: A student’s handbook. Taylor & Francis.
Gu, S., Meng, D., Zuo, W., & Zhang, L. (2017). Joint convolutional analysis and synthesis sparse representation for single image layer separation. In IEEE International Conference on Computer Vision (pp. 1708–1716).
Shekhar, S., Patel, V. M., & Chellappa, R. (2014). Analysis sparse coding models for image-based classification. In IEEE International Conference on Image Processing (pp. 5207–5211).
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45, 2673–2681.
Sak, H., Senior, A., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Annual Conference of the International Speech Communication Association.
Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20, 61–80.
Arora, S., Du, S. S., Li, Z., Salakhutdinov, R., Wang, R., & Yu, D. (2019). Harnessing the power of infinitely wide deep nets on small-data tasks. arXiv preprint arXiv:1910.01663
Lee, J., Bahri, Y., Novak, R., Schoenholz, S. S., Pennington, J., & Sohl-Dickstein, J. (2017). Deep neural networks as Gaussian processes. arXiv preprint arXiv:1711.00165
Hazan, T., Polak, S., & Shashua, A. (2005). Sparse image coding using a 3D non-negative tensor factorization. In IEEE International Conference on Computer Vision (pp. 50–57, Vol. 1).
Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51, 455–500.
Duan, G., Wang, H., Liu, Z., Deng, J., & Chen, Y.-W. (2012). K-CPD: Learning of overcomplete dictionaries for tensor sparse coding. In IEEE International Conference on Pattern Recognition (pp. 493–496).
Wang, J., Li, J., Han, X.-H., Lin, L., Hu, H., Xu, Y., et al. (2020). Tensor-based sparse representations of multi-phase medical images for classification of focal liver lesions. Pattern Recognition Letters, 130, 207–215.
Caiafa, C. F., & Cichocki, A. (2013). Computing sparse representations of multidimensional signals using Kronecker bases. Neural Computation, 25, 186–220.
Caiafa, C. F., & Cichocki, A. (2012). Block sparse representations of tensors using Kronecker bases. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2709–2712).
Peng, Y., Meng, D., Xu, Z., Gao, C., Yang, Y., & Zhang, B. (2014). Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2949–2956).
Qi, N., Shi, Y., Sun, X., Wang, J., & Yin, B. (2013). Two dimensional synthesis sparse model. In IEEE International Conference Multimedia Expo (pp. 1–6).
Roemer, F., Del Galdo, G., & Haardt, M. (2014). Tensor-based algorithms for learning multidimensional separable dictionaries. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 3963–3967).
Huang, F., & Anandkumar, A. (2015). Convolutional dictionary learning through tensor factorization. In Feature Extraction: Modern Questions and Challenges (pp. 116–129).
Moxey, C. E., Sangwine, S. J., & Ell, T. A. (2003). Hypercomplex correlation techniques for vector images. IEEE Transactions on Signal Processing, 51, 1941–1953.
Xu, Y., Yu, L., Xu, H., Zhang, H., & Nguyen, T. (2015). Vector sparse representation of color image using quaternion matrix analysis. IEEE Transactions on Image Processing, 24, 1315–1329.
Kilmer, M. E., & Martin, C. D. (2011). Factorization strategies for third-order tensors. Linear Algebra and its Applications, 435, 641–658.
Mairal, J., Elad, M., & Sapiro, G. (2007). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17, 53–69.
Hirose, A. (2012). Complex-Valued Neural Networks (Vol. 400). Springer Science & Business Media.
Isokawa, T., Kusakabe, T., Matsui, N., & Peper, F. (2003). Quaternion neural network and its application. In International Conference on Knowledge-Based Intelligent Information and Engineering Systems (pp. 318–324).
Nitta, T. (2003). Solving the XOR problem and the detection of symmetry using a single complex-valued neuron. Neural Networks, 16, 1101–1105.
Chen, X., Song, Q., & Li, Z. (2017). Design and analysis of quaternion-valued neural networks for associative memories. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 48, 2305–2314.
Lazendic, S., De Bie, H., & Pizurica, A. (2018a). Octonion sparse representation for color and multispectral image processing. In European Signal Processing Conference (pp. 608–612).
Popa, C.-A. (2016). Octonion-valued neural networks. In International Conference on Artificial Neural Networks (pp. 435–443).
Lazendic, S., Pizurica, A., & De Bie, H. (2018b). Hypercomplex algebras for dictionary learning. In Conference on Applied Geometric Algebras in Computer Science and Engineering (pp. 57–64).
Wang, R., Wang, K., Cao, W., & Wang, X. (2019). Geometric algebra in signal and image processing: A survey. IEEE Access, 7, 156315–156325.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Appendix
1.1 A Brief Discussion on Spatio-temporal Information Preservation
Variations on Neural Networks
Convolution with a kernel on the input side of a layer corresponds to a locally connected structure instead of the traditional fully connected one. Neighboring cells are now put in relation, preserving the original spatial configuration. As an alternative to the convolutional approach, neighboring cells on the input or the output side of a neural network layer can also be related through direct edges in-between, as another way of preserving the original spatial configuration of the input cells. The possibility of edges within the same layer forces one to think of a neural network as a more general directed graph. In fact, this line of logic leads to an alternative structure known as recurrent neural networks (RNN). In the most general sense, RNNs represent directed graphs. Note that it is possible to build upon the basic RNN structure through bidirectional logic [34] and the long short-term memory concept [35].
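To make the locally connected view concrete, the following minimal NumPy sketch (ours, not code from the paper) verifies that a 1D "valid" convolution is exactly multiplication by a banded Toeplitz matrix, i.e., a fully connected layer restricted to local connections with shared weights.

```python
# A 1D convolution is a fully connected layer whose weight matrix is
# constrained to be banded Toeplitz: local connectivity + weight sharing.
import numpy as np

def conv_as_matrix(kernel, n):
    """Build the (n - len(kernel) + 1) x n matrix of a 'valid' 1D convolution."""
    k = len(kernel)
    T = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        T[i, i:i + k] = kernel[::-1]  # flipped kernel, as in convolution
    return T

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
kernel = rng.standard_normal(3)

assert np.allclose(conv_as_matrix(kernel, 16) @ x,
                   np.convolve(x, kernel, mode='valid'))
```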
On the other hand, empirical evaluation suggests that temporal convolution, in other words 1D convolutional logic, surpasses the capacity of recurrent architectures in sequence modeling [36]. It is still an open question whether the temporal dimension should be regarded as just another spatial dimension or whether a hybrid approach is better. This is rather a deep issue related to the properties of space and time. Instead, considering neural networks of any structure as directed and possibly cyclic graphs, in other words as neural graphs, might pave the way to a better understanding of the brain. Note that this concept is rather different from graph neural networks, which take graphs as inputs [37].
Another generalization of neural networks is possible by considering infinite-width neural networks [38]. Recent results suggest that deep neural networks that are allowed to become infinitely wide converge to models called Gaussian processes [39]. However, such studies do not consider the case where there are in-between connections within layers. Considering the existence of these connections can further lead to an infinite but continuous (input or output) layer, which is indeed applicable both mathematically and practically. A generalization of neural network layer cases in this sense is depicted in Fig. 9. The third case in this figure is important in that it leads to the concept of functional machine learning. This alone may not be enough to preserve the spatial configuration of the input layer; therefore, additional locally connected versions of these structures can also be proposed.
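The convergence claim of [38, 39] can be hinted at with a small Monte Carlo sketch (our illustration under assumed i.i.d. Gaussian weights, ReLU activations, and 1/sqrt(width) output scaling, not an experiment from this paper): the output of a random one-hidden-layer network at a fixed input becomes increasingly Gaussian as the width grows.

```python
# As width grows, the scaled output at a fixed input approaches a Gaussian
# by the central limit theorem; excess kurtosis of a Gaussian is zero.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10)          # a fixed input
for width in (1, 10, 1000):
    outs = []
    for _ in range(5000):            # sample 5000 random networks
        W = rng.standard_normal((width, x.size))
        v = rng.standard_normal(width)
        outs.append(v @ np.maximum(W @ x, 0.0) / np.sqrt(width))
    outs = np.asarray(outs)
    kurt = ((outs - outs.mean())**4).mean() / outs.var()**2 - 3.0
    print(f"width={width:5d}  excess kurtosis={kurt:+.3f}")  # shrinks toward 0
```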
Tensor-based Sparse Representations
The fact is that images are not vectors; vectorization breaks the spatial coherency of images, as investigated in [40]. This line of thought is centered around tensor factorization as a generalization. The study in [40] reports that, by treating training images as a 3D cube and performing a non-negative tensor factorization (NTF), higher efficiency, discrimination and representation power can be achieved when compared to non-negative matrix factorization (NMF).
There are two main branches of tensor decomposition. In the first branch, studies are based on the canonical polyadic decomposition (CPD), sometimes also referred to as CANDECOMP/PARAFAC [41]. The most relevant example from the literature is K-CPD [42], an overcomplete dictionary learning algorithm for tensor sparse coding based on a multilinear version of OMP and the CANDECOMP/PARAFAC decomposition. K-CPD surpasses conventional methods in a series of image denoising experiments. Most recently, a similar framework has also been successfully utilized in tensor-based sparse representations for the classification of multiphase medical images [43]. The second branch is centered around the Tucker decomposition model instead, which is more general than CPD [44]. The study in [45] presents the foundations of the Tucker decomposition model by defining the Tensor-OMP algorithm, which computes a block-sparse representation of a tensor with respect to a Kronecker basis. In [44], the authors report that a block-sparse structure imposed on a core tensor through subtensors provides significant results. The Tucker model together with the block-sparsity restriction may work particularly well, since the higher-dimensional block structure is meaningfully applied on the original sparse tensor in the form of subtensors. There are many other studies in the literature specifically based on the Tucker model of sparse representations, with or without block-sparsity, and additionally including dictionary learning [46,47,48].
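As a concrete reference point for the CPD branch, the sketch below is a didactic NumPy implementation of rank-R CPD fitted by alternating least squares (our sketch; production code would rather use a library routine such as tensorly's parafac, and robust variants add normalization and stopping criteria).

```python
# Rank-R CPD (CANDECOMP/PARAFAC) of a 3-way tensor via alternating least
# squares: each factor update is a linear least-squares problem.
import numpy as np

def cpd_als(X, R, iters=100, seed=0):
    """Factor a 3-way tensor X ~= sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    for _ in range(iters):
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# sanity check: an exactly rank-2 tensor is recovered up to ALS convergence
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cpd_als(X, R=2)
print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)))  # near zero
```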
Certain parallels can be drawn between convolutional dictionary learning and tensor-based sparse representations. As an example, the study in [49] proposes a novel framework for learning convolutional models through tensor decomposition, showing that cumulant tensors admit a CPD whose components correspond to convolutional filters and their circulant shifts.
On the other side, tensor-based approaches (both the CPD and Tucker models) still do not provide a distinct solution for the 1D case. Without loss of generality, let us assume that the signal is in the form of a column vector \(\mathbf{s}\). Since the signal is one-dimensional, there will be a single matrix \(\mathbf{D}\) for that single dimension in the Tucker model. Therefore, the model attained is \(\mathbf{s} = \mathbf{x} \times_{1} \mathbf{D}\) in Eq. (5). It is also possible to show that \(\mathbf{x} \times_{1} \mathbf{D} = \mathbf{D}\mathbf{x}\). From the CPD model perspective, there is equivalently \(\sum_{i}{x_{i} \mathbf{d}_{i}^{(1)}}\), where \(x_{i}\) is the single sparse coefficient associated with the \(i\)th atom \(\mathbf{d}_i\). Hence, one arrives at the standard formulation in Eq. (5); namely, the Tucker and CPD models are equivalent in the one-dimensional case, both corresponding to the conventional orthogonal sparse representation.
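This equivalence is easy to verify numerically; the snippet below (our check, not from the paper) confirms that the mode-1 product of a 1-way core tensor with a dictionary is plain matrix-vector multiplication, and that the CPD sum of weighted atoms produces the same vector.

```python
# For a 1-way core x, both tensor models collapse to ordinary sparse
# coding s = Dx: x (mode-1 product) D  ==  D @ x  ==  sum_i x_i * d_i.
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 20))                       # 20 atoms of length 8
x = np.zeros(20); x[[3, 11]] = rng.standard_normal(2)  # a 2-sparse code

mode1 = np.tensordot(x, D, axes=([0], [1]))   # x  x_1  D for a 1-way tensor
assert np.allclose(mode1, D @ x)
assert np.allclose(sum(x[i] * D[:, i] for i in range(20)), D @ x)
```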
The above observation raises an important question. Although tensor-based approaches provide an advantage when the signals are multidimensional, these formulations do not provide an edge for 1D signals. The remedy may come from considering a 1D signal not merely as a 1D vector of scalar elements. In other words, a 1D complex vector can be formed by coding the cell positions in the imaginary parts to overcome the orthogonality problem of the standard 1D vector representation, as depicted in Fig. 10. This paves the way to performing sparse representations of complex-valued data, or even quaternion-valued data, to accommodate more information in cases of higher dimensionality. The utmost generalization is achieved through geometric algebra as a generalization of hypercomplex numbers.
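The following snippet sketches one possible reading of this encoding (our illustration of Fig. 10, not code from the paper): sample values are kept in the real parts while cell positions are written into the imaginary parts, so the value/position pairing survives any subsequent reordering.

```python
# Lift a 1D real signal to a complex vector that carries its own positions.
import numpy as np

def encode_positions(s):
    """Attach each sample's index as its imaginary part."""
    return s.astype(float) + 1j * np.arange(len(s))

s = np.array([0.2, 0.7, -1.0, 0.4])
z = encode_positions(s)
print(z)   # [0.2+0.j  0.7+1.j  -1.+2.j  0.4+3.j]

# A permutation of the plain vector s destroys order information, but the
# same permutation of z keeps each value explicitly tied to its position.
perm = np.random.default_rng(2).permutation(len(s))
print(z[perm])
```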
Complex, Hypercomplex and Geometric Algebra Based Approaches
Note that quaternion algebra is the first hypercomplex number system to be devised that is similar to the real and complex number systems [50]. The study in [51] states that a quaternion-based model can achieve a more structured representation when compared to a tensor-based model. Comparisons between quaternion-SVD and tensor-SVD [52] establish their equivalence, but the superiority of quaternion-SVD arises when it is combined with the sparse representation model. It is possible to formulate a quaternion-valued sparse representation of color images that surpasses the conventional logic [51].
There are four possible models to represent color images, as suggested in [51]. The first one is the monochromatic model, in which each color channel is represented separately. The second one is the concatenation model, where a single vector is formed by concatenating the three color channels [53]. The third is the tensor-based model, where the color image is thought of as a 3D cube of values. The last one is the quaternion-based model, where each color channel is assigned to an imaginary unit, i.e., r, g, b to i, j, k, respectively. Most importantly, all these models are analytically unified.
There is also one more possible model that is subtler. As depicted in Fig. 10, one can encode a mono audio signal as a vector of complex numbers whose imaginary values indicate the time positions; in a similar way, one can encode a grayscale image as a quaternion-valued vector whose imaginary parts are allocated to indicate the pixel positions. While thinking of a color image as a 3D cube, there is a possible quaternion-based model in which the imaginary units encode the position within this cube and the scalar denotes the value of that cell. The same quaternion-based encoding can be applied to any 3D scalar data.
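A possible realization of this positional quaternion model is sketched below (our illustration; quaternions are simply stored as (w, i, j, k) rows): each pixel of a grayscale image becomes a quaternion whose scalar part is the intensity and whose i and j parts encode the row/column position, with k left free (it would encode depth for 3D scalar data).

```python
# Encode a grayscale image as a vector of position-carrying quaternions.
import numpy as np

def encode_image(img):
    """Map an H x W grayscale image to an (H*W) x 4 array of quaternions."""
    H, W = img.shape
    rows, cols = np.indices((H, W))
    q = np.stack([img, rows, cols, np.zeros_like(img)], axis=-1)
    return q.reshape(-1, 4).astype(float)

img = np.arange(6, dtype=float).reshape(2, 3)
Q = encode_image(img)
print(Q)  # e.g. pixel (1, 2) with value 5.0 -> row [5., 1., 2., 0.]
```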
For further machine learning in this proposed scheme, a hypercomplex-to-real feature extraction layer is required, since current mainstream classification algorithms need real-valued data. Another option is to resort to classification algorithms that can directly handle hypercomplex values. This line of logic paves the way to considering complex/hypercomplex-valued neural networks as viable tools [54, 55]. As a future work, a comparison of spatio-temporally encoded hypercomplex neural networks with conventional convolutional or recurrent neural networks may lead to a deeper understanding of the deep learning concept. As a motivation, a single complex-valued neuron can solve the XOR problem [56]. In addition, the fact that quaternions can be used to implement associative memory in neural networks is promising [57].
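The XOR capability is easy to illustrate. The construction below follows the spirit of [56] but is our simplified variant, not necessarily Nitta's exact formulation: the real and imaginary parts of a complex neuron's net input define two orthogonal decision lines, and the resulting four quadrants realize XOR, which a single real-valued neuron cannot separate.

```python
# A single complex "neuron": net input z = (x1 + i*x2) - b gives two
# orthogonal decision lines Re(z)=0 and Im(z)=0; their quadrants yield XOR.
import numpy as np

b = 0.5 + 0.5j                        # bias placing both lines between 0 and 1

def xor_neuron(x1, x2):
    z = (x1 + 1j * x2) - b
    return int(np.real(z) * np.imag(z) < 0)   # opposite signs -> XOR is 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), '->', xor_neuron(x1, x2))  # prints 0, 1, 1, 0
```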
Another line of generalization deals with the case where the data has more than three dimensions. In such a case, a quaternion is not enough to designate the cell position together with its value. As an extension, octonion algebra can accommodate up to seven imaginary channels [58, 59]; however, it loses the associativity property. The study in [60] reports that all algebras of dimension larger than eight lose important properties, since they contain algebras of smaller dimension as subalgebras. This might be an issue related to the physics of space and time, which is out of the scope of this study. The important fact is that the domain dealing with the generalization of hypercomplex numbers is called "geometric algebra" and has been gaining great attention lately [61].