
Preserving Spatio-Temporal Information in Machine Learning: A Shift-Invariant k-Means Perspective

Published in: Journal of Signal Processing Systems

Abstract

In conventional machine learning applications, each data attribute is assumed to be orthogonal to the others. Namely, every pair of dimensions is treated as orthogonal, so the relations between dimensions are indistinguishable. However, this is certainly not the case for real-world signals, which naturally originate from a spatio-temporal configuration. As a result, the conventional vectorization process disrupts all of the spatio-temporal information about the order/place of data, whether it be 1D, 2D, 3D, or 4D. In this paper, the problem of orthogonality is first investigated through conventional k-means on images, where images are processed as vectors. As a solution, shift-invariant k-means is proposed in a novel framework with the help of sparse representations. A generalization of shift-invariant k-means, convolutional dictionary learning, is then utilized as an unsupervised feature extraction method for classification. Experiments suggest that Gabor feature extraction, as a simulation of shallow convolutional neural networks, provides slightly better performance than convolutional dictionary learning. Other convolutional-logic alternatives for spatio-temporal information preservation are also discussed, including a spatio-temporal hypercomplex encoding scheme.

References

  1. Sandor, J. (1996). On the arithmetical functions dk(n) and d*k(n). Portugaliæ Mathematica, 53, 107–116.

  2. Jafari, M., & Molaei, H. (2014). Spherical linear interpolation and Bezier curves. General Science Research, 2, 13–17.

  3. Oktar, Y., & Turkan, M. (2018). A review of sparsity-based clustering methods. Signal Processing, 148, 20–30.

  4. Oktar, Y., & Turkan, M. (2019). K-polytopes: A superproblem of k-means. Signal, Image, Video Processing, 13, 1207–1214.

  5. Oktar, Y., & Turkan, M. (2020). Evolutionary simplicial learning as a generative and compact sparse framework for classification. Signal Processing, 174.

  6. Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666.

  7. Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar Conference on Signals, Systems, and Computers (pp. 40–44).

  8. Engan, K., Aase, S. O., & Husoy, J. H. (1999). Method of optimal directions for frame design. In IEEE International Conference on Acoustics, Speech, & Signal Processing (Vol. 5, pp. 2443–2446).

  9. Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54, 4311–4322.

  10. Tang, W., Panahi, A., Krim, H., & Dai, L. (2019). Analysis dictionary learning: An efficient and discriminative solution. In IEEE International Conference Acoustics, Speech, Signal Processing (pp. 3682–3686).

  11. Zhang, Z., Jiang, W., Qin, J., Zhang, L., Li, F., Zhang, M., & Yan, S. (2017). Jointly learning structured analysis discriminative dictionary and analysis multiclass classifier. IEEE Transactions on Neural Networks and Learning Systems, 29, 3798–3814.

  12. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672–2680).

  13. Zhang, Z., Sun, Y., Wang, Y., Zhang, Z., Zhang, H., Liu, G., & Wang, M. (2020). Twin-incoherent self-expressive locality-adaptive latent dictionary pair learning for classification. IEEE Transactions on Neural Networks and Learning Systems.

  14. Garcia-Cardona, C., & Wohlberg, B. (2018). Convolutional dictionary learning: A comparative review and new algorithms. IEEE Transactions on Computational Imaging, 4, 366–381.

  15. Pu, Y., Yuan, W., Stevens, A., Li, C., & Carin, L. (2016). A deep generative deconvolutional image model. In Artificial Intelligence and Statistics (pp. 741–750).

  16. Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. (2010). Deconvolutional networks. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2528–2535).

  17. LeCun, Y., Cortes, C., & Burges, C. J. C. (2010). MNIST Handwritten Digit Database.

  18. Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: Spectral clustering and normalized cuts. In ACM International Conference on Knowledge Discovery and Data Mining (pp. 551–556).

  19. Iam-on, N., & Garrett, S. (2010). Linkclue: A MATLAB package for link-based cluster ensembles. Journal of Statistical Software, 36, 1–36.

  20. Fard, M. M., Thonet, T., & Gaussier, E. (2020). Deep k-means: Jointly clustering with k-means and learning representations. Pattern Recognition Letters.

  21. Hull, J. J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 550–554.

  22. Wohlberg, B. (2017). SPORCO: A Python package for standard and convolutional sparse representations. In Python in Science Conference (pp. 1–8).

  23. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (Vol. 1, pp. 886–893).

  24. Ojala, T., Pietikainen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29, 51–59.

  25. Haghighat, M., Zonouz, S., & Abdel-Mottaleb, M. (2015). CloudID: Trustworthy cloud-based and cross-enterprise biometric identification. Expert Systems with Applications, 42, 7905–7916.

  26. Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20, 45–50.

  27. Kachuee, M., Fazeli, S., & Sarrafzadeh, M. (2018). ECG heartbeat classification: A deep transferable representation. In IEEE International Conference Healthcare Informatics (pp. 443–444).

  28. Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., & Batista, G. (2015). The UCR Time Series Classification Archive.

  29. Barthelemy, Q., Larue, A., Mayoue, A., Mercier, D., & Mars, J. I. (2012). Shift & 2D rotation invariant sparse coding for multivariate signals. IEEE Transactions on Signal Processing, 60, 1597–1611.

  30. Bar, L., & Sapiro, G. (2010). Hierarchical dictionary learning for invariant classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3578–3581).

  31. Eysenck, M. W., & Keane, M. T. (2005). Cognitive psychology: A student’s handbook. Taylor & Francis.

  32. Gu, S., Meng, D., Zuo, W., & Zhang, L. (2017). Joint convolutional analysis and synthesis sparse representation for single image layer separation. In IEEE International Conference on Computer Vision (pp. 1708–1716).

  33. Shekhar, S., Patel, V. M., & Chellappa, R. (2014). Analysis sparse coding models for image-based classification. In IEEE International Conference Image Processing (pp. 5207–5211).

  34. Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45, 2673–2681.

  35. Sak, H., Senior, A., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Annual Conference International Speech Communication Association.

  36. Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271

  37. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20, 61–80.

  38. Arora, S., Du, S. S., Li, Z., Salakhutdinov, R., Wang, R., & Yu, D. (2019). Harnessing the power of infinitely wide deep nets on small-data tasks. arXiv preprint arXiv:1910.01663

  39. Lee, J., Bahri, Y., Novak, R., Schoenholz, S. S., Pennington, J., & Sohl-Dickstein, J. (2017). Deep neural networks as Gaussian processes. arXiv preprint arXiv:1711.00165

  40. Hazan, T., Polak, S., & Shashua, A. (2005). Sparse image coding using a 3D non-negative tensor factorization. In IEEE International Conference on Computer Vision (Vol. 1, pp. 50–57).

  41. Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51, 455–500.

  42. Duan, G., Wang, H., Liu, Z., Deng, J., & Chen, Y.-W. (2012). K-CPD: Learning of overcomplete dictionaries for tensor sparse coding. In IEEE International Conference on Pattern Recognition (pp. 493–496).

  43. Wang, J., Li, J., Han, X.-H., Lin, L., Hu, H., Xu, Y., et al. (2020). Tensor-based sparse representations of multi-phase medical images for classification of focal liver lesions. Pattern Recognition Letters, 130, 207–215.

  44. Caiafa, C. F., & Cichocki, A. (2013). Computing sparse representations of multidimensional signals using Kronecker bases. Neural Computation, 25, 186–220.

  45. Caiafa, C. F., & Cichocki, A. (2012). Block sparse representations of tensors using Kronecker bases. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2709–2712).

  46. Peng, Y., Meng, D., Xu, Z., Gao, C., Yang, Y., & Zhang, B. (2014). Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2949–2956).

  47. Qi, N., Shi, Y., Sun, X., Wang, J., & Yin, B. (2013). Two dimensional synthesis sparse model. In IEEE International Conference Multimedia Expo (pp. 1–6).

  48. Roemer, F., Del Galdo, G., & Haardt, M. (2014). Tensor-based algorithms for learning multidimensional separable dictionaries. In IEEE International Conference Acoustics, Speech, Signal Processing (pp. 3963–3967).

  49. Huang, F., & Anandkumar, A. (2015). Convolutional dictionary learning through tensor factorization. In Feature extraction: Modern Questions and Challenges (pp. 116–129).

  50. Moxey, C. E., Sangwine, S. J., & Ell, T. A. (2003). Hypercomplex correlation techniques for vector images. IEEE Transactions on Signal Processing, 51, 1941–1953.

  51. Xu, Y., Yu, L., Xu, H., Zhang, H., & Nguyen, T. (2015). Vector sparse representation of color image using quaternion matrix analysis. IEEE Transactions on Image Processing, 24, 1315–1329.

  52. Kilmer, M. E., & Martin, C. D. (2011). Factorization strategies for third-order tensors. Linear Algebra and its Applications, 435, 641–658.

  53. Mairal, J., Elad, M., & Sapiro, G. (2007). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17, 53–69.

  54. Hirose, A. (2012). Complex-valued neural networks (Vol. 400). Springer Science & Business Media.

  55. Isokawa, T., Kusakabe, T., Matsui, N., & Peper, F. (2003). Quaternion neural network and its application. In International Conference on Knowledge-Based Intelligent Information and Engineering Systems (pp. 318–324).

  56. Nitta, T. (2003). Solving the XOR problem and the detection of symmetry using a single complex-valued neuron. Neural Networks, 16, 1101–1105.

  57. Chen, X., Song, Q., & Li, Z. (2017). Design and analysis of quaternion-valued neural networks for associative memories. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 48, 2305–2314.

  58. Lazendic, S., De Bie, H., & Pizurica, A. (2018a). Octonion sparse representation for color and multispectral image processing. In European Signal Processing Conference (pp. 608–612).

  59. Popa, C.-A. (2016). Octonion-valued neural networks. In International Conference on Artificial Neural Networks (pp. 435–443).

  60. Lazendic, S., Pizurica, A., & De Bie, H. (2018b). Hypercomplex algebras for dictionary learning. In Conference Applied Geometric Algebra Computing in Engineering (pp. 57–64).

  61. Wang, R., Wang, K., Cao, W., & Wang, X. (2019). Geometric algebra in signal and image processing: A survey. IEEE Access, 7, 156315–156325.

Corresponding author

Correspondence to Mehmet Turkan.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 A Brief Discussion on the Spatio-temporal Information Preservation

Variations on Neural Networks

Convolution with a kernel on the input side of a layer corresponds to a locally connected structure instead of a traditional fully connected one. Neighboring cells are now placed in relation, preserving the original spatial configuration. As an alternative to the convolutional approach, neighboring cells on the input or the output side of a neural network layer can also be related by direct edges in-between, as another way of preserving the original spatial configuration of the input cells. The possibility of edges within the same layer forces one to think of a neural network as a more general directed graph. In fact, this line of logic leads to an alternative structure known as recurrent neural networks (RNNs). In the most general sense, RNNs represent directed graphs. Note that it is possible to build upon the basic RNN structure through bidirectional logic [34] and the long short-term memory (LSTM) concept [35].
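As a minimal numpy sketch (an illustration, not code from the paper), a valid 1D convolution is exactly a matrix-vector product with a banded, weight-shared matrix, i.e., a locally connected layer:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3                        # input length, kernel size
kernel = rng.standard_normal(k)

# Banded ("locally connected") weight matrix whose rows share one kernel:
# row i touches only inputs i..i+k-1, preserving spatial neighborhoods.
rows = n - k + 1
W = np.zeros((rows, n))
for i in range(rows):
    W[i, i:i + k] = kernel

x = rng.standard_normal(n)
# np.convolve flips its second argument, so passing the reversed kernel
# yields the cross-correlation implemented by W.
conv_out = np.convolve(x, kernel[::-1], mode='valid')
assert np.allclose(W @ x, conv_out)
```

A fully connected layer would instead fill every entry of `W` independently, discarding the notion of neighborhood altogether.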

On the other hand, empirical evaluation suggests that temporal convolution, in other words 1D convolutional logic, surpasses the capacity of recurrent architectures in sequence modeling [36]. It is still an open question whether the temporal dimension should be regarded as just another spatial dimension or whether a hybrid approach is better. This is rather a deep issue related to the properties of space and time. Instead, considering neural networks of any structure as directed and possibly cyclic graphs, in other words as neural graphs, might pave the way to a better understanding of the brain. Note that this concept is rather different from graph neural networks, which use graphs as inputs [37].

Another generalization of neural networks is possible by considering infinite-width neural networks [38]. Recent results suggest that deep neural networks that are allowed to become infinitely wide converge to models called Gaussian processes [39]. However, such studies do not consider the case where there are in-between connections within layers. Considering the existence of these connections can further lead to infinite but continuous (input or output) layers, which is indeed applicable mathematically and practically. A generalization of neural network layer cases in this sense is depicted in Fig. 9. The third case in this figure is important in that it leads to the concept of functional machine learning. This alone may not be enough to preserve the spatial configuration of the input layer; therefore, additional locally connected versions of these structures can also be proposed.
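The Gaussian tendency of wide networks can be sketched empirically. The following toy experiment (an assumption-laden illustration, not code from [38, 39]) samples many random one-hidden-layer networks with the usual 1/sqrt(width) output scaling and prints the output statistics at one fixed input; as the width grows, the mean and standard deviation stabilize:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.3, -0.7, 1.1])     # one fixed input
trials = 500

def sample_outputs(width):
    """Scalar outputs of `trials` independent random one-hidden-layer nets at x."""
    W1 = rng.standard_normal((trials, width, x.size))
    a = np.tanh(W1 @ x)                                          # hidden activations
    w2 = rng.standard_normal((trials, width)) / np.sqrt(width)   # NNGP-style scaling
    return (w2 * a).sum(axis=1)

# As width grows, the empirical output distribution approaches a fixed
# zero-mean Gaussian whose variance is determined by the limiting kernel.
for width in (10, 100, 2000):
    out = sample_outputs(width)
    print(f"width={width:5d}  mean={out.mean():+.3f}  std={out.std():.3f}")
```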

Figure 9

A generalization of neural network layer cases. (From left to right) Discrete-discrete (classical), discrete-continuous, continuous-discrete, and continuous-continuous input and output layers.

Tensor-based Sparse Representations

Images are not vectors; thus, vectorization breaks the spatial coherency of images, as investigated in [40]. This line of thought is centered around tensor factorization as a generalization. The study in [40] reports that, by treating training images as a 3D cube and performing a non-negative tensor factorization (NTF), higher efficiency, discrimination, and representation power can be achieved when compared to non-negative matrix factorization (NMF).

There are two main branches of tensor decomposition. The first branch is based on the canonical polyadic decomposition (CPD), sometimes also referred to as CANDECOMP/PARAFAC [41]. The most relevant example from the literature is K-CPD [42], an overcomplete dictionary learning algorithm for tensor sparse coding based on a multilinear version of OMP and the CANDECOMP/PARAFAC decomposition. K-CPD surpasses conventional methods in a series of image denoising experiments. Most recently, a similar framework has also been successfully utilized in tensor-based sparse representations for the classification of multiphase medical images [43]. The second branch is centered around the Tucker decomposition model, which is more general than CPD [44]. The study in [45] presents the foundations of the Tucker decomposition model by defining the Tensor-OMP algorithm, which computes a block-sparse representation of a tensor with respect to a Kronecker basis. In [44], the authors report that a block-sparse structure imposed on a core tensor through subtensors provides significant results. The Tucker model together with the block-sparsity restriction may work significantly well, since the higher-dimensional block structure is meaningfully applied to the original sparse tensor in the form of subtensors. There are many other studies in the literature specifically based on the Tucker model of sparse representations, with or without block-sparsity, and additionally including dictionary learning [46,47,48].
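The relation between the two branches can be illustrated with a toy numpy sketch (not taken from the cited works): a rank-1 CPD term is a special case of the Tucker model with a 1×1×1 core:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)

# CPD: a rank-1 term is the outer product of one factor vector per mode.
T_cpd = np.einsum('i,j,k->ijk', a, b, c)

# Tucker: a core tensor G multiplied by a factor matrix along each mode.
# A 1x1x1 core of value 1 with single-column factor matrices a, b, c
# reproduces the rank-1 CPD term exactly.
G = np.ones((1, 1, 1))
T_tucker = np.einsum('pqr,ip,jq,kr->ijk',
                     G, a[:, None], b[:, None], c[:, None])

assert np.allclose(T_cpd, T_tucker)   # CPD = Tucker with a (super)diagonal core
```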

Certain parallels can be drawn between convolutional dictionary learning and tensor-based sparse representations. As an example, the study in [49] proposes a novel framework for learning convolutional models through tensor decomposition and shows that cumulant tensors have a CPD whose components correspond to convolutional filters and their circulant shifts.

On the other hand, tensor-based approaches (both CPD and Tucker models) still do not provide a solution for the 1D case. Without loss of generality, let us assume that the signal is in the form of a column vector \(\mathbf{s}\). Since the signal is one-dimensional, there will be a single matrix \(\mathbf{D}\) for that single dimension in the Tucker model. Therefore, the model attained is \(\mathbf{s} = \mathbf{x} \times_{1} \mathbf{D}\) in Eq. (5). It is also possible to show that \(\mathbf{x} \times_{1} \mathbf{D} = \mathbf{D}\mathbf{x}\). From the CPD model perspective, there is equivalently \(\sum_{i}{x_{i} \mathbf{d}_{i}^{(1)}}\) where \(x_{i}\) is the single sparse coefficient associated with the \(i^{th}\) atom \(\mathbf{d}_i\). Hence, one arrives at the standard formulation in Eq. (5); namely, the Tucker and CPD models are equivalent in the one-dimensional case, all corresponding to the conventional orthogonal sparse representation.

$$\begin{aligned} \mathbf{s} = \mathbf{x} \times_{1} \mathbf{D} = \mathbf{D}\mathbf{x} = \sum_{i}{x_{i} \mathbf{d}_{i}^{(1)}} \end{aligned}$$
(5)
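Eq. (5) can also be checked numerically. The following sketch (illustrative only, with an arbitrary random dictionary) verifies that the mode-1 product, the plain matrix-vector product, and the CPD sum of weighted atoms coincide for a 1D signal:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((8, 12))            # dictionary for the single (1st) mode
x = np.zeros(12)
x[[2, 7]] = [1.5, -0.4]                     # a sparse coefficient vector

# Mode-1 product of the 1-way tensor x with D: contract D's columns with x.
s_mode1 = np.tensordot(D, x, axes=([1], [0]))
s_matvec = D @ x                            # Tucker view collapses to Dx
s_cpd = sum(x[i] * D[:, i] for i in range(12))  # CPD view: weighted atoms

assert np.allclose(s_mode1, s_matvec) and np.allclose(s_matvec, s_cpd)
```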

The above observation raises an important question. Although tensor-based approaches provide an advantage when the signals are multidimensional, these formulations will not provide an edge for 1D signals. The remedy may come from considering a 1D signal not solely as a 1D vector of elements. In other words, a 1D complex vector can be formed by coding the cell positions in the imaginary parts to overcome the orthogonality problem in the standard 1D vector representation, as depicted in Fig. 10. This paves the way to performing sparse representations of complex-valued data, or even quaternion-valued data, to accommodate more information in cases of higher dimensionality. The utmost generalization is achieved through geometric algebra as a generalization of hypercomplex numbers.
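For the 1D case, this encoding can be sketched as follows (a minimal illustration; the variable names are not from the paper). Coding the time index into the imaginary part makes the representation permutation-sensitive, so the original ordering remains recoverable even after the entries are scrambled:

```python
import numpy as np

signal = np.array([0.2, 0.9, -0.5, 0.1])    # a tiny 1D mono-audio snippet
positions = np.arange(signal.size)

# Encode each sample's time index in the imaginary part: the real part
# carries the value, the imaginary part carries where it belongs.
encoded = signal + 1j * positions

# A plain vectorized signal would be permutation-ambiguous; here the
# imaginary parts still reveal the original ordering.
shuffled = encoded[[2, 0, 3, 1]]
recovered = shuffled[np.argsort(shuffled.imag)]
assert np.allclose(recovered, encoded)
```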

Figure 10

An encoding scheme to preserve spatio-temporal information for (top) 1D mono audio and (bottom) 2D grayscale image cases.

Complex, Hypercomplex and Geometric Algebra Based Approaches

Note that quaternion algebra is the first hypercomplex number system to be devised that is similar to the real and complex number systems [50]. The study in [51] states that a quaternion-based model can achieve a more structured representation when compared to a tensor-based model. Comparisons between quaternion-SVD and tensor-SVD [52] establish their equivalence, but the superiority of quaternion-SVD arises when it is combined with the sparse representation model. It is possible to formulate a quaternion-valued sparse representation of color images that surpasses the conventional logic [51].

There are four possible models to represent color images, as suggested in [51]. The first is the monochromatic model, in which each color channel is represented separately. The second is the concatenation model, where a single vector is formed by concatenating the three color channels [53]. The third is the tensor-based model, where the color image is treated as a 3D cube of values. The last is the quaternion-based model, where each color channel is assigned to an imaginary unit, i.e., r, g, b to i, j, k, respectively. Most importantly, all of these models are analytically unified.

There is also one more possible model that is subtler. As depicted in Fig. 10, one can encode mono audio as a vector of complex numbers where the imaginary values indicate the time position; in a similar way, one can encode a grayscale image as a quaternion-valued vector where the imaginary parts are allocated to indicate the pixel positions. While thinking of a color image as a 3D cube, there is a possible quaternion-based model in which the imaginary units encode the position within this cube and the scalar denotes the value of that cell. The same quaternion-based encoding can be applied to any 3D scalar data.
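A minimal sketch of the grayscale-image case follows (illustrative only; the quaternion is stored as a plain 4-vector since numpy has no quaternion type). Each pixel carries its intensity in the scalar part and its row/column position in the i/j parts, so an arbitrary reordering of the vectorized data loses no spatial information:

```python
import numpy as np

img = np.arange(6, dtype=float).reshape(2, 3)   # tiny grayscale image

# Quaternion per pixel as a 4-vector (scalar, i, j, k): the scalar carries
# the intensity, i and j carry the row/column position, k is unused here.
rows, cols = np.indices(img.shape)
quat = np.stack([img, rows, cols, np.zeros_like(img)], axis=-1).reshape(-1, 4)

# Vectorization order no longer matters: positions travel with the values,
# so the image is reconstructible after any permutation of the entries.
perm = np.random.default_rng(4).permutation(quat.shape[0])
shuffled = quat[perm]
restored = np.zeros_like(img)
restored[shuffled[:, 1].astype(int), shuffled[:, 2].astype(int)] = shuffled[:, 0]
assert np.allclose(restored, img)
```

The free k component could analogously carry a third coordinate for 3D scalar data, as the paragraph above suggests.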

For further machine learning in this proposed scheme, a hypercomplex-to-real feature extraction layer is required, since current mainstream classification algorithms need real-valued data. Another option is to employ classification algorithms that can directly handle hypercomplex values. This line of logic paves the way to considering complex/hypercomplex-valued neural networks as viable tools [54, 55]. As future work, comparison of spatio-temporally encoded hypercomplex neural networks with conventional convolutional or recurrent neural networks may lead to a deeper understanding of the deep learning concept. As a motivation, a single complex-valued neuron can solve the XOR problem [56]. In addition, the fact that quaternions can be used to implement associative memory in neural networks is promising [57].

Another line of generalization deals with the case when the data has more than three dimensions. In such a case, a quaternion is not enough to designate the cell position and its value. As an extension, octonion algebra can accommodate up to seven imaginary channels [58, 59]; however, it loses the associativity property. The study in [60] reports that all algebras of dimension larger than eight lose important properties, since they contain algebras of smaller dimension as subalgebras. This might be an issue related to the physics of space and time, which is out of the scope of this study. The important fact is that the domain dealing with the generalization of hypercomplex numbers is called “geometric algebra,” and it has been gaining great attention lately [61].

Cite this article

Oktar, Y., Turkan, M. Preserving Spatio-Temporal Information in Machine Learning: A Shift-Invariant k-Means Perspective. J Sign Process Syst 94, 1471–1483 (2022). https://doi.org/10.1007/s11265-022-01818-8
