Abstract
In conventional machine learning applications, each data attribute is assumed to be orthogonal to the others. Namely, every pair of dimensions is orthogonal, and hence no distinction is made among the relations between dimensions. However, this is certainly not the case for real-world signals, which naturally originate from a spatio-temporal configuration. As a result, the conventional vectorization process disrupts all spatio-temporal information about the order/place of data, whether it is 1D, 2D, 3D, or 4D. In this paper, the problem of orthogonality is first investigated through conventional k-means clustering of images, where images are processed as vectors. As a solution, shift-invariant k-means is proposed in a novel framework with the help of sparse representations. Convolutional dictionary learning, a generalization of shift-invariant k-means, is then utilized as an unsupervised feature extraction method for classification. Experiments suggest that Gabor feature extraction, as a simulation of shallow convolutional neural networks, provides slightly better performance than convolutional dictionary learning. Other alternatives of convolutional logic are also discussed for spatio-temporal information preservation, including a spatio-temporal hypercomplex encoding scheme.
References
Sandor, J. (1996). On the arithmetical functions \(d_k(n)\) and \(d_k^*(n)\). Portugaliæ Mathematica, 53, 107–116.
Jafari, M., & Molaei, H. (2014). Spherical linear interpolation and Bezier curves. General Science Research, 2, 13–17.
Oktar, Y., & Turkan, M. (2018). A review of sparsity-based clustering methods. Signal Processing, 148, 20–30.
Oktar, Y., & Turkan, M. (2019). K-polytopes: A superproblem of k-means. Signal, Image and Video Processing, 13, 1207–1214.
Oktar, Y., & Turkan, M. (2020). Evolutionary simplicial learning as a generative and compact sparse framework for classification. Signal Processing, 174.
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666.
Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Asilomar Conference on Signals, Systems, and Computers (pp. 40–44).
Engan, K., Aase, S. O., & Husoy, J. H. (1999). Method of optimal directions for frame design. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 2443–2446, Vol. 5).
Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54, 4311–4322.
Tang, W., Panahi, A., Krim, H., & Dai, L. (2019). Analysis dictionary learning: An efficient and discriminative solution. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 3682–3686).
Zhang, Z., Jiang, W., Qin, J., Zhang, L., Li, F., Zhang, M., & Yan, S. (2017). Jointly learning structured analysis discriminative dictionary and analysis multiclass classifier. IEEE Transactions on Neural Networks and Learning Systems, 29, 3798–3814.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672–2680).
Zhang, Z., Sun, Y., Wang, Y., Zhang, Z., Zhang, H., Liu, G., & Wang, M. (2020). Twin-incoherent self-expressive locality-adaptive latent dictionary pair learning for classification. IEEE Transactions on Neural Networks and Learning Systems.
Garcia-Cardona, C., & Wohlberg, B. (2018). Convolutional dictionary learning: A comparative review and new algorithms. IEEE Transactions on Computational Imaging, 4, 366–381.
Pu, Y., Yuan, W., Stevens, A., Li, C., & Carin, L. (2016). A deep generative deconvolutional image model. In Artificial Intelligence and Statistics (pp. 741–750).
Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. (2010). Deconvolutional networks. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2528–2535).
LeCun, Y., Cortes, C., & Burges, C. J. C. (2010). MNIST Handwritten Digit Database.
Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: Spectral clustering and normalized cuts. In ACM International Conference on Knowledge Discovery and Data Mining (pp. 551–556).
Iam-on, N., & Garrett, S. (2010). LinkCluE: A MATLAB package for link-based cluster ensembles. Journal of Statistical Software, 36, 1–36.
Fard, M. M., Thonet, T., & Gaussier, E. (2020). Deep k-means: Jointly clustering with k-means and learning representations. Pattern Recognition Letters.
Hull, J. J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 550–554.
Wohlberg, B. (2017). SPORCO: A Python package for standard and convolutional sparse representations. In Python in Science Conference (pp. 1–8).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 886–893, Vol. 1).
Ojala, T., Pietikainen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29, 51–59.
Haghighat, M., Zonouz, S., & Abdel-Mottaleb, M. (2015). CloudID: Trustworthy cloud-based and cross-enterprise biometric identification. Expert Systems with Applications, 42, 7905–7916.
Moody, G. B., & Mark, R. G. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20, 45–50.
Kachuee, M., Fazeli, S., & Sarrafzadeh, M. (2018). ECG heartbeat classification: A deep transferable representation. In IEEE International Conference on Healthcare Informatics (pp. 443–444).
Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., & Batista, G. (2015). The UCR Time Series Classification Archive.
Barthelemy, Q., Larue, A., Mayoue, A., Mercier, D., & Mars, J. I. (2012). Shift & 2D rotation invariant sparse coding for multivariate signals. IEEE Transactions on Signal Processing, 60, 1597–1611.
Bar, L., & Sapiro, G. (2010). Hierarchical dictionary learning for invariant classification. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3578–3581).
Eysenck, M. W., & Keane, M. T. (2005). Cognitive psychology: A student’s handbook. Taylor & Francis.
Gu, S., Meng, D., Zuo, W., & Zhang, L. (2017). Joint convolutional analysis and synthesis sparse representation for single image layer separation. In IEEE International Conference on Computer Vision (pp. 1708–1716).
Shekhar, S., Patel, V. M., & Chellappa, R. (2014). Analysis sparse coding models for image-based classification. In IEEE International Conference on Image Processing (pp. 5207–5211).
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45, 2673–2681.
Sak, H., Senior, A., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Annual Conference of the International Speech Communication Association.
Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20, 61–80.
Arora, S., Du, S. S., Li, Z., Salakhutdinov, R., Wang, R., & Yu, D. (2019). Harnessing the power of infinitely wide deep nets on small-data tasks. arXiv preprint arXiv:1910.01663
Lee, J., Bahri, Y., Novak, R., Schoenholz, S. S., Pennington, J., & Sohl-Dickstein, J. (2017). Deep neural networks as Gaussian processes. arXiv preprint arXiv:1711.00165
Hazan, T., Polak, S., & Shashua, A. (2005). Sparse image coding using a 3D non-negative tensor factorization. In IEEE International Conference on Computer Vision (pp. 50–57, Vol. 1).
Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51, 455–500.
Duan, G., Wang, H., Liu, Z., Deng, J., & Chen, Y.-W. (2012). K-CPD: Learning of overcomplete dictionaries for tensor sparse coding. In IEEE International Conference on Pattern Recognition (pp. 493–496).
Wang, J., Li, J., Han, X.-H., Lin, L., Hu, H., Xu, Y., et al. (2020). Tensor-based sparse representations of multi-phase medical images for classification of focal liver lesions. Pattern Recognition Letters, 130, 207–215.
Caiafa, C. F., & Cichocki, A. (2013). Computing sparse representations of multidimensional signals using Kronecker bases. Neural Computation, 25, 186–220.
Caiafa, C. F., & Cichocki, A. (2012). Block sparse representations of tensors using Kronecker bases. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2709–2712).
Peng, Y., Meng, D., Xu, Z., Gao, C., Yang, Y., & Zhang, B. (2014). Decomposable nonlocal tensor dictionary learning for multispectral image denoising. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 2949–2956).
Qi, N., Shi, Y., Sun, X., Wang, J., & Yin, B. (2013). Two dimensional synthesis sparse model. In IEEE International Conference Multimedia Expo (pp. 1–6).
Roemer, F., Del Galdo, G., & Haardt, M. (2014). Tensor-based algorithms for learning multidimensional separable dictionaries. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 3963–3967).
Huang, F., & Anandkumar, A. (2015). Convolutional dictionary learning through tensor factorization. In Feature Extraction: Modern Questions and Challenges (pp. 116–129).
Moxey, C. E., Sangwine, S. J., & Ell, T. A. (2003). Hypercomplex correlation techniques for vector images. IEEE Transactions on Signal Processing, 51, 1941–1953.
Xu, Y., Yu, L., Xu, H., Zhang, H., & Nguyen, T. (2015). Vector sparse representation of color image using quaternion matrix analysis. IEEE Transactions on Image Processing, 24, 1315–1329.
Kilmer, M. E., & Martin, C. D. (2011). Factorization strategies for third-order tensors. Linear Algebra and its Applications, 435, 641–658.
Mairal, J., Elad, M., & Sapiro, G. (2007). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17, 53–69.
Hirose, A. (2012). Complex-Valued Neural Networks (Vol. 400). Springer Science & Business Media.
Isokawa, T., Kusakabe, T., Matsui, N., & Peper, F. (2003). Quaternion neural network and its application. In International Conference on Knowledge-Based Intelligent Information and Engineering Systems (pp. 318–324).
Nitta, T. (2003). Solving the XOR problem and the detection of symmetry using a single complex-valued neuron. Neural Networks, 16, 1101–1105.
Chen, X., Song, Q., & Li, Z. (2017). Design and analysis of quaternion-valued neural networks for associative memories. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 48, 2305–2314.
Lazendic, S., De Bie, H., & Pizurica, A. (2018a). Octonion sparse representation for color and multispectral image processing. In European Signal Processing Conference (pp. 608–612).
Popa, C.-A. (2016). Octonion-valued neural networks. In International Conference on Artificial Neural Networks (pp. 435–443).
Lazendic, S., Pizurica, A., & De Bie, H. (2018b). Hypercomplex algebras for dictionary learning. In Conference on Applied Geometric Algebras in Computer Science and Engineering (pp. 57–64).
Wang, R., Wang, K., Cao, W., & Wang, X. (2019). Geometric algebra in signal and image processing: A survey. IEEE Access, 7, 156315–156325.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Appendix
1.1 A Brief Discussion on Spatio-temporal Information Preservation
Variations on Neural Networks
Convolution with a kernel on the input side of a layer corresponds to a locally connected structure instead of the traditional fully connected one. Neighboring cells are now put in relation, preserving the original spatial configuration. As an alternative to the convolutional approach, neighboring cells on the input or the output side of a neural network layer can also be related through direct edges in-between, as another way of preserving the original spatial configuration of the input cells. The possibility of edges within the same layer forces one to think of a neural network as a more general directed graph. In fact, this line of logic leads to an alternative structure known as recurrent neural networks (RNN). In the most general sense, RNNs represent directed graphs. Note that it is possible to build upon the basic RNN structure through bidirectional logic [34] and the long short-term memory concept [35].
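To make the locally connected view concrete, the following minimal NumPy sketch (ours, not code from the paper) verifies that a 1D "valid" convolution is exactly multiplication by a banded Toeplitz matrix, i.e., a fully connected layer restricted to local connections with shared weights.

```python
# A 1D convolution is a fully connected layer whose weight matrix is
# constrained to be banded Toeplitz: local connectivity + weight sharing.
import numpy as np

def conv_as_matrix(kernel, n):
    """Build the (n - len(kernel) + 1) x n matrix of a 'valid' 1D convolution."""
    k = len(kernel)
    T = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        T[i, i:i + k] = kernel[::-1]  # flipped kernel, as in convolution
    return T

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
kernel = rng.standard_normal(3)

assert np.allclose(conv_as_matrix(kernel, 16) @ x,
                   np.convolve(x, kernel, mode='valid'))
```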
On the other hand, empirical evaluation suggests that temporal convolution, in other words 1D convolutional logic, surpasses the capacity of recurrent architectures in sequence modeling [36]. It is still an open question whether the temporal dimension should be regarded as just another spatial dimension or whether a hybrid approach is better. This is rather a deep issue related to the properties of space and time. Instead, considering neural networks of any structure as directed and possibly cyclic graphs, in other words as neural graphs, might pave the way to a better understanding of the brain. Note that this concept is rather different from graph neural networks, which take graphs as inputs [37].
Another generalization of neural networks is possible by considering infinite-width neural networks [38]. Recent results suggest that deep neural networks that are allowed to become infinitely wide converge to models called Gaussian processes [39]. However, such studies do not consider the case where there are in-between connections within layers. Considering the existence of these connections can further lead to an infinite but continuous (input or output) layer, which is indeed applicable both mathematically and practically. A generalization of neural network layer cases in this sense is depicted in Fig. 9. The third case in this figure is important in that it leads to the concept of functional machine learning. This alone may not be enough to preserve the spatial configuration of the input layer; therefore, additional locally connected versions of these structures can also be proposed.
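The convergence claim of [38, 39] can be hinted at with a small Monte Carlo sketch (our illustration under assumed i.i.d. Gaussian weights, ReLU activations, and 1/sqrt(width) output scaling, not an experiment from this paper): the output of a random one-hidden-layer network at a fixed input becomes increasingly Gaussian as the width grows.

```python
# As width grows, the scaled output at a fixed input approaches a Gaussian
# by the central limit theorem; excess kurtosis of a Gaussian is zero.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10)          # a fixed input
for width in (1, 10, 1000):
    outs = []
    for _ in range(5000):            # sample 5000 random networks
        W = rng.standard_normal((width, x.size))
        v = rng.standard_normal(width)
        outs.append(v @ np.maximum(W @ x, 0.0) / np.sqrt(width))
    outs = np.asarray(outs)
    kurt = ((outs - outs.mean())**4).mean() / outs.var()**2 - 3.0
    print(f"width={width:5d}  excess kurtosis={kurt:+.3f}")  # shrinks toward 0
```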
Tensor-based Sparse Representations
The fact is that images are not vectors; vectorization breaks the spatial coherency of images, as investigated in [40]. This line of thought is centered around tensor factorization as a generalization. The study in [40] reports that, by treating training images as a 3D cube and performing a non-negative tensor factorization (NTF), higher efficiency, discrimination and representation power can be achieved when compared to non-negative matrix factorization (NMF).
There are two main branches of tensor decomposition. In the first branch, studies are based on the canonical polyadic decomposition (CPD), sometimes also referred to as CANDECOMP/PARAFAC [41]. The most relevant example from the literature is K-CPD [42], an overcomplete dictionary learning algorithm for tensor sparse coding based on a multilinear version of OMP and the CANDECOMP/PARAFAC decomposition. K-CPD surpasses conventional methods in a series of image denoising experiments. Most recently, a similar framework has also been successfully utilized in tensor-based sparse representations for the classification of multiphase medical images [43]. The second branch is centered around the Tucker decomposition model instead, which is more general than CPD [44]. The study in [45] presents the foundations of the Tucker decomposition model by defining the Tensor-OMP algorithm, which computes a block-sparse representation of a tensor with respect to a Kronecker basis. In [44], the authors report that a block-sparse structure imposed on a core tensor through subtensors provides significant results. The Tucker model together with the block-sparsity restriction may work particularly well, since the higher-dimensional block structure is meaningfully applied on the original sparse tensor in the form of subtensors. There are many other studies in the literature specifically based on the Tucker model of sparse representations, with or without block-sparsity, and additionally including dictionary learning [46,47,48].
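As a concrete reference point for the CPD branch, the sketch below is a didactic NumPy implementation of rank-R CPD fitted by alternating least squares (our sketch; production code would rather use a library routine such as tensorly's parafac, and robust variants add normalization and stopping criteria).

```python
# Rank-R CPD (CANDECOMP/PARAFAC) of a 3-way tensor via alternating least
# squares: each factor update is a linear least-squares problem.
import numpy as np

def cpd_als(X, R, iters=100, seed=0):
    """Factor a 3-way tensor X ~= sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    for _ in range(iters):
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# sanity check: an exactly rank-2 tensor is recovered up to ALS convergence
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cpd_als(X, R=2)
print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)))  # near zero
```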
Certain parallels can be drawn between convolutional dictionary learning and tensor-based sparse representations. As an example, the study in [49] proposes a novel framework for learning convolutional models through tensor decomposition, showing that cumulant tensors admit a CPD whose components correspond to convolutional filters and their circulant shifts.
On the other side, tensor-based approaches (both the CPD and Tucker models) still do not provide a distinct solution for the 1D case. Without loss of generality, let us assume that the signal is in the form of a column vector \(\mathbf{s}\). Since the signal is one-dimensional, there will be a single matrix \(\mathbf{D}\) for that single dimension in the Tucker model. Therefore, the model attained is \(\mathbf{s} = \mathbf{x} \times_{1} \mathbf{D}\) in Eq. (5). It is also possible to show that \(\mathbf{x} \times_{1} \mathbf{D} = \mathbf{D}\mathbf{x}\). From the CPD model perspective, there is equivalently \(\sum_{i}{x_{i} \mathbf{d}_{i}^{(1)}}\), where \(x_{i}\) is the single sparse coefficient associated with the \(i\)th atom \(\mathbf{d}_i\). Hence, one arrives at the standard formulation in Eq. (5); namely, the Tucker and CPD models are equivalent in the one-dimensional case, both corresponding to the conventional orthogonal sparse representation.
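This equivalence is easy to verify numerically; the snippet below (our check, not from the paper) confirms that the mode-1 product of a 1-way core tensor with a dictionary is plain matrix-vector multiplication, and that the CPD sum of weighted atoms produces the same vector.

```python
# For a 1-way core x, both tensor models collapse to ordinary sparse
# coding s = Dx: x (mode-1 product) D  ==  D @ x  ==  sum_i x_i * d_i.
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 20))                       # 20 atoms of length 8
x = np.zeros(20); x[[3, 11]] = rng.standard_normal(2)  # a 2-sparse code

mode1 = np.tensordot(x, D, axes=([0], [1]))   # x  x_1  D for a 1-way tensor
assert np.allclose(mode1, D @ x)
assert np.allclose(sum(x[i] * D[:, i] for i in range(20)), D @ x)
```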
The above observation raises an important question. Although tensor-based approaches provide an advantage when the signals are multidimensional, these formulations do not provide an edge for 1D signals. The remedy may come from considering a 1D signal not merely as a 1D vector of scalar elements. In other words, a 1D complex vector can be formed by coding the cell positions in the imaginary parts to overcome the orthogonality problem of the standard 1D vector representation, as depicted in Fig. 10. This paves the way to performing sparse representations of complex-valued data, or even quaternion-valued data, to accommodate more information in cases of higher dimensionality. The utmost generalization is achieved through geometric algebra as a generalization of hypercomplex numbers.
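The following snippet sketches one possible reading of this encoding (our illustration of Fig. 10, not code from the paper): sample values are kept in the real parts while cell positions are written into the imaginary parts, so the value/position pairing survives any subsequent reordering.

```python
# Lift a 1D real signal to a complex vector that carries its own positions.
import numpy as np

def encode_positions(s):
    """Attach each sample's index as its imaginary part."""
    return s.astype(float) + 1j * np.arange(len(s))

s = np.array([0.2, 0.7, -1.0, 0.4])
z = encode_positions(s)
print(z)   # [0.2+0.j  0.7+1.j  -1.+2.j  0.4+3.j]

# A permutation of the plain vector s destroys order information, but the
# same permutation of z keeps each value explicitly tied to its position.
perm = np.random.default_rng(2).permutation(len(s))
print(z[perm])
```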
Complex, Hypercomplex and Geometric Algebra Based Approaches
Note that quaternion algebra is the first hypercomplex number system to be devised that is similar to the real and complex number systems [50]. The study in [51] states that a quaternion-based model can achieve a more structured representation when compared to a tensor-based model. Comparisons between quaternion-SVD and tensor-SVD [52] establish their equivalence, but the superiority of quaternion-SVD arises when it is combined with the sparse representation model. It is possible to formulate a quaternion-valued sparse representation of color images that surpasses the conventional logic [51].
There are four possible models to represent color images, as suggested in [51]. The first one is the monochromatic model, in which each color channel is represented separately. The second one is the concatenation model, where a single vector is formed by concatenating the three color channels [53]. The third is the tensor-based model, where the color image is thought of as a 3D cube of values. The last one is the quaternion-based model, where each color channel is assigned to an imaginary unit, i.e., r, g, b to i, j, k, respectively. Most importantly, all these models are analytically unified.
There is also one more possible model that is subtler. As depicted in Fig. 10, one can encode a mono audio signal as a vector of complex numbers whose imaginary values indicate the time positions; in a similar way, one can encode a grayscale image as a quaternion-valued vector whose imaginary parts are allocated to indicate the pixel positions. While thinking of a color image as a 3D cube, there is a possible quaternion-based model in which the imaginary units encode the position within this cube and the scalar denotes the value of that cell. The same quaternion-based encoding can be applied to any 3D scalar data.
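A possible realization of this positional quaternion model is sketched below (our illustration; quaternions are simply stored as (w, i, j, k) rows): each pixel of a grayscale image becomes a quaternion whose scalar part is the intensity and whose i and j parts encode the row/column position, with k left free (it would encode depth for 3D scalar data).

```python
# Encode a grayscale image as a vector of position-carrying quaternions.
import numpy as np

def encode_image(img):
    """Map an H x W grayscale image to an (H*W) x 4 array of quaternions."""
    H, W = img.shape
    rows, cols = np.indices((H, W))
    q = np.stack([img, rows, cols, np.zeros_like(img)], axis=-1)
    return q.reshape(-1, 4).astype(float)

img = np.arange(6, dtype=float).reshape(2, 3)
Q = encode_image(img)
print(Q)  # e.g. pixel (1, 2) with value 5.0 -> row [5., 1., 2., 0.]
```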
For further machine learning in this proposed scheme, a hypercomplex-to-real feature extraction layer is required, since current mainstream classification algorithms need real-valued data. Another option is to resort to classification algorithms that can directly handle hypercomplex values. This line of logic paves the way to considering complex/hypercomplex-valued neural networks as viable tools [54, 55]. As a future work, a comparison of spatio-temporally encoded hypercomplex neural networks with conventional convolutional or recurrent neural networks may lead to a deeper understanding of the deep learning concept. As a motivation, a single complex-valued neuron can solve the XOR problem [56]. In addition, the fact that quaternions can be used to implement associative memory in neural networks is promising [57].
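The XOR capability is easy to illustrate. The construction below follows the spirit of [56] but is our simplified variant, not necessarily Nitta's exact formulation: the real and imaginary parts of a complex neuron's net input define two orthogonal decision lines, and the resulting four quadrants realize XOR, which a single real-valued neuron cannot separate.

```python
# A single complex "neuron": net input z = (x1 + i*x2) - b gives two
# orthogonal decision lines Re(z)=0 and Im(z)=0; their quadrants yield XOR.
import numpy as np

b = 0.5 + 0.5j                        # bias placing both lines between 0 and 1

def xor_neuron(x1, x2):
    z = (x1 + 1j * x2) - b
    return int(np.real(z) * np.imag(z) < 0)   # opposite signs -> XOR is 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), '->', xor_neuron(x1, x2))  # prints 0, 1, 1, 0
```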
Another line of generalization deals with the case where the data has more than three dimensions. In such a case, a quaternion is not enough to designate the cell position together with its value. As an extension, octonion algebra can accommodate up to seven imaginary channels [58, 59]; however, it loses the associativity property. The study in [60] reports that all algebras of dimension larger than eight lose important properties, since they contain algebras of smaller dimension as subalgebras. This might be an issue related to the physics of space and time, which is out of the scope of this study. The important fact is that the domain dealing with the generalization of hypercomplex numbers is called "geometric algebra" and has been gaining great attention lately [61].