Backtransformation: a new representation of data processing chains with a scalar decision function

Krell, Mario Michael; Straube, Sirko

doi:10.1007/s11634-015-0229-3

Regular Article
Published: 23 December 2015

Volume 11, pages 415–439 (2017)
Advances in Data Analysis and Classification

Abstract

Data processing often transforms a complex signal into a single value, the outcome of a final decision function, using a set of different preprocessing algorithms. Still, it is challenging to understand and visualize the interplay between the algorithms performing this transformation. Especially when dimensionality reduction is used, the original data structure (e.g., spatio-temporal information) is hidden from subsequent algorithms. To tackle this problem, we introduce the backtransformation concept, which treats the combination of algorithms as one transformation that maps the original input signal to a single value. To this end, it takes the derivative of the final decision function and transforms it back through the previous processing steps via backward iteration and the chain rule. The resulting derivative of the composed decision function at the sample of interest represents the complete decision process. Using it for visualizations might improve the understanding of the process. Often, it is possible to construct a feasible processing chain with affine mappings, which considerably simplifies both the calculation of the backtransformation and the interpretation of the result. In this case, the affine backtransformation provides the complete parameterization of the processing chain. This article introduces the theory, provides implementation guidelines, and presents three application examples.
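The affine case described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (that lives in pySPACE); the function name `affine_backtransformation`, the layer shapes, and the random matrices are hypothetical and chosen only for illustration. The sketch assumes every processing step has the form x -> A x + b and the decision function is f(x) = w·x + c, and it collapses the chain into one weight vector and one offset by iterating backward from the decision function.

```python
import numpy as np

def affine_backtransformation(layers, w, c):
    """Collapse affine steps x -> A @ x + b followed by f(x) = w @ x + c.

    Returns (w_total, c_total) such that f(chain(x)) == w_total @ x + c_total.
    """
    w_total = np.asarray(w, dtype=float)
    c_total = float(c)
    # Backward iteration: start at the decision function, walk to the input.
    for A, b in reversed(layers):
        c_total += w_total @ b   # offset picks up the weights collected so far
        w_total = A.T @ w_total  # chain rule for an affine map
    return w_total, c_total

rng = np.random.default_rng(0)
# Hypothetical chain: 6 inputs -> 4 features -> 2 features -> scalar score.
layers = [(rng.normal(size=(4, 6)), rng.normal(size=4)),
          (rng.normal(size=(2, 4)), rng.normal(size=2))]
w, c = rng.normal(size=2), 0.5

def forward(x):
    """Apply the processing chain step by step."""
    for A, b in layers:
        x = A @ x + b
    return w @ x + c

w_total, c_total = affine_backtransformation(layers, w, c)
x = rng.normal(size=6)
print(np.allclose(forward(x), w_total @ x + c_total))
```

Because `w_total` has one entry per input component, it can be reshaped to the layout of the original data (for example sensors by time points) and plotted directly.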

Notes

Further methods are presented but they are tailored to functional magnetic resonance imaging (fMRI) data.
The respective derivatives are constant, i.e., identical for every sample and therefore independent of it.
The notation of data and its components differs from the notation in classification tasks. Here, we look at one data sample \(x^{(0)}\) with its different processing stages \(x^{(l)}\) and the respective changes in each component of the data \({\left( x^{(l)}_{gh}\right) }\). The double index notation is applied to account for different axes in the data as in time series (different sensors and time points) or images.
With \(n_{k+1}:=1\) it holds that \(\frac{\partial F_l}{\partial y^{(l)}}\in \mathbb {R}^{n_l\times n_{l+1}}\) and the dimensions of \(B_l\) are a consequence of the recursion. Another reason for the dimensions of \(B_l\) is that \(B_l\) corresponds to the mapping of \(x^{(l)}\) to the scalar output \(x^{\text {out}}\).
Note that no matrix inversion is required, even though one might expect it: the goal is to find out what the original mapping did with the data, which sounds like an inverse approach.
A weighted sum of classifiers preserves linearity/differentiability. A majority vote will result in a non-differentiable classifier but when the score is the sum of the voters for the selected class, the resulting function will still be locally linear/differentiable.
http://pyspace.github.io/pyspace/.
Nevertheless, the resulting graphics look reasonable.
A standard extended 10–20 electrode layout has been chosen with 128 electrodes: http://www.brainproducts.com/filedownload.php?path=downloads/actiCAP-128-channel-Standard-2_1201.
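The notes on the dimensions of \(B_l\) and on the absence of matrix inversion can be made concrete with a short sketch for a chain that is not affine. The recursion used here, \(B_l = \frac{\partial F_l}{\partial y^{(l)}} B_{l+1}\) starting from a 1 × 1 identity, is an assumption inferred from the stated dimensions and is not quoted from the article. The chain itself (a linear step, an elementwise tanh, and a linear score) and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))   # hypothetical linear step: 5 inputs -> 3 features
w = rng.normal(size=3)        # hypothetical linear decision function

def forward(x):
    """Linear step, elementwise tanh, then a linear score."""
    return w @ np.tanh(A @ x)

def backtransform(x):
    """Derivative of forward at x via backward iteration (no inversion)."""
    z = A @ x
    # Per-step derivatives stored as n_l x n_{l+1} matrices, as in the note.
    d_linear = A.T                            # 5 x 3
    d_tanh = np.diag(1.0 - np.tanh(z) ** 2)   # 3 x 3
    d_score = w.reshape(-1, 1)                # 3 x 1, since n_{k+1} = 1
    B = np.ones((1, 1))                       # output differentiated by itself
    for D in (d_score, d_tanh, d_linear):     # last processing step first
        B = D @ B                             # shape n_l x 1 after each product
    return B[:, 0]

x0 = rng.normal(size=5)
grad = backtransform(x0)

# Central finite differences as an independent check of the result.
eps = 1e-6
numeric = np.array([(forward(x0 + eps * e) - forward(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(5)])
print(np.allclose(grad, numeric, atol=1e-5))
```

Only matrix products appear in the loop, and every intermediate `B` is a column with one entry per component of the data at that stage, which matches its reading as the map from \(x^{(l)}\) to the scalar output. Unlike the affine case, the result depends on the sample `x0` at which it is evaluated.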

Acknowledgments

The authors thank David Feess, Marc Tabie, Anett Seeland, Frank Kirchner, Su Kyoung Kim, Hendrik Wöhrle, and Bertold Bongardt for highly valuable discussions and input. This work was supported by the German Federal Ministry of Economics and Technology (BMWi, Grants FKZ 50 RA 1012 and FKZ 50 RA 1011).

Author information

Authors and Affiliations

Robotics Lab, Faculty 3 Mathematics and Computer Science, University of Bremen, Robert-Hooke-Str.1, 28359, Bremen, Germany
Mario Michael Krell
DFKI Bremen, Robotics Innovation Center, German Research Center for Artificial Intelligence, Robert-Hooke-Str.1, 28359, Bremen, Germany
Sirko Straube

Corresponding author

Correspondence to Mario Michael Krell.

About this article

Cite this article

Krell, M.M., Straube, S. Backtransformation: a new representation of data processing chains with a scalar decision function. Adv Data Anal Classif 11, 415–439 (2017). https://doi.org/10.1007/s11634-015-0229-3

Received: 08 September 2014
Revised: 29 September 2015
Accepted: 10 December 2015
Published: 23 December 2015
Issue Date: June 2017
DOI: https://doi.org/10.1007/s11634-015-0229-3
