1 Introduction

In the era of big data, processing complex information that belongs to a high-dimensional space is a common challenge for the research community (Angelov and Kasabov 2005; Trevisan et al. 2010; Kelly et al. 2010; Angelov et al. 2010; Angelov and Yager 2013; Trevisan et al. 2014). Novel artificial intelligence techniques have been developed that can learn from complex big data and produce accurate results. These techniques process the available information using the same mechanisms involved in memory formation and synaptic plasticity in the human brain and are called spiking neural network (SNN) techniques (Maass 1997; Gerstner 1995, 2001; Gerstner et al. 2012). They have already been applied to data modelling, pattern recognition and predictive systems of complex temporal or spatio-temporal data (e.g. Ghosh-Dastidar and Adeli 2007; Kasabov et al. 2016; Capecci et al. 2015; Espinosa-Ramos et al. 2017; Marks 2017), demonstrating that they are an effective alternative to traditional machine learning techniques, and their application in bioinformatics and systems biology research is rapidly gaining interest.

Bioinformatics data, such as gene expression data, possess a high level of complexity in shape and size: thousands of genes measured for a small number of samples. The problems posed by such high-dimensional, scarce data sets are numerous and mainly connected with high variance: over-fitting becomes difficult to avoid, outliers are critical and noise in the data generates real issues. To tackle these problems, we need to consider both data augmentation and feature selection approaches: data augmentation addresses sample-size planning, while feature selection identifies a relevant subset of genes that carry the information needed to study a particular event.

The interest around gene expression data, beyond their modelling complications, arises from their ability to induce protein synthesis and their implication in several pathologies, when external and/or genetic factors alter their composition and behaviour (Kumar and Mann 2009; Pertea and Salzberg 2010; Ezkurdia et al. 2014). Identifying which genes are turned on in a particular cell, in which amount and when, can help us uncover the processes behind the cell's life and understand how diseases work (Shen et al. 2012). In the era of systems biology, the most popular techniques used to quantify transcriptome data are DNA hybridization-based microarrays and next-generation sequencing (RNA-Seq) technologies (Panda et al. 2003; Wang et al. 2008). Even though RNA-Seq has gained increasing popularity for measuring the transcriptome of organisms (Mortazavi et al. 2008), DNA microarrays are still widely used for the identification of already known common allele variants, and public databases, such as the National Center for Biotechnology Information (NCBI), make this data available to the entire community. The transcriptome data provided can be considered representative of the entire community, as genome sequence similarity over individuals is very high (about 99.9%), with the same genes in the same locations (Feuk et al. 2006). People are unique by phenotype; however, every human being shares the same genetic blueprint (Shen et al. 2012).

While the measurement and analysis of steady-state microarray data are routine, the analysis of time-series gene expression profiles is of growing interest, as they can shed some light on the complex relationships between genes and on how they work together over time to determine adaptive phenotypes and the transcription factors associated with certain diseases.

Here is where SNN techniques can play a pivotal role, as they are able to identify and process only relevant changes encoded in the raw data, and learn how to reliably tell these changes apart. By extracting meaningful patterns from raw time-series data, we aim at understanding the mechanisms involved in the regulatory expression of genes.

In our study, we used time-series microarray data measured while studying the elicitation of eczema in the human skin. After modelling the data to obtain a desirable sample size and a relevant subset of genes, we classify and analyse the data using an evolving SNN model. To evaluate the significance of the classification results, we compare them with those of traditional machine learning methods. In addition, we derive the interaction between genes over time by analysing the intensity of the information transmitted between clusters of genes, represented as a gene interaction network (GIN). This is a novel approach that could lead to biologically realistic models and to a better understanding of the phenomenon under study. More importantly, by applying this methodology, we aim at identifying the clinical implications behind the regulatory processes involved in the expression of certain genes over time.

The rest of the manuscript is organized as follows: Sect. 2 gives a general description of the proposed methodology, Sect. 3 presents the methodology in detail and illustrates it on a case study with experimental results, and Sect. 4 closes with conclusions and future work.

2 A general description of the proposed methodology

The methodology is graphically represented in Fig. 1. In this section, we explain the general procedures of this methodology and in the following Sect. 3 we present the methodology in its computational and algorithmic details and demonstrate it on a case study example.

2.1 Dimensionality reduction

Gene expression profiling includes the expression of thousands of genes; however, some of these show only minor variation in their profile, which implies that not all genes are differentially expressed over time. Since we are interested only in those that exhibit temporal interdependency, we should remove unregulated genes from the original data. This can be done by filtering out the data that show low variance (Sebastiani et al. 2003; Kohane et al. 2002). The dynamic phenomena behind temporal gene expression data can also be captured by permutation entropy (Sun et al. 2010), as low entropy characterises genes with small fluctuation across time points; thus, genes whose entropy falls below a certain percentile can be filtered out as well (Kohane et al. 2002). The underlying problem, coined the curse of dimensionality (Verleysen and François 2005), is well known in machine learning: a large number of dimensions/features degrades the performance of algorithms, makes the target function unnecessarily complex and may lead to an over-fitted model, since the data set often lies on a lower-dimensional manifold.
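To make the entropy criterion concrete, the following minimal sketch (our illustration, not part of the original pipeline) computes a normalised permutation entropy for a single gene profile; genes scoring below a chosen percentile would then be removed. The function name and the pattern order are illustrative assumptions.

```python
import math
import numpy as np

def permutation_entropy(profile, order=3):
    """Normalised permutation entropy of a 1-D expression profile:
    the Shannon entropy of the distribution of ordinal patterns of
    length `order`. Values near 0 flag genes with little temporal
    fluctuation across time points."""
    x = np.asarray(profile, dtype=float)
    patterns = [tuple(np.argsort(x[i:i + order]))
                for i in range(len(x) - order + 1)]
    probs = np.array([patterns.count(p) for p in set(patterns)], dtype=float)
    probs /= probs.sum()
    return float(-(probs * np.log2(probs)).sum() / math.log2(math.factorial(order)))
```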

2.2 Data augmentation

In order to properly train and test a classifier, the size of both the training and the testing samples needs to be taken into account. A small sample size affects the ability of the model to find a solution to ill-posed multi-class problems.

Dealing with sample-size planning is a central part of model design, especially when statistically independent samples are limited and expensive to obtain (Beleites et al. 2013). The problem is particularly acute for high-throughput sequencing, where the shape and size of gene expression data, consisting of hundreds of genes measured for a small number of samples, represent complex phenomena such as the molecular mechanisms involved in gene expression and the subsequent synthesis of proteins.

This issue has already been tackled using empirically fitted learning curves (Mukherjee et al. 2003) and parametric probability models (Dobbin and Simon 2006; Dobbin et al. 2008) to estimate the training set size needed to classify gene expression data. Learning curves of trained data have also been used for training and testing size planning by taking into account the confidence interval for an expected sensitivity (Beleites et al. 2013). The latter study found that a reasonable precision in the validation of a classifier can be achieved with 75–100 samples.

Sample size is therefore an important aspect of the experiments that needs to be considered. Simulated data can be produced from the real data set using a neural network, such as a simple autoencoder, which generates a compact representation of the empirical data. Autoencoder approaches have been demonstrated to be straightforward and more efficient when compared to other data augmentation techniques (Ferreira et al. 2017). Autoencoders are shallow neural networks that consist of an encoder and a decoder: the input data are fed into the encoder, where parameterized non-linear transformations convert the data into a new representation; the decoder then reconstructs the original input through non-linear transformations (DeVries and Taylor 2017). This makes autoencoders able to predict new data in a compact way (Møller 1993; Olshausen and Field 1997; The MathWorks Inc. 2018a).
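The idea can be sketched in a few lines (a minimal illustration, not the MATLAB implementation used later in this paper; the hidden-layer size and noise scale are assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def augment_with_autoencoder(X, n_new, hidden=16, noise_scale=0.05, seed=0):
    """Train an autoencoder (input -> compact hidden layer -> input) and
    produce `n_new` synthetic samples as reconstructions of noisy copies
    of real samples."""
    rng = np.random.default_rng(seed)
    ae = MLPRegressor(hidden_layer_sizes=(hidden,), activation="tanh",
                      max_iter=2000, random_state=seed)
    ae.fit(X, X)  # learn to reconstruct the data through a bottleneck
    idx = rng.integers(0, len(X), size=n_new)
    noisy = X[idx] + noise_scale * rng.standard_normal((n_new, X.shape[1]))
    return ae.predict(noisy)  # decoded samples serve as augmented data
```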

2.3 Feature selection

Each microarray is a high-dimensional space that describes the expression of thousands of genes at the same time. The dimensionality of the feature space (the genes) directly influences the amount of training data needed to ensure the correct functioning of the classifier. To solve this issue, we need to apply feature selection techniques to establish which genes play a pivotal role in the regulation of the process being studied. This will in turn adequately describe the distribution of the data and avoid over-fitting of the model.

Feature selection techniques for static bioinformatics data use filter, wrapper or embedded methods combined with artificial neural network (ANN) or support vector machine (SVM) techniques to rank and score a set of features (McLachlan et al. 2005a; Causton et al. 2009; Roffo et al. 2017; Huang et al. 2018). Filter-based techniques rank features by extracting inherent properties from the data and are independent of the classifier used. A popular filter algorithm is the Laplacian score (LS-FS) feature selection (He et al. 2006), which uses a nearest-neighbour graph to evaluate the relevance of a feature according to its power of preserving locality. Recently, another graph-based feature selection algorithm, called the eigenvector centrality feature selection (EC-FS) (Roffo et al. 2017), has been proposed. Wrapper and embedded methods depend on the training of an SVM classifier to rank a set of features; a popular embedded method is feature selection via concave minimization (FSV) (Bradley and Mangasarian 1998).

With the increase of temporal bioinformatics data, other works have attempted to engineer features that take the temporality of gene expression into consideration (Argyriou et al. 2007; Tomašev et al. 2015; Radovic et al. 2017). Radovic et al. (2017) developed the temporal minimum redundancy–maximum relevance (TMRMR) feature selection algorithm for temporal gene expression data. TMRMR takes the temporal component of the features into account while measuring their relevance and redundancy by means of a greedy search based on a mutual information quotient (MIQ) criterion (Tapia and Perez 2013). In Radovic et al. (2017), TMRMR is compared with popular static feature selection approaches, and the C variant of the TMRMR algorithm (TMRMR-C) outperformed most of the other methods. It has also already been applied as a temporal feature selection approach for classifying transcriptome data with SNN (Koefoed et al. 2018).

2.4 Evolving SNN model: the NeuCube architecture

To model and analyse the dynamic behaviour of a complex biological process, such as the interaction between thousands of genes over time, we have designed a novel methodology based on the NeuCube SNN architecture. NeuCube is an SNN computational modelling framework that constitutes a set of tools for spatio-temporal data modelling and understanding. It was developed at the Knowledge Engineering and Discovery Research Institute of the Auckland University of Technology (Kasabov 2012; Chen et al. 2013; Kasabov 2014; Tu et al. 2014; Kasabov et al. 2016).

The NeuCube general framework comprises a spike-time encoding module, to encode continuous-value input information into spike trains; a recurrent 3D SNN module; and an evolving SNN output classification module (Kasabov 2014). Compared with traditional statistical and machine learning methods, NeuCube has demonstrated improved classification results (Kasabov and Capecci 2015), a better analysis of the spatial and temporal activity modelled in the network (Marks 2017), and a better understanding of the data and the processes measured through it, including the genetic processes that regulate the data (Capecci et al. 2015; Espinosa-Ramos et al. 2017).

3 A detailed presentation of the proposed methodology and a case study illustration

3.1 SNN architecture

Figure 1 shows a graphical representation of the novel SNN methodology designed for time-series gene expression data analysis.

Fig. 1

The figure shows the gene expression data modelling procedure and the proposed SNN methodology for time-series gene expression data analysis. The SNN methodology can be divided into four main modules: an input data encoding module; a 3D SNN cube module; an output module for classification; and a module for knowledge extraction from the data by means of GIN analysis

This SNN system comprises four major modules:

  • Gene expression data modelling.

  • Input module, where the transcriptomics data is first modelled and then encoded into trains of spikes, each spike representing a change in the gene expression over time.

  • 3-D SNN cube (SNNc) module (the Cube), where the encoded time-series data is entered and learned.

  • Output module, where data is classified and new knowledge discovered by means of a GIN.

In the following section, this procedure is explained in detail.

3.2 Transcriptomics data description

As a case study for our problem, gene expression data collected during the elicitation of allergic contact dermatitis (ACD) over time has been used. Data was obtained from the publicly available Gene Expression Omnibus (GEO) repository of functional genomics data [NCBI GEO (Edgar et al. 2002; Barrett et al. 2012) accession GSE6281; (Pedersen et al. 2007)]. Data was collected from a control group and a group of people that had shown an inflammatory response to a nickel patch test. Expression profiling was performed by hybridization of high-density oligonucleotide arrays on skin biopsies of 7 nickel-allergic female subjects and 5 non-allergic female controls. Biopsies were taken at four time points: 0 h, 7 h, 48 h and 96 h. The control group did not show eczema at any time point, while the nickel-allergic patients reacted with eczema at 48 and 96 h only. Samples were analysed using microarray technology that measured transcript levels in the biopsies.

3.3 Gene expression data modelling

To extract the relevant number of genes that interact with each other over time and to develop a computational model of the GIN, we need to evaluate how relevant the original features of the model are, using classification accuracy as the objective function for the problem. To achieve this, we first filtered the original data to remove genes with low variance and low entropy; secondly, we applied feature selection to both the static and the temporal component of the data.

3.3.1 Data filtering

Firstly, the data was \(\log _2\)-transformed, so as to measure only proportional changes of gene expression over time, as these are considered biologically more relevant (McLachlan et al. 2005a). In particular, \(\log _2\) transformations are widely recommended in the bioinformatics literature, as they translate microarray-detectable expression levels into a binary representation of fold changes. The normalised data was filtered by removing genes with low variance and with entropy lower than the 30th percentile. Figure 2 shows the expression profile variance before and after the filters were applied to the original data; a minimal sketch of the filtering step follows the figure.

Fig. 2

Dimensionality reduction of gene expression data. Top: expression profile variance of the original data. Middle: expression profile after filtering genes with low variance. Bottom: expression profile after filtering the data with entropy level below the 30th percentile
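A minimal sketch of this step (our illustration; the percentile used for the variance filter is an assumption, as the text specifies the 30th percentile only for the entropy filter):

```python
import numpy as np

def hist_entropy(x, bins=10):
    """Shannon entropy of a gene's expression histogram."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def filter_genes(expr, var_pct=30, ent_pct=30):
    """expr: (genes x time-points) intensity matrix. Log2-transform,
    then keep genes above the given variance and entropy percentiles."""
    logged = np.log2(expr + 1.0)  # +1 guards against log2(0)
    var = logged.var(axis=1)
    ent = np.apply_along_axis(hist_entropy, 1, logged)
    keep = (var >= np.percentile(var, var_pct)) & \
           (ent >= np.percentile(ent, ent_pct))
    return logged[keep], keep
```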

3.3.2 Data augmentation

In our experiments, data was augmented twice using an autoencoder NN:

  • firstly, we increased and balanced the number of samples from the original data to achieve a reasonable precision in the classifier validation (Beleites et al. 2013);

  • secondly, the time points of data collection were augmented so that enough spikes could be encoded to stimulate the SNN model.

Data was divided into four subsets, according to the time of collection (i.e. 0, 7, 48 and 96 h of nickel exposure). An autoencoder, implemented in Matlab's Statistics and Machine Learning Toolbox (2018a), was used to augment and balance the data to obtain a size of 50 samples per class and per time point from the original 7–10 available. This means that we generated a total of 100 samples per time point by combining real and simulated data. This size is necessary to achieve a reasonable precision in the validation of the model (Beleites et al. 2013). The autoencoder was trained using scaled conjugate gradient back-propagation.

The performance of the neural network was measured by calculating the mean squared error with L2 and sparsity regularizers, to verify that the minimum gradient was reached by the model at a given epoch (Møller 1993; Olshausen and Field 1997).

Figure 3 shows an example of the application of the autoencoder method to generate synthetic data to increase the number of samples.

Fig. 3

Augmentation of the static component of the data. A neural network, an autoencoder, is applied to the original data to generate a compact representation of the data. Top: in red is plotted the log2 of the relative expression level of nickel allergic subjects with no nickel exposure. Bottom: in green is plotted the synthetic data generated. (Color figure online)

The temporal component of the data was augmented from the original four data points (0 h, 7 h, 48 h and 96 h) up to 20 time-points by using an autoencoder, as explained in the previous section (Fig. 4).

Fig. 4

Augmentation of the temporal component of the data. Top: plotted in red is the log2 of the relative expression level of a sample belonging to a nickel allergic subject with no nickel exposure. Bottom: in green is plotted the data augmented by its temporal component from 4 time points to 19 time points. (Color figure online)

3.3.3 Feature selection

Features were selected at two different stages of the experimental process:

  • firstly, a combination of three popular feature selection methods [the LS-FS (He et al. 2006), EC-FS (Roffo et al. 2017) and FSV (Bradley and Mangasarian 1998) algorithms] was used to extract the most relevant features from the static component of the data;

  • secondly, a recently proposed temporal feature selection technique [the TMRMR-C approach (Radovic et al. 2017)] was used to select the subset of genes responsible for the temporal regulation of the molecular mechanisms involved in ACD.

Popular feature selection approaches based on filters, wrappers, and embedded methods were applied to the static component of the augmented data, which was originally divided into four subsets, according to the time of collection (i.e. 0, 7, 48 and 96 of hours nickel exposure).

Experiments were carried out using the publicly available feature selection code library (FSLib) (Roffo 2018; Roffo et al. 2015, 2017). For each subset, we reduced the number of features from 344,34 to 200 by applying the LS-FS, EC-FS and FSV algorithms; the original paper (Roffo et al. 2017) reports that, when applied to microarray data, these methods generated the best accuracy results with this number of features. The quality of each subset of features was estimated by means of the classification accuracy of a linear-SVM classifier trained on 80% of randomly partitioned data and tested on the remaining 20%. Only these three approaches were used, as they scored the highest accuracy for all four subsets of data (i.e. time 0–96 h). The 200 highest-ranked features returned by the LS-FS, EC-FS and FSV approaches were then intersected to assess the validity of the selected genes (a sketch of this step is given below), which yielded a total of 151 unique features shared across the subsets. Finally, the data was sorted according to the time of collection, to study the regulation of gene expression patterns over time. It has been found that about 30–50 genes are the optimum number for studying a molecular process (Beer et al. 2002); several authors have observed that classifier accuracy improves around these values and, similarly to what we aim to achieve, have used subsets of this size as a reference to draw network graphs of features interacting with each other (Radovic et al. 2017; Roffo et al. 2017; Koefoed et al. 2018). We therefore applied the TMRMR-C approach to identify, out of the selected features, not only those that influenced gene expression at every occurrence of data collection, but also the genes that are temporally interdependent. Once the temporal component was augmented, the TMRMR-C method selected 45 genes out of the 151 available. These genes were: MICU3, RPGRIP1L, PLGLB1, ZEB1, MGP, SCN3A, GNG11, SLITRK6, PAK3, AW590588, TCEAL7, NCOA2, ANKRD29, MMRN1, LOC101930404, PDE4DIP, EIF4E3, COL28A1, DSEL, ENPP5, PWAR6, LRRC2, MPPED2, CNTN4, AI222435, LOC653602, PDCD4, WISP1, CLIC5, ABCC9, FMO3, TMEM255A, SNX16, EBF1, RFTN2, C1QTNF7, SUSD5, EPHA3, FIGN, AU156772, RBMS3, LEPROT, BF224430, LOC100507376 and SIM1.
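The intersection across ranking methods reduces to a few lines (a minimal sketch, assuming each method returns its features sorted by decreasing relevance):

```python
def intersect_rankings(rankings, top_k=200):
    """rankings: dict mapping a method name ('LS-FS', 'EC-FS', 'FSV') to a
    list of feature identifiers sorted by decreasing relevance. Returns
    the features appearing in the top_k of every method."""
    top_sets = [set(r[:top_k]) for r in rankings.values()]
    return sorted(set.intersection(*top_sets))
```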

The subsequent experiments were performed on the final data set of \(20\times 45\times 100\) (time-points/features/samples).

3.4 Encoding and mapping of data into the SNN cube for unsupervised learning

3.4.1 Encoding algorithm

The gene expression data was first arranged as a sequence of vectors, which was fed to the SNN cube encoding module. Every vector was converted into spike trains by applying a binarization algorithm; several algorithms can be used, depending on the data and the scope of the study. In this case, we have chosen Ben's Spiker Algorithm (BSA) (Schrauwen and Van Campenhout 2003; Nuntalid et al. 2011), which produces unipolar spike trains with two states (firing or non-firing). BSA encodes only positive spikes, to emulate the excitatory processes observed in glutamatergic synapses. The algorithm applies a finite impulse response (FIR) reconstruction filter and, for every time \(\tau\), calculates two error metrics as follows:

$$\begin{aligned} \sum _{k=0}^{M} \left| s(k+\tau )-h(k)\right| \quad \text {and}\quad \sum _{k=0}^{M} \left| s(k+\tau )\right| . \end{aligned}$$
(1)

If the first error is smaller than the second minus a threshold, a spike is fired and the filter is subtracted from the input; otherwise, no spike is emitted. BSA therefore depends on a threshold, which needs to be optimised (as explained in Sect. 3.5.2). Figure 5 shows an example of data encoded into spikes using the BSA binarization procedure; a minimal sketch of the algorithm follows the figure.

Fig. 5

Encoding of the signal generated by one of the features/genes into a spike train by the BSA algorithm. In this example, only sample one of the MICU3 gene is shown
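A minimal re-implementation of the encoding loop described above (ours; the filter h and the threshold are assumed given, the latter obtained by optimisation):

```python
import numpy as np

def bsa_encode(signal, fir, threshold):
    """Ben's Spiker Algorithm: encode a 1-D signal into a unipolar spike
    train using an FIR reconstruction filter `fir` of length M."""
    s = np.asarray(signal, dtype=float).copy()
    h = np.asarray(fir, dtype=float)
    M = len(h)
    spikes = np.zeros(len(s), dtype=int)
    for t in range(len(s) - M + 1):
        err1 = np.abs(s[t:t + M] - h).sum()  # error if a spike is fired
        err2 = np.abs(s[t:t + M]).sum()      # error if no spike is fired
        if err1 < err2 - threshold:          # fire and subtract the filter
            spikes[t] = 1
            s[t:t + M] -= h
    return spikes
```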

3.4.2 Spiking neurons mapping

The encoded spike trains are fed into a cube of, e.g., \(10\times 10\times 10\) leaky integrate-and-fire (LIF) neurons. Input features are mapped as proposed in Tu et al. (2014, 2017), as this method optimises the mapping of the 45 input variables into the SNN cube using a graph matching algorithm. The graph matching algorithm is implemented by constructing two weighted graphs (i.e. graphs whose branches/edges have numerical weights assigned) as follows:

  • the neuron similarity graph (NSG), where the input neurons with coordinates \(V_{NSG} = \{(x_i, y_i, z_i)\,|\,i=1,\ldots ,v\}\) are connected to their k nearest input neurons, with edges weighted by the multiplicative inverse of the Euclidean distance determined by their three coordinates (x, y, z).


  • the time-series similarity graph (SSG), where the set of spike trains of all features is \(V_{SSG} = \{(s_i)\,|\,i=1,\ldots ,v\}\) with

    $$\begin{aligned} s_i = \sum _{k=1}^{N_i} \delta (t-t_k) \end{aligned}$$
    (2)

    where \(s_i\) is the spike train of temporal variable i, v is the number of temporal variables, t represents the observed time length of each sample and \(\delta\) is the Dirac delta function. In this graph, every spike density function is connected to its k most highly correlated neighbours, and the graph branches are weighted according to their similarity. The similarity between spike trains is measured using either spike density correlation or maximum spike coincidence.

  • After constructing the two weighted graphs, NSG and SSG, the graph matching technique is used to determine the optimal mapping between them. The vertex and edge similarities are defined as:

    $$\begin{aligned}&exp(-|\widetilde{d}(i_{NSG})-\widetilde{c}(i_{SSG})|^2/2\sigma ^2_n); \nonumber \\&\quad i_{NSG},i_{SSG} =1,\ldots ,v \end{aligned}$$
    (3)

    where \(i_{NSG}\) and \(i_{SSG}\) represent the vertices of the NSG and SSG graphs, respectively, while \(\widetilde{d}(i_{NSG})\) and \(\widetilde{c}(i_{SSG})\) represent their weights to all other vertices.

    $$\begin{aligned}&exp(-|a^{NSG}_{ij}-a^{SSG}_{kl}|^2/2\sigma ^2_e);\nonumber \\&\quad i,j,k,l =1,\ldots ,v \end{aligned}$$
    (4)

    where \(a^{NSG}_{ij}\) and \(a^{SSG}_{kl}\) represent the graph edge weights for the NSG and SSG, and \(\sigma _n\) and \(\sigma _e\) are parameters that control the affinity between neurons and edges, respectively.

  • The factor graph matching (FGM) algorithm (Zhou and De la Torre 2012) is used to find the permutation matrix P that minimizes the following objective function:

    $$\begin{aligned} \min _{P} \left\| A_n - P A_s P^T \right\| ^{2}_{F} \end{aligned}$$
    (5)

    where \(A_n\) and \(A_s\) are the adjacency matrices of the NSG and SSG graphs and \(\Vert \cdot \Vert _F\) denotes the Frobenius matrix norm.

Optimising this process improved the results greatly, as the mapping influences the learning and, therefore, the understanding of the time-series patterns created by the data. The spiking LIF neurons of the Cube are connected using a small-world (SW) connectivity rule: connection weights are computed as the multiplication of a random number in \([- \ 0.1,+ \ 0.1]\) and the reciprocal of the Euclidean distance d(i, j) between the pre-synaptic (i) and post-synaptic (j) neurons, calculated according to their coordinates. Another parameter, the distance threshold \(D_{thr}\), constrains d(i, j); it is calculated as follows:

$$\begin{aligned} D_{thr} = max~(d(i,j))~p \end{aligned}$$
(6)

where p is the parameter of the SW connectivity, so that only neuron pairs with \(d(i,j) < D_{thr}\) are connected. Twenty percent of these connection weights are selected to be negative (inhibitory), as in the mammalian brain the proportion of inhibitory neurons is found to be about 20–30% (Capano et al. 2015), while 80% are positive/excitatory connection weights. A minimal sketch of this initialisation is given below.
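The initialisation can be sketched as follows (our illustration; the value of p and the handling of the random sign are assumptions consistent with the description above):

```python
import numpy as np

def init_small_world(coords, p=0.15, inhibitory_frac=0.2, seed=0):
    """Initialise the Cube's connection weights with the SW rule:
    w(i,j) = rand / d(i,j) for pairs closer than D_thr = max(d) * p,
    with ~20% of the connections made negative (inhibitory).
    coords: (n, 3) array of neuron coordinates."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_thr = d.max() * p                               # Eq. (6)
    W = np.zeros_like(d)
    mask = (d > 0) & (d < d_thr)                      # no self-connections
    w = rng.uniform(0.0, 0.1, mask.sum()) / d[mask]   # weight magnitudes
    sign = np.where(rng.random(mask.sum()) < inhibitory_frac, -1.0, 1.0)
    W[mask] = w * sign
    return W
```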

Finding an appropriate initial structural connectivity in a network of spiking neurons is important, as it allows the SNN model to accurately learn from the data and capture patterns of functional connectivity from it. This preserves the spatio-temporal relationships within the data, which are a significant source of information generally overlooked by other techniques. Unsupervised learning in the Cube is then performed to modify the initially set connection weights. The SNNc learns to activate the same groups of spiking neurons when similar input stimuli are presented, also known as a polychronization effect (Izhikevich 2006). Hebbian learning rules (Hebb 1949) are applied at this stage, implementing spike-timing-dependent plasticity (STDP)-like algorithms (Song et al. 2000).

3.4.3 Unsupervised learning

The STDP protocol (a Hebbian-like rule) is used in the Cube for unsupervised learning. Under this protocol, the connection between pre- and post-synaptic neurons becomes stronger the longer and the more often their joint activation repeats in time. This is computed as follows:

$$\begin{aligned} w_j(t) = \left\{ \begin{array}{l l} w_j(t-1) \pm \alpha /\varDelta t &{} \quad {\text {t}}_{\text {j}} \ne {\text {t}}_{\text {i}}\\ w_j(t-1) &{} \quad {\text {t}}_{\text {j}} = {\text {t}}_{\text {i}} \end{array} \right. \end{aligned}$$
(7)

where \(\alpha\) is the rate of the STDP learning protocol and \(\varDelta t\) indicates the time since the last spike was emitted by the post-synaptic neuron j. If a pre-synaptic neuron i fires before a post-synaptic neuron j, then their connection weight \(w_{j,i}\) increases; otherwise, it decreases. In this way, neurons develop the ability to form new connections in the network that can then be studied. Moving from data towards knowledge, the proposed SNN system can form an associative type of memory that can be analysed, which cannot be achieved with purely statistical or mathematical techniques. A minimal sketch of the update rule follows.
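For one synapse, Eq. (7) reads as below (our sketch; the sign convention follows the text, with potentiation when the pre-synaptic neuron fires first):

```python
def stdp_update(w, t_pre, t_post, alpha=0.01):
    """STDP-like weight update (Eq. 7): strengthen the synapse when the
    pre-synaptic spike precedes the post-synaptic one, weaken it
    otherwise; simultaneous spikes leave the weight unchanged."""
    dt = abs(t_post - t_pre)
    if dt == 0:
        return w                  # t_j == t_i: no change
    if t_pre < t_post:
        return w + alpha / dt     # pre fires first: potentiation
    return w - alpha / dt         # post fires first: depression
```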

3.5 Data classification with supervised learning

3.5.1 Supervised learning

During supervised training, the same data used for the unsupervised learning is propagated again through the trained Cube, and output neurons are created and trained to classify the data. To learn and classify the spiking activity of the Cube, we use the dynamic evolving spiking neural network (deSNN) classifier, which allows simple class-based discrimination (Kasabov et al. 2013). The spiking activity generated throughout the learning process can be visualised and used as feedback. The deSNN combines the rank-order (RO) learning rule (Thorpe and Gautrais 1998) and the STDP protocol: after training, each sample is associated with an output neuron that is connected to every neuron of the Cube. First of all, the connection weights \(w_{i,j}\) between an input neuron i and the output neuron j are all set to zero. Then, they are computed according to the RO learning rule:

$$\begin{aligned} w_{i,j} = mod^{order(i,j)} \end{aligned}$$
(8)

where mod indicates a modulation factor and order(i, j) the order of arrival of the first incoming spike. The subsequent connection weights are then modified according to the spike-driven synaptic plasticity (SDSP) learning rule, implemented as follows:

$$\begin{aligned} w_{i,j}(t)=\left\{ \begin{array}{l l} w_{i,j}(t-1)+drift &{} \quad S_i(t)=1 \\ w_{i,j}(t-1)-drift &{} \quad S_i(t)=0 \end{array} \right. \end{aligned}$$
(9)

where drift indicates a parameter used to modify the weights and \(S_i(t)\) indicates whether a spike arrives from neuron i at a time t after the first one was emitted.
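The two rules combine as in the following sketch (ours; it assumes binary input spike trains, one row per connection to the output neuron):

```python
import numpy as np

def desnn_weights(spike_trains, mod=0.9, drift=0.005):
    """deSNN weights for one output neuron: the RO rule (Eq. 8) sets the
    initial weight from the rank of each input's first spike; the SDSP
    drift (Eq. 9) then adjusts it at every later time step.
    spike_trains: (n_inputs, T) binary array."""
    n_inputs, T = spike_trains.shape
    first = np.array([int(s.argmax()) if s.any() else T for s in spike_trains])
    order = first.argsort().argsort()        # spike-arrival rank per input
    w = np.zeros(n_inputs)
    for i in range(n_inputs):
        if first[i] == T:                    # this input never spiked
            continue
        w[i] = mod ** order[i]               # rank-order initial weight
        for t in range(first[i] + 1, T):     # drift after the first spike
            w[i] += drift if spike_trains[i, t] else -drift
    return w
```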

According to the literature, both mod and the drift parameter values need to be optimised. In the next section, we report the procedure followed to optimise these and the other parameters of the NeuCube system.

3.5.2 Classification

To investigate whether the system was able to correctly discriminate between the two groups of people, we classified the data into two classes: class one, patients that showed ACD, and class two, controls. In total, we had 100 samples (50 per class), 20 time steps of data collection and 45 features/genes. Optimisation of the numerous parameters of the SNN system was performed via a grid search, which appraised the combinations between a minimum and a maximum value assigned to each parameter to be optimised; the combination with the highest classification accuracy was then selected. This process was executed as a loop, each iteration generating a new random cube evaluated using fivefold cross-validation, as sketched below.
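The search loop itself is generic (a minimal sketch; `evaluate` stands for building a new random cube and returning its fivefold cross-validation accuracy, and the parameter ranges shown are illustrative, not the values used here):

```python
from itertools import product

def grid_search(evaluate, param_grid):
    """Exhaustive search over all parameter combinations; `evaluate`
    returns a cross-validated accuracy for one parameter setting."""
    best_acc, best_params = -1.0, None
    names = list(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        acc = evaluate(**params)      # fivefold CV on a fresh random cube
        if acc > best_acc:
            best_acc, best_params = acc, params
    return best_params, best_acc

# Illustrative ranges only (hypothetical values):
grid = {"mod": [0.6, 0.8, 0.9], "drift": [0.002, 0.005, 0.01],
        "bsa_threshold": [0.5, 0.7, 0.9]}
```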

Having set the parameter values to use in the experiments, the classification accuracy of the model was assessed by running experiments on randomly selected data, with 75% used for training and 25% for testing. The optimised parameter values and the results obtained with them are reported in Table 1.

Table 1 Classification accuracy percentage obtained with the SNN model

These results evince the ability of the SNN system to discriminate between the two classes, which is a good indication that the proposed method can be used for classifying time-series gene expression data. To validate these results, we ran the same experiments using a classical machine learning method for gene data analysis: an SVM classifier.

The SVM classifier is a popular method for the classification of gene expression data that has been widely proposed in the literature (McLachlan et al. 2005b; Wit and McClure 2004; Causton et al. 2009). It is based on finding the hyperplane that best divides a dataset into two classes. It is accurate when classifying small, clean datasets, but less suited to larger and noisier data sets, which biological data can be.

Here, we have utilised the error-correcting output codes (ECOC) model (The MathWorks Inc. 2018b) with a one-versus-one coding design for multiclass SVM classifiers, as this model is able to improve classification accuracy when compared to other multiclass models (Fürnkranz 2002). The parameters of this model were optimised using automatic hyperparameter optimization (Fürnkranz 2002; The MathWorks Inc. 2018b), which minimises the fivefold cross-validation loss. The optimised SVM classified the data with the same accuracy as the SNN model (99.9% correctly classified observations), which confirmed the validity of our experiments. In addition, however, the proposed methodology offers a better understanding of the data and of the processes measured through it by means of network analysis, which is not possible with other methods, such as SVM. This enables new information and knowledge discovery through meaningful interpretation of the models and of the processes that regulate gene expression over time by means of GIN analysis.
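For reference, an analogous baseline can be set up in Python as below (a sketch with placeholder data; the baseline reported above used MATLAB's ECOC model with automatic hyperparameter optimisation instead):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data standing in for the 100-sample, 45-gene matrix.
X, y = make_classification(n_samples=100, n_features=45, random_state=0)

clf = SVC(kernel="linear")  # SVC uses one-vs-one for multiclass problems
print(cross_val_score(clf, X, y, cv=5).mean())
```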

3.6 Analysis of the results and knowledge discovery

The learned spiking activity and connectivity generated by the time-series gene expression data can be analysed by means of a GIN. The input genes mapped into the Cube are used as sources of information to define the centres of clusters of neurons. Each of the remaining unlabelled neurons of the Cube is assigned to one of these clusters according to the amount of information exchanged with each feature during unsupervised learning. The more spikes transmitted between clusters, the greater the spatio-temporal interaction between genes over time. The obtained GIN is a connectivity graph, where nodes represent gene variables, and the lines and their thickness define the amount of temporal information exchanged between genes. In other words, the interaction between input variables is captured in the GIN in terms of their changes in time. The information shown in the GIN is used to analyse the complex temporal patterns “hidden” in the data, which can be used for the development of new methodologies of gene expression data modelling and understanding.

3.6.1 Information clustering for knowledge discovery

As detailed in Tu et al. (2014, 2017) and extended in Doborjeh et al. (2017), since the input features are the sources of information in the SNN cube, each of them is used to define the centre, number and label of a cluster, to which the other unlabelled neurons of the Cube are assigned by computing the following:

  • Firstly, following network theory (Chung 1997), we compute the normalised adjacency graph G, whose vertices V are the neurons and whose edges E are weighted either by the spiking activity \(s_{ij}\) or by the change in connection weight \(w_{ij}\) between pre-synaptic (i) and post-synaptic (j) neurons of the Cube, depending on which kind of information transmission we want to cluster. A stronger \(E_{ij}\) represents a stronger interaction and more information exchanged between the neurons.

  • Secondly, we compute the probability \(P_v\) that a vertex \(v \in V\) of the graph G is reached after n steps of a random walk. The initial distribution \(p^0_v\) of a random walk starting at input vertex \(v_i\) is defined as:

    $$\begin{aligned} p^0_v = \left\{ \begin{array}{l l} 1 &{} \quad \text {if } v = v_i\\ 0 &{} \quad \text {otherwise} \end{array} \right. \end{aligned}$$
    (10)

    and subsequently the probability is calculated as:

    $$\begin{aligned} \forall v \in V : P^{n+1}_v =\sum _{(u,v)\in E} \frac{E_{uv}}{d(u)} P^n_u \end{aligned}$$
    (11)

    where d(u) is the weighted degree of vertex u (i.e. the sum of the weights of its neighbouring edges). \(P^n_v\) is iterated until it converges, \(P^{n+1}_v = P^n_v\); each neuron \(v \in V\) is then assigned to the starting input vertex for which it has the highest membership \(p_v\) (see the sketch after this list).
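A minimal sketch of this clustering (ours; it runs the walk for a fixed number of steps rather than testing convergence explicitly):

```python
import numpy as np

def random_walk_clusters(E, input_ids, n_steps=200):
    """Assign every neuron to the input neuron whose random walk reaches
    it with the highest probability (Eqs. 10-11).
    E: (n, n) non-negative, symmetric edge-weight matrix (spiking activity
    or connection-weight changes); input_ids: input-neuron indices."""
    input_ids = np.asarray(input_ids)
    d = E.sum(axis=1)                               # weighted degree d(u)
    T = (E / np.where(d > 0, d, 1.0)[:, None]).T    # T[v, u] = E[u, v] / d(u)
    P = np.zeros((len(E), len(input_ids)))
    P[input_ids, np.arange(len(input_ids))] = 1.0   # initial distributions
    for _ in range(n_steps):
        P = T @ P                                   # one random-walk step
    return input_ids[P.argmax(axis=1)]              # cluster label per neuron
```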

3.6.2 GIN discovery and analysis

After the optimisation procedure, we retained the best SNN model parameters, the initial connectivity and the temporal mapping of the input features. Network analysis was carried out by training on the entire time-series data available for each class separately. This process was iterated 100 times per class, to allow the system to learn from the short time-series data. First, the input features were clustered with the other neurons of the cube according to their connection weight values. The neural activity generated per class revealed 45 clusters of genes (Fig. 6, left). The proportion of activity generated in the entire 3-D cube is illustrated in the pie chart (Fig. 6, right).

Fig. 6

Top: the 3D SNN cube shows the clusters formed by the input gene features with the other neurons of the Cube according to their firing rate. Bottom: the pie chart illustrates the numerical proportion of the clustered activity expressed as a percentage with respect to the entire cube: (left) class one—people affected by ACD; (right) class two—controls. Every gene is shown in a different colour, as indicated in the legend. (Color figure online)

As shown in Fig. 6, there are three to four major clusters per class. The major component for class one is given by the cluster of gene BF224430 (13% of the total spiking activity generated during learning), followed by LOC100507376 and WISP1 (6%).

On the other hand, the major component for class two is given by the cluster of gene COL28A1 (10% of the total spiking activity generated during learning), followed by ENPP5 (7%), and MGP and AU156772 (both with 6%). Information about these genes is summarised in Table 2.

Table 2 The table reports information about the relevant genes that show higher spiking activity after learning

The total interaction between the 45 clusters was calculated per class and visualised as a GIN (Fig. 7; left: class one, right: class two). Each cluster of genes is indicated by a node with a different colour. The stronger the interaction between clusters, the thicker the edge that connects them. The edges are undirected and have a weight (a numerical value) associated with them; in this case, this value represents the spiking activity exchanged between clusters of genes. It is used to depict the quantitative change in temporal expression that one gene induces over another and/or how closely related two genes are in terms of expression stability and regulation.

Fig. 7

GINs obtained per class. The nodes represent the 45 clusters formed by the gene input features. (Left) class one—people affected by ACD; (right) class two—controls. The 45 genes/clusters are indicated in different colours corresponding to each of the respective clusters. The stronger the interaction between genes, the thicker the line that connects the nodes. (Color figure online)

Each GIN is derived from an adjacency matrix, where rows and columns are assigned to the nodes of the network (the clusters formed by the selected genes) and the presence of an edge is symbolised by a numerical value. The weights, rendered as the thickness of the edges, express more complex relationships between clusters of genes, such as the presence or absence of spiking activity between them and the amount of activity exchanged. This corresponds in turn to the temporal interaction or regulation that the clusters of genes have produced over time; a minimal construction sketch is given below.
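Building the graph from such a matrix is straightforward (our sketch, using the networkx library; the thresholding rule is an assumption):

```python
import networkx as nx

def build_gin(adj, gene_names):
    """Build a GIN from a symmetric cluster-interaction matrix: nodes are
    gene clusters, edge weights are the spikes exchanged between them."""
    G = nx.Graph()
    G.add_nodes_from(gene_names)
    for i in range(len(gene_names)):
        for j in range(i + 1, len(gene_names)):
            if adj[i][j] > 0:  # draw an edge only where activity was exchanged
                G.add_edge(gene_names[i], gene_names[j], weight=adj[i][j])
    return G
```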

This study uses the information available from transcriptome data to classify and analyse the interaction of different genes over time as a GIN. For our experimental case study, the raw gene expression data was first modelled and encoded into trains of spikes. The patterns of temporal activity generated during learning were then classified, to establish the effectiveness of the SNN methodology in separating different time-series gene expression data into different classes. Next, the patterns of temporal activity generated in the Cube after learning were analysed in terms of spiking activity and connection weight changes, and new information was extracted as a GIN to study the interaction of the genes expressed over time. The two GINs obtained show significant differences between subjects sensitised to nickel and control subjects: in comparison with the GIN of class one, the GIN of class two shows higher spiking activity exchanged between clusters. One of the main actors is gene PDCD4, which for class one is strongly connected with LOC100507376, RPGRIP1L and SLITRK6, while for class two it is strongly interconnected with SUSD5. This gene is notably involved in inhibiting events important in driving invasion and may play a role in apoptosis, which is consistent with the skin cells' reaction to nickel.

4 Conclusions and future work

The analysis of patterns generated by time-series gene expression data constitutes a major goal for bioinformatics and systems biology. In this study, we have modelled short-time transcriptome data and revealed meaningful patterns and new knowledge about the genes expressed over time by means of an SNN methodology and a GIN. For the case study data, we found that the exchange of temporal activity was dominated by the BF224430 and COL28A1 clusters. We also found that the PDCD4 gene strongly interacts with several other genes, especially in the class one GIN. According to the information available regarding the nature of these and the other genes, it seems plausible that patients affected by ACD express genes that are revealed in skin and that these genes are also related to cell proliferation, cellular transmission and inhibition, immune response and apoptosis. All of this appears to be the consequence of the assimilation of nickel in the body and the increase of eczematous reactions over time. We can conclude that SNNs are a remarkable method of choice when dealing with big transcriptomics data, as they extract only the relevant patterns of temporal activity from the gene expression data. Moreover, the clinical focus of this study has demonstrated the model's ability to constitute a valuable tool that can support experts in the areas of bioinformatics and systems biology in understanding and studying the interaction between genes expressed over time. As future work, we would like to highlight the following research lines:

  • Implementation of different feature selection methods for temporal gene expression data, such as the one proposed in Radovic et al. (2017).

  • Studying the possible implication of directionality between temporal interaction of genes in the network.

  • Comparative analysis of the results obtained with the proposed method with the ones obtained by Pedersen et al. (2007) on the same data set.

  • Applying the proposed model with the neuroreceptor dependent plasticity learning rule proposed in Espinosa-Ramos et al. (2017).

  • Developing an automated pipeline for the whole data preprocessing procedure, as a platform embedded in the main model.