1 Introduction

1.1 Evoked Potentials and Their Classification

An auditory evoked potential is the specific neural activity that arises from acoustic stimulation, observed as a pattern of voltage fluctuations lasting approximately half a second [1]. Depending on the type and placement of the electrodes, the amplification of the signal, the selection of the filters and the post-stimulus period, it is possible to detect neuronal activity originating in different structures, from the auditory nerve to the cerebral cortex [2, 3].

Noise reduction is the first step in most biomedical signal processing systems, and the quality and accuracy of all subsequent operations depend to a large extent on the noise reduction algorithms used in preprocessing. The coherent average (CA), also known as the arithmetic mean, is calculated from the ensemble matrix formed with the evoked responses. The response to the i-th stimulus, \( p_{i} \), is assumed to be the sum of the deterministic component of the signal, or evoked response, \( s \), and a random noise \( r_{i} \) that is asynchronous with the stimulus (see Eq. 1). The ongoing noise is assumed to be stationary with zero mean; consequently, the noise variance is assumed to be fixed and equal in all the potentials. Averaging is a simple and direct method. Under these assumptions, and if the noise is uncorrelated between responses, the estimated signal can be modeled as the sum of the deterministic component and a residual noise whose variance is attenuated by a factor of 1/M, where M is the number of averaged responses, as shown in Eq. 2.

$$ p_{i} \left( n \right) = s\left( n \right) + r_{i} \left( n \right) $$
(1)
$$ \hat{s}\left( n \right) = \frac{1}{M}\sum\limits_{i = 1}^{M} {p_{i} \left( n \right)} $$
(2)
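The model in Eqs. 1 and 2 can be illustrated with a short simulation. This is a minimal sketch, not the procedure used in this work: the sinusoidal response, the noise level and the matrix size are hypothetical values chosen only for illustration, and the noise is assumed to be Gaussian and uncorrelated between epochs.

```python
import numpy as np

def coherent_average(P):
    """Eq. 2: mean over the M epochs (rows) of the ensemble matrix P (M x N)."""
    return P.mean(axis=0)

rng = np.random.default_rng(0)
M, N = 2000, 200                       # epochs and samples per epoch (illustrative)
n = np.arange(N)
s = np.sin(2 * np.pi * 3 * n / N)      # hypothetical deterministic response s(n)
sigma = 5.0                            # hypothetical noise standard deviation

# Eq. 1: every epoch is the same response plus independent zero-mean noise
P = s + rng.normal(0.0, sigma, size=(M, N))

s_hat = coherent_average(P)
residual_var = np.var(s_hat - s)       # variance of the noise left in the average
expected_var = sigma**2 / M            # attenuation by 1/M predicted by the model
print(residual_var, expected_var)      # the two values should be close
```

If the noise is not stationary or contains occasional large artifacts, the 1/M attenuation no longer holds, which is the motivation for the alternative averages discussed next.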

Several types of noise degrade the performance of the average, which justifies the development of new methods that handle these problems correctly. One method proposed in the literature is the weighted average. Several criteria have been used to determine the weight vector that best fits the problem. One of these criteria, minimization of the mean square error, is based on the noise variance of each cycle: a potential with a high noise level is assigned a lower weight than one with a lower noise level [4,5,6,7,8].
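As an illustration of the idea, the sketch below weights each epoch by the inverse of its estimated noise variance, which is the weighting that minimizes the mean square error when the noise is uncorrelated between epochs. It is only one possible variant: estimating the noise of each epoch from its residual with respect to the ensemble average is an assumption made here, and the function name and simulated data are hypothetical.

```python
import numpy as np

def weighted_average(P):
    """Inverse-variance weighted average of the ensemble matrix P (M x N)."""
    residual = P - P.mean(axis=0)      # rough noise estimate for each epoch
    noise_var = residual.var(axis=1)   # one noise variance per epoch
    w = 1.0 / noise_var                # noisier epochs receive lower weights
    w = w / w.sum()                    # weights sum to one
    return w @ P, w

rng = np.random.default_rng(1)
M, N = 200, 200
s = np.sin(2 * np.pi * 3 * np.arange(N) / N)          # hypothetical response
sigma = np.where(np.arange(M) < M // 2, 1.0, 10.0)    # half of the epochs are noisy
P = s + sigma[:, None] * rng.normal(size=(M, N))

s_w, w = weighted_average(P)
s_ca = P.mean(axis=0)
mse_w = np.mean((s_w - s) ** 2)
mse_ca = np.mean((s_ca - s) ** 2)
print(mse_w, mse_ca)   # the weighted average should show the smaller error
```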

The ensemble average and the weighted average are linear techniques and consequently perform very well when the noise is Gaussian. However, they are limited when out-of-range artifacts with large amplitude values appear occasionally. The ensemble average and the ensemble median can be seen as special cases of a broad family of estimators known as trimmed means, which includes the trimmed mean, the Winsorized mean, the L-trimmed mean (or TL mean) and the tanh mean [9,10,11,12].
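The relation between these estimators can be made concrete with a small sketch. The functions below are illustrative implementations of the plain trimmed mean and the Winsorized mean, applied sample by sample across epochs; the parameter k (number of values trimmed at each end) and the toy data are hypothetical. Trimming nothing gives the ensemble average, and trimming all but the central value gives the ensemble median.

```python
import numpy as np

def trimmed_mean(P, k):
    """Drop the k smallest and k largest values at every sample, then average."""
    M = P.shape[0]
    S = np.sort(P, axis=0)             # sort across epochs, column by column
    return S[k:M - k].mean(axis=0)

def winsorized_mean(P, k):
    """Replace the k extreme values at each end by their nearest kept value."""
    M = P.shape[0]
    S = np.sort(P, axis=0)
    if k > 0:
        S[:k] = S[k]
        S[M - k:] = S[M - k - 1]
    return S.mean(axis=0)

# Five epochs of a single sample; the last epoch contains an artifact
P = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
print(P.mean(axis=0))          # [22.]  ensemble average, dominated by the artifact
print(trimmed_mean(P, 1))      # [3.]
print(winsorized_mean(P, 1))   # [3.]
print(trimmed_mean(P, 2))      # [3.]   only the central value is left: the median
```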

1.2 Automatic Detection of Evoked Potentials

Methods for the objective detection of Auditory Brainstem Evoked Potentials (ABR) can be classified as template-based and non-template-based methods, as reviewed by [6] and referenced in [13]. The most widely used methods for detection in the frequency domain are described below. The original q-sample uniform scores test (Q-sample uniform) [14] is a nonparametric test that uses the ranks of the phases of the Fourier components of Q spectral bands to test whether the phases share the same distribution. The test uses only the phase angles, in the form of their ranks, and discards the spectral amplitudes.

According to [15], the Q-sample uniform test is the most powerful test in the frequency domain. Because it uses only the ranks of the phase angles, the spectral amplitude information is discarded. A modification that also considers the spectral amplitude is introduced in [16]; this modified test is now used in preference to its predecessor for the detection of auditory evoked potentials. Another test [17] that can be considered a special type of Q-sample test is the Watson Q-sample test, which also uses both phase angles and spectral amplitudes.
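To show how a rank-based phase test operates, the sketch below computes the classical uniform scores statistic for Q samples of angles. It is assumed here, not confirmed by the text, that this corresponds to the Q-sample uniform test of [14]; the modified and Watson variants are not reproduced. All phases are ranked together, the ranks are mapped to equally spaced angles on the circle, and the resultant of each band is accumulated. Under the hypothesis that all bands share the same phase distribution, the statistic is approximately chi-square distributed with 2(Q − 1) degrees of freedom for large samples. The function name and the simulated phases are hypothetical.

```python
import numpy as np

def uniform_scores_statistic(phases):
    """phases: array (Q, n) with the phase (radians) of Q spectral bands over n epochs."""
    Q, n = phases.shape
    N = Q * n
    flat = phases.ravel() % (2 * np.pi)
    # Rank all phases jointly (1..N); the amplitudes play no role
    ranks = np.empty(N)
    ranks[np.argsort(flat, kind="stable")] = np.arange(1, N + 1)
    beta = (2 * np.pi * ranks / N).reshape(Q, n)   # uniform scores
    C = np.cos(beta).sum(axis=1)
    S = np.sin(beta).sum(axis=1)
    return 2.0 * np.sum((C**2 + S**2) / n)

rng = np.random.default_rng(2)
Q, n = 3, 100
# No response: phases are uniformly distributed in every band
no_response = rng.uniform(0, 2 * np.pi, size=(Q, n))
# Response present: phases are locked around a different angle in each band
centers = np.array([1.0, 3.0, 5.0])[:, None]
response = centers + 0.2 * rng.normal(size=(Q, n))

W0 = uniform_scores_statistic(no_response)
W1 = uniform_scores_statistic(response)
print(W0, W1)   # W1 should be far larger than W0
```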

2 Methods

Two versions of the Modified Trimmed Mean [18] are adapted for use with Brainstem Auditory Evoked Potentials (BAEP). This section also describes the characteristics of the database and how the comparison between the different methods is carried out.

2.1 Modified Trimmed Mean Adapted to the BAEP

The Modified Trimmed Mean in [18] determines the cut-off factor t as \( 2*\sigma_{\Gamma } \), where \( \sigma_{\Gamma } \) is the standard deviation of the average background noise, estimated from the isoelectric segment of the electrocardiographic signal as reported in [19,20,21]. Evoked potentials do not have an isoelectric segment. However, several ways of estimating the variance of the background noise have been found in the literature consulted. Following this approach, the trimming factor could be estimated as 2 or 3 standard deviations of the noise, taking the standard deviation as the square root of the estimated variance. Several of the articles reviewed propose estimating the background noise as the average of the variances of several single points, as used in the estimation of the Fmp [22, 23], as a better approximation to the noise of the signal. In this paper we propose to determine the cut-off factor t as three standard deviations of the background noise estimated using the Fmp.
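The proposal can be sketched as follows. This is an illustrative reading rather than the exact algorithm of [18]: the trimming rule (discarding, at every sample, the epoch values that lie farther than t from the ensemble median), the use of the single-epoch noise level for t, the chosen single points and the simulated data are all assumptions made for this example.

```python
import numpy as np

def noise_std_single_points(P, points):
    """Average the across-epoch variances at a few fixed time indices (Fmp style)."""
    var_sp = P[:, points].var(axis=0, ddof=1)   # the response is constant at a fixed point
    return float(np.sqrt(var_sp.mean()))

def modified_trimmed_mean(P, t):
    """Average, at every sample, only the values within t of the ensemble median."""
    center = np.median(P, axis=0)
    keep = np.abs(P - center) <= t
    count = keep.sum(axis=0)
    total = np.where(keep, P, 0.0).sum(axis=0)
    # Fall back to the median if no value survives the trimming
    return np.where(count > 0, total / np.maximum(count, 1), center)

rng = np.random.default_rng(3)
M, N = 2000, 200
s = 0.5 * np.sin(2 * np.pi * 3 * np.arange(N) / N)   # hypothetical response
P = s + rng.normal(size=(M, N))
P[:20] += 50.0                                        # 1% of the epochs carry an artifact

sigma = noise_std_single_points(P, points=[20, 60, 100, 140, 180])
t = 3.0 * sigma                                       # cut-off: three standard deviations
s_mtm = modified_trimmed_mean(P, t)
s_ca = P.mean(axis=0)
err_mtm = np.abs(s_mtm - s).mean()
err_ca = np.abs(s_ca - s).mean()
print(err_mtm, err_ca)   # the trimmed estimate should be much closer to s
```

Note that in this example the artifacts inflate the noise estimate itself, so the cut-off is wider than three standard deviations of the clean noise; the artifacts are still removed because their amplitude is much larger.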

Another variant of the modified trimmed mean determines the cut-off factor t using the interquartile range (IQR), defined as the difference between the third quartile, Q3, and the first quartile, Q1. The interquartile range is a robust measure because it takes into account only the central 50% of the data. The interval outside which values are treated as outliers is given in Eq. 3.

$$ [Q_{1} - 1.5\,*\,IQR,\quad Q_{3} + 1.5\,*\,IQR] $$
(3)

The interval in Eq. 3 depends on the factor 1.5. This value is arbitrary, but it is justified under a normal distribution, for which the interval corresponds to approximately ±2.7 standard deviations around the mean and contains about 99.3% of the data. In this way, two versions of the Modified Trimmed Mean are obtained.
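A minimal sketch of the IQR variant is given below. It assumes that, at every sample, the values outside the interval of Eq. 3 are discarded and the remaining values are averaged; the function name and toy data are hypothetical. The last lines check what the factor 1.5 means for a normal distribution, using the standard normal third quartile as a constant.

```python
import math
import numpy as np

def iqr_trimmed_mean(P, factor=1.5):
    """Average, at every sample, the values inside the interval of Eq. 3."""
    q1, q3 = np.percentile(P, [25, 75], axis=0)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    keep = (P >= lower) & (P <= upper)
    return np.where(keep, P, 0.0).sum(axis=0) / keep.sum(axis=0)

# Five epochs of a single sample; the last epoch contains an artifact
P = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
result = iqr_trimmed_mean(P)
print(result)   # [2.5]  (Q1 = 2, Q3 = 4, interval [-1, 7])

# For a standard normal distribution Q3 = -Q1 = z75, so IQR = 2 * z75
z75 = 0.6744897501960817
fence = z75 + 1.5 * (2 * z75)                       # upper limit in standard deviations
outside = 1.0 - math.erf(fence / math.sqrt(2.0))    # probability of falling outside
print(fence, outside)
```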

2.2 Data

The database used in this study consists of transient auditory evoked potentials recorded from 39 neonatal patients between 1 and 3 months of age born at the Hospital Materno Ramón González Coro in Havana, Cuba [27]. The signals were recorded with an AUDIX electroaudiometer. A click stimulus of 0.1 ms duration was delivered at different intensities (100, 80, 70, 60, 30 dBnHL and 0 dBpSPL) via insert earphones (EarTone3A) [28, 29]. Ag/AgCl dry electrodes were used, fixed with electrolytic paste on the forehead (positive), the ipsilateral mastoid (negative) and the contralateral mastoid (ground). The impedance values were kept below 5 kΩ. The sampling frequency was 13.3 kHz, and the analysis windows used to form the ensemble matrix P (Eq. 5) and calculate the coherent average were approximately 15 ms long, that is, about 200 samples per window (N = 200). From this database, only the records obtained at 100 dBnHL (78 signals), for which specialists confirmed that a response was present, were used. These signals were chosen in order to guarantee the maximum values of the quality measures for this database.

2.3 Description of the Experiment

To carry out the experiment, an ensemble matrix was formed from each record, using the 15 ms interval between stimuli as reference. Each ensemble matrix contained approximately 2000 epochs of 200 samples on average. Each ensemble matrix was averaged using the averaging methods described in the previous sections. The automatic detection measures used for the comparison were computed in the frequency domain: the Q-sample uniform, the modified Q-sample and the Watson U2 Q-sample tests.
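The formation of the ensemble matrix can be sketched as a simple segmentation of the continuous recording. The sketch assumes that stimuli are delivered periodically and without gaps, so that consecutive blocks of N samples correspond to consecutive epochs; a real recording would normally be segmented using the stimulus trigger. The function name and the test signal are hypothetical.

```python
import numpy as np

fs = 13_300.0                  # sampling frequency in Hz
window = 0.015                 # analysis window in seconds
approx_samples = fs * window   # about 199.5, taken as N = 200 in the text
N = 200

def build_ensemble(x, N):
    """Cut a continuous record x into consecutive epochs of N samples (M x N)."""
    M = len(x) // N            # incomplete trailing epoch is dropped
    return x[:M * N].reshape(M, N)

x = np.arange(1000.0)          # stand-in for a continuous recording
P = build_ensemble(x, N)
print(P.shape)                 # (5, 200)
```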

To compare the different methods using measures in the frequency domain, the records were first transformed into ensemble matrices. From each ensemble matrix, sets of 250 epochs were randomly drawn 100 times and averaged using each of the averaging methods described. A new matrix (100 × 200) composed of the resulting average vectors was formed, in an attempt to simulate a Monte Carlo experiment. The matrices obtained for each averaging method and for each of the different sizes of X were transformed to the frequency domain using the fast Fourier transform (FFT), yielding an array of dimension 100 × 200 × 2 that contains the phase and amplitude values of the transform. The quality measures were calculated on this array. The design of the experiment follows the design proposed in [32] for the comparison of detection measures in the frequency domain.
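The resampling step described above can be sketched as follows. The sketch assumes that the 250 epochs of each repetition are drawn without replacement, which the text does not specify; the function name, the simulated ensemble matrix and the use of the plain mean as the averaging method are placeholders.

```python
import numpy as np

def monte_carlo_spectra(P, averager, n_epochs=250, n_rep=100, rng=None):
    """Average random subsets of epochs and return phase and amplitude spectra.

    Returns an array of shape (n_rep, N, 2): [..., 0] phase, [..., 1] amplitude.
    """
    rng = np.random.default_rng() if rng is None else rng
    M, N = P.shape
    averages = np.empty((n_rep, N))
    for r in range(n_rep):
        idx = rng.choice(M, size=n_epochs, replace=False)   # random subset of epochs
        averages[r] = averager(P[idx])
    spectrum = np.fft.fft(averages, axis=1)                 # one FFT per averaged vector
    return np.stack([np.angle(spectrum), np.abs(spectrum)], axis=-1)

rng = np.random.default_rng(4)
P = rng.normal(size=(2000, 200))          # stand-in for one ensemble matrix
out = monte_carlo_spectra(P, lambda E: E.mean(axis=0), rng=rng)
print(out.shape)                          # (100, 200, 2)
```

Any of the averaging methods can be passed as `averager`, provided it maps a matrix of epochs to a single vector of N samples.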

3 Results

To evaluate the results, a Friedman test was performed. The multiple comparison returns an interactive plot that makes it possible to determine visually which methods differ from one another, as shown in Fig. 1. In all cases the test gave p < 0.05, which indicates that there are significant differences between at least two methods. To identify the methods between which the differences existed, a post-hoc test was performed using the Bonferroni method.
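For reference, a comparable analysis can be sketched with SciPy. This is not the tool used to obtain the results reported here: the post-hoc step below uses pairwise Wilcoxon signed-rank tests with Bonferroni-adjusted p-values, which is an assumption standing in for the multiple comparison procedure of the original analysis, and the score matrix is simulated.

```python
import numpy as np
from scipy import stats

def friedman_with_ranks(scores):
    """scores: (n_records, n_methods); a higher score means better performance."""
    stat, p = stats.friedmanchisquare(*scores.T)
    ranks = np.apply_along_axis(stats.rankdata, 1, scores)   # rank methods per record
    return stat, p, ranks.mean(axis=0)

def bonferroni_pairwise(scores):
    """Pairwise Wilcoxon tests; p-values multiplied by the number of pairs."""
    k = scores.shape[1]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    adjusted = {}
    for i, j in pairs:
        p = stats.wilcoxon(scores[:, i], scores[:, j]).pvalue
        adjusted[(i, j)] = min(1.0, p * len(pairs))
    return adjusted

rng = np.random.default_rng(5)
scores = rng.normal(size=(78, 4))   # 78 records, 4 hypothetical methods
scores[:, 3] += 1.5                 # the last method is made clearly better

stat, p, mean_ranks = friedman_with_ranks(scores)
adjusted = bonferroni_pairwise(scores)
print(p, mean_ranks)
print(adjusted[(0, 3)])
```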

Fig. 1. Multiple comparison of methods using all quality measures

Table 1 gives the mean ranks returned by the Friedman test, in which a higher score is given to the method with the best performance. Although the only significant difference is between the TL mean and the MTM Fmp, according to the mean rank the best method was the MTM Fmp, followed by the MTM IQR, which are the proposals of this work.

Table 1. Mean ranks given by the Friedman test to each of the methods.

Another important analysis is to determine which method showed the smallest differences between its behavior on the clean database and on the data without artifacts (values outside the range of ±5 μV). Figure 2 gives an overall view of all the methods under analysis. For the coherent average there are visible differences in behavior, while the proposed methods prove to be more robust, behaving similarly on the two variants of the data.

Fig. 2. Mean values of the measures and methods.

Next, the signals obtained from a randomly selected subject using the different methods are shown. In Fig. 3 it can be seen that the MTM Fmp method comes closest to the expected signal, while the other methods depart from it and present higher noise levels.

Fig. 3. Evoked responses averaged with 100 epochs

4 Conclusions

Given the variety of methods that have been proposed, this paper describes modifications made to the Modified Trimmed Mean in order to adapt it to the noise reduction of auditory evoked potentials. The Modified Trimmed Mean is a method that combines solutions to the main drawbacks of the coherent average. The Friedman test showed that the best method was the MTM using Fmp; however, the differences were significant only with respect to the TL trimmed mean. Based on the review of the literature, measures in the frequency domain were selected to compare the results obtained by each method. The measure that performed best was the modified Q-sample.

As a consequence of the analysis carried out, it is recommended that the experiment be repeated with data with higher noise levels and carried out more exhaustively, checking the results in a Monte Carlo experiment with smaller steps in the formation of the ensemble matrix, in order to simulate a real recording environment. It is also recommended that the results be compared with similar results obtained in the time domain.