
1 Introduction

In recent years, benefiting from the rapid development of imaging spectroscopy, the Fourier transform spectrometer [1,2,3,4,5,6] has played an important role in both space exploration and component analysis. Compared with other spectrometers, it offers high throughput, multichannel operation, and high resolution.

Fourier transform spectroscopy simultaneously acquires abundant data containing both the spatial and the spectral information of the target. The core component of the spectrometer is the interferometer, which splits the light from the target into two coherent beams. As the optical path difference changes, the two beams interfere on the sensor, producing a series of interference patterns.

Spectral reconstruction [7, 8] generally comprises preprocessing, apodization, phase correction and the Fourier transform. An important preprocessing step is detrending, which removes slowly varying trends from the interference signals. Apodization reduces spectral leakage by weighting the interferogram with a suitable window function. Phase correction compensates for the phase error caused by sampling-position offsets, which break the symmetry of the interferogram. Finally, the spectrum is obtained by the Fourier transform.

To process the acquired interferograms as quickly as possible, some steps are usually omitted, which lowers spectral quality and resolution. In particular, fast intelligent processing systems on board satellites require real-time spectrum reconstruction to save memory and bandwidth, which calls for a high-speed method to replace the traditional pipeline.

Compared with the traditional data-processing pipeline, general-purpose computing on the graphics processing unit (GPU) offers a new approach. It is particularly suitable for problems that can be expressed as data-parallel computations, i.e. the parallel execution of the same program on many data elements. NVIDIA provides the Compute Unified Device Architecture (CUDA), a general-purpose parallel computing platform and programming model that solves many computational problems more efficiently [9,10,11].

In the embedded field, NVIDIA offers the Jetson series together with an embedded development kit. NVIDIA Jetson is a family of embedded computing boards, including the Jetson TK1, TX1 and TX2 models, all of which carry a Tegra processor. NVIDIA describes the Jetson TX2 as an AI supercomputer on a module, powered by the NVIDIA Pascal architecture and packaged in a small, power-efficient form factor that suits intelligent edge devices such as robots, drones, smart cameras and portable medical devices. The Jetson TX2 supports all the features of the Jetson TX1 module and can run larger, more complex deep neural networks (DNNs).

In this paper, the NVIDIA Jetson TX2 embedded board serves as our real-time embedded platform, on which our parallel interferogram-processing algorithms run. The rest of this paper is structured as follows: Sect. 2 briefly explains the characteristics and advantages of the board. Section 3 describes the parallel processing algorithms running on the GPU of the board. Section 4 presents the experiments, and Sect. 5 analyzes the results and draws conclusions.

2 Embedded System

NVIDIA Jetson, with GPU-accelerated parallel processing, is a leading embedded computing platform. Its most important feature for our purpose is that the Jetson series supports CUDA, allowing developers to accelerate their algorithms.

Jetson TX2 is a fast, power-efficient embedded AI computing device. This 7.5-watt supercomputer on a module is built around an NVIDIA Pascal-family GPU. Besides 8 GB of memory with 59.7 GB/s of memory bandwidth, it provides a variety of standard hardware interfaces that make it easy to integrate into a wide range of products and form factors. Further parameters of the TX2 are given in Table 1.

Table 1. Details about TX2

As the table shows, the embedded system combines two types of ARM-based CPU cores (Denver 2 and ARM A57) with a high-performance GPU. The clock frequencies of the CPU and GPU differ between the working modes of the board, so the performance of an algorithm also depends on the mode. The five modes are listed below.

  • Mode 0: Denver 2 (2.0 GHz), ARM A57 (2.0 GHz), GPU (1.30 GHz);

  • Mode 1: ARM A57 (1.2 GHz), GPU (0.85 GHz);

  • Mode 2: Denver 2 (1.4 GHz), ARM A57 (1.4 GHz), GPU (1.12 GHz);

  • Mode 3: ARM A57 (2.0 GHz), GPU (1.12 GHz);

  • Mode 4: Denver 2 (2.0 GHz), GPU (1.12 GHz).

3 Theory

According to the basic principle of Fourier transform spectroscopy [12], spectral recovery can be achieved by Fourier transform of the interferogram. This principle could be described by the following equation:

$$\begin{aligned} I(\varDelta )=\int _{-\infty }^{+\infty }B(\sigma )e^{j2\pi \sigma \varDelta }d\sigma , \end{aligned}$$
(1)
$$\begin{aligned} B(\sigma )=\int _{-\infty }^{+\infty }I(\varDelta )e^{-j2\pi \sigma \varDelta }d\varDelta , \end{aligned}$$
(2)

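where I is the interferogram, B is the spectrum, and \(\varDelta \) and \(\sigma \) denote the optical path difference and the wavenumber, respectively. Because the interferogram is sampled at discrete OPD positions, Eq. (2) is evaluated in practice with the discrete Fourier transform, computed by the fast Fourier transform (FFT) in \(O(N\log N)\) operations for N samples.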
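Because the FFT underlies the final step of the pipeline, the following C++ sketch shows an iterative radix-2 Cooley-Tukey FFT for the discrete form of Eq. (2). It is a minimal CPU illustration that assumes a power-of-two length N; the function name `fft` is hypothetical, and on the TX2 this step could instead be delegated to a GPU library such as NVIDIA cuFFT.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 Cooley-Tukey FFT computing
// X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N), the discrete analogue of Eq. (2).
// The length N must be a power of two.
void fft(std::vector<cd>& a) {
    const std::size_t n = a.size();
    if (n == 0 || (n & (n - 1)) != 0) {
        throw std::invalid_argument("length must be a power of two");
    }
    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    // log2(N) butterfly stages of N/2 butterflies each, hence O(N log N).
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const cd wlen = std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}
```

For example, transforming 16 samples of \(\cos (2\pi \cdot 3n/16)\) yields two peaks of height 8 at bins 3 and 13 and zeros elsewhere.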

In this paper, the data-parallel spectrum reconstruction pipeline is divided into three parts: detrending, apodization, and phase correction combined with the Fourier transform. These parallel algorithms are similar to those in [13].

3.1 Detrending

Usually, the interference signal x(t) consists of a slowly varying trend superimposed on a fluctuating process y(t). This trend term, which contains the low-frequency component, must be removed. It can be estimated by the least-squares method, which finds the best-fitting function by minimizing the sum of squared errors.

For a linear model described by

$$\begin{aligned} y=X\beta +b, \end{aligned}$$
(3)

where X is the design matrix, \(\beta \) is the parameter vector and b is the error term, the least-squares estimate of the parameters is

$$\begin{aligned} \hat{\beta }=(X^TX)^{-1}X^Ty. \end{aligned}$$
(4)

If X has full column rank, that is,

$$\begin{aligned} \operatorname {rank}(X)=n,\quad m\ge n,\quad X\in \mathbb {R}^{m\times n}, \end{aligned}$$
(5)

X can be factored by QR decomposition (QRD) as

$$\begin{aligned} X=QR, \end{aligned}$$
(6)

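where \(Q\in \mathbb {R}^{m\times n}\) has orthonormal columns (\(Q^TQ=I\)) and \(R\in \mathbb {R}^{n\times n}\) is upper triangular and, because X has full column rank, invertible. Substituting Eq. (6) into Eq. (4), \(\hat{\beta }\) can be written as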

$$\begin{aligned} \hat{\beta }=R^{-1}Q^Ty. \end{aligned}$$
(7)
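The step from Eq. (4) to Eq. (7) follows by substituting Eq. (6) and using \(Q^TQ=I\) together with the invertibility of R:

$$\begin{aligned} \hat{\beta } &=(X^TX)^{-1}X^Ty=(R^TQ^TQR)^{-1}R^TQ^Ty=(R^TR)^{-1}R^TQ^Ty\\ &=R^{-1}(R^T)^{-1}R^TQ^Ty=R^{-1}Q^Ty. \end{aligned}$$

In floating-point practice, \(R\hat{\beta }=Q^Ty\) is often solved by back substitution instead of forming \(R^{-1}\) explicitly; both give the same \(\hat{\beta }\) in exact arithmetic.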

Detrending is thus reduced to a QR decomposition and a matrix inversion. On the embedded board, both are performed with parallel algorithms; the parallel QRD and matrix-inversion algorithms of [13] are used for these steps in our experiments.
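The detrending steps of Eqs. (3) to (7) can be sketched serially as follows. This is a minimal C++ illustration, not the parallel QRD of [13]. It assumes a polynomial trend model (columns \(1, t, t^2, \ldots \) with t the sample index scaled to \([-1,1]\)), factors X by modified Gram-Schmidt, and obtains \(\hat{\beta }\) by back substitution on \(R\hat{\beta }=Q^Ty\). All function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, m rows x n cols

// Design matrix with columns 1, t, t^2, ... (assumed polynomial trend),
// where t is the sample index scaled to [-1, 1] for better conditioning.
Matrix design_matrix(std::size_t m, std::size_t n) {
    Matrix X(m, std::vector<double>(n));
    for (std::size_t i = 0; i < m; ++i) {
        const double t = m > 1 ? -1.0 + 2.0 * i / (m - 1) : 0.0;
        double p = 1.0;
        for (std::size_t j = 0; j < n; ++j) { X[i][j] = p; p *= t; }
    }
    return X;
}

// Thin QR by modified Gram-Schmidt: X = Q R with Q (m x n) having orthonormal
// columns and R (n x n) upper triangular. Assumes X has full column rank.
void mgs_qr(const Matrix& X, Matrix& Q, Matrix& R) {
    const std::size_t m = X.size(), n = X[0].size();
    Q = X;
    R.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double norm = 0.0;
        for (std::size_t i = 0; i < m; ++i) norm += Q[i][j] * Q[i][j];
        R[j][j] = std::sqrt(norm);
        for (std::size_t i = 0; i < m; ++i) Q[i][j] /= R[j][j];
        // Remove the q_j component from every later column.
        for (std::size_t k = j + 1; k < n; ++k) {
            double dot = 0.0;
            for (std::size_t i = 0; i < m; ++i) dot += Q[i][j] * Q[i][k];
            R[j][k] = dot;
            for (std::size_t i = 0; i < m; ++i) Q[i][k] -= dot * Q[i][j];
        }
    }
}

// Removes a polynomial trend of the given degree from y:
// beta = R^{-1} Q^T y (Eq. (7)) via back substitution, then y - X*beta.
std::vector<double> detrend(const std::vector<double>& y, std::size_t degree) {
    const std::size_t m = y.size(), n = degree + 1;
    const Matrix X = design_matrix(m, n);
    Matrix Q, R;
    mgs_qr(X, Q, R);
    std::vector<double> qty(n, 0.0);  // Q^T y
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < m; ++i) qty[j] += Q[i][j] * y[i];
    std::vector<double> beta(n, 0.0);  // solve R * beta = Q^T y
    for (std::size_t j = n; j-- > 0;) {
        double s = qty[j];
        for (std::size_t k = j + 1; k < n; ++k) s -= R[j][k] * beta[k];
        beta[j] = s / R[j][j];
    }
    std::vector<double> out(y);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) out[i] -= X[i][j] * beta[j];
    return out;
}
```

For a signal that is exactly quadratic in t, `detrend(y, 2)` returns values at round-off level; for any input, the residual sums to zero because the constant column lies in the model.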

3.2 Apodization

Ideally, the optical path difference ranges from negative to positive infinity, but in practice a detector can record only a finite range, so the signals we obtain are truncated. According to Fourier theory, a truncated signal can be regarded as the product of the full sequence and a rectangular window, which is equivalent to convolving the original spectrum with a sinc function in the frequency domain [14]. For a continuous spectrum, the sidelobes of this sinc response, which arise from the discontinuity of the interferogram at the maximum OPD, degrade the recovered spectrum.
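To make the sinc explicit, let L denote the maximum OPD (a symbol introduced here for illustration). Truncating \(I(\varDelta )\) to \(|\varDelta |\le L\) multiplies it by a rectangular window whose transform, using the kernel of Eq. (2), is

$$\begin{aligned} \int _{-L}^{+L}e^{-j2\pi \sigma \varDelta }d\varDelta =\frac{\sin (2\pi \sigma L)}{\pi \sigma }=2L\,\mathrm {sinc}(2\sigma L),\qquad \mathrm {sinc}(x)=\frac{\sin (\pi x)}{\pi x}, \end{aligned}$$

so the recovered spectrum is \(B(\sigma )\) convolved with \(2L\,\mathrm {sinc}(2\sigma L)\). This kernel first reaches zero at \(\sigma =1/(2L)\), and its largest sidelobe is negative with a magnitude of about 22% of the central peak; these sidelobes are the leakage that apodization suppresses.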

Apodization multiplies the interference sequence point by point by an apodizing function to suppress the sidelobes in the recovered spectrum. In parallel computing, each thread processes one sample, so apodization can be expressed as

$$\begin{aligned} y(tid)=w(tid)x(tid), \end{aligned}$$
(8)

where tid is the index of the current thread, x is the interferogram and w is the apodization function. Common apodization functions include the triangular, Happ-Genzel, Hamming, and Bessel functions.
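Because Eq. (8) is purely element-wise, it maps naturally to one GPU thread per sample. The C++ sketch below emulates that mapping with a loop whose index plays the role of tid. It assumes the zero-OPD index `zpd` is known and that the window is mirrored about zero OPD and normalized by the largest distance from it, L; the triangular window is taken as \(1-|d|/L\) and the Happ-Genzel window as \(0.54+0.46\cos (\pi |d|/L)\). The names and the normalization are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class Window { Triangular, HappGenzel };

// Apodizing weight for a sample at signed distance d (in samples) from zero OPD,
// where L is the largest |d| in the record. Both windows equal 1 at d = 0.
double apod_weight(Window w, double d, double L) {
    const double pi = std::acos(-1.0);
    const double r = std::fabs(d) / L;  // normalized OPD in [0, 1]
    switch (w) {
        case Window::Triangular: return 1.0 - r;
        case Window::HappGenzel: return 0.54 + 0.46 * std::cos(pi * r);
    }
    return 1.0;
}

// Eq. (8): y(tid) = w(tid) * x(tid). Iterations are independent, so on the GPU
// the loop body would run once per CUDA thread.
std::vector<double> apodize(const std::vector<double>& x, std::size_t zpd, Window w) {
    const std::size_t n = x.size();
    if (zpd >= n) throw std::invalid_argument("zpd must index a sample of x");
    // Largest distance from zero OPD in either direction.
    const double L = static_cast<double>(zpd > n - 1 - zpd ? zpd : n - 1 - zpd);
    if (L == 0.0) return x;  // single sample: nothing to taper
    std::vector<double> y(n);
    for (std::size_t tid = 0; tid < n; ++tid) {
        const double d = static_cast<double>(tid) - static_cast<double>(zpd);
        y[tid] = apod_weight(w, d, L) * x[tid];
    }
    return y;
}
```

In a CUDA kernel the loop would disappear: each thread would compute its own index from `blockIdx.x * blockDim.x + threadIdx.x` and execute the loop body once, guarded by a bounds check against n.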

3.3 Phase Correction and Fourier Transform

Because phase correction operates on the Fourier-transformed interferogram, it is generally performed together with the Fourier transform.

In our experiments, the interferograms are provided by our interferometer. Each is a single-sided interferogram that contains a short double-sided portion around zero OPD, and the Mertz method estimates the phase from this double-sided portion. If the detector does not sample exactly at zero OPD, the interferogram becomes asymmetric and a phase error \(\phi (\sigma )\) is introduced, so that

$$\begin{aligned} I(\varDelta )=\int _{-\infty }^{+\infty }B(\sigma )e^{-j(2\pi \sigma \varDelta +\phi (\sigma ))}d\sigma . \end{aligned}$$
(9)

And

$$\begin{aligned} B(\sigma )e^{-j\phi (\sigma )}=m_r(\sigma )+jm_i(\sigma ), \end{aligned}$$
(10)

where \(m_r(\sigma )\) and \(m_i(\sigma )\) are the real and imaginary parts of \(B(\sigma )e^{-j\phi (\sigma )}\), respectively.

In the Mertz method [15], the phase is computed at low resolution from the short double-sided interferogram, and a high-order polynomial is fitted to this low-resolution phase spectrum by the least-squares method to obtain the high-resolution phase spectrum \(\phi _0(\sigma )\). The difference between the original phase \(\phi (\sigma )\) and the high-resolution phase spectrum is then

$$\begin{aligned} \varDelta \phi (\sigma )=\phi (\sigma )-\phi _0(\sigma ). \end{aligned}$$
(11)

The final spectrum through phase correction is given by

$$\begin{aligned} B(\sigma )=\sqrt{m_r^2(\sigma )+m_i^2(\sigma )}\cos (\varDelta \phi (\sigma )). \end{aligned}$$
(12)
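Equations (11) and (12) amount to an element-wise operation on the complex spectrum, sketched below in C++. The sketch assumes the measured phase is taken as \(\arg (m_r+jm_i)\) (via `std::arg`) and that \(\phi _0\) is supplied on the same frequency grid with the same sign convention; `phase_correct` is a hypothetical name.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Eqs. (11)-(12): for each wavenumber bin k, given m = m_r + j m_i and the
// estimated phase phi0[k], return |m| * cos(arg(m) - phi0[k]).
std::vector<double> phase_correct(const std::vector<std::complex<double>>& m,
                                  const std::vector<double>& phi0) {
    std::vector<double> B(m.size());
    for (std::size_t k = 0; k < m.size(); ++k) {
        const double phi = std::arg(m[k]);       // measured phase (same convention as phi0)
        const double dphi = phi - phi0[k];       // Eq. (11)
        B[k] = std::abs(m[k]) * std::cos(dphi);  // Eq. (12): sqrt(m_r^2 + m_i^2) cos(dphi)
    }
    return B;
}
```

Since \(|m|\cos (\arg m-\phi _0)=\mathrm {Re}(m\,e^{-j\phi _0})\), the same value can be computed without an arctangent as `std::real(m[k] * std::polar(1.0, -phi0[k]))`, which may be cheaper on a GPU.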

For phase correction, we use an improved Mertz method [16], in which the high-resolution phase spectrum is obtained by zero-filling the double-sided interferogram to the same length as the original signal. This makes it more efficient than the original Mertz method in parallel computing.
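The zero-filling step can be sketched as below. The sketch assumes the double-sided segment and the index of its zero-OPD sample are known. It places the segment in an n-point buffer with zero OPD at index 0 and the negative-OPD samples wrapped to the end, a common arrangement that keeps a spurious linear phase out of the result; the exact layout used in [16] may differ. A direct DFT replaces the FFT for brevity, and all names are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Zero-fills a double-sided segment to length n. seg[zpd] is the zero-OPD sample;
// it goes to index 0, positive offsets follow it, negative offsets wrap to the end.
std::vector<double> zero_fill(const std::vector<double>& seg, std::size_t zpd, std::size_t n) {
    if (seg.size() > n || zpd >= seg.size()) throw std::invalid_argument("bad sizes");
    std::vector<double> buf(n, 0.0);
    for (std::size_t i = 0; i < seg.size(); ++i) {
        const std::size_t pos = (i >= zpd) ? i - zpd : n - (zpd - i);
        buf[pos] = seg[i];
    }
    return buf;
}

// Phase spectrum phi0[k] = arg(DFT(buf)[k]) on the same n-point grid as the
// spectrum of the full interferogram. O(n^2) direct DFT, for illustration only.
std::vector<double> phase_spectrum(const std::vector<double>& buf) {
    const std::size_t n = buf.size();
    const double pi = std::acos(-1.0);
    std::vector<double> phi(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> s(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            s += buf[t] * std::polar(1.0, angle);
        }
        phi[k] = std::arg(s);
    }
    return phi;
}
```

With a symmetric segment such as {0.25, 1, 0.25} and the correct zero-OPD index, the phase is zero to round-off; if the zero-OPD index is misplaced by one sample, the phase becomes linear in k (modulo \(2\pi \)), which is the kind of error that phase correction removes.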

4 Experiments

Our experiments are implemented in C++ and CUDA C on the NVIDIA Jetson TX2 board, shown in Fig. 1. The white-light interferogram used in the experiments was acquired with our interferometer and is shown in Fig. 2.

Fig. 1. The NVIDIA Jetson TX2 board

Fig. 2. The white light interferogram

Fig. 3. Result of spectrum reconstruction

4.1 Reconstruction

Figure 3 shows the spectrum reconstructed from the interferogram in Fig. 2, indicating that our interferogram-processing algorithms recover the spectrum successfully.

4.2 Batch Processing in Different Working Modes

The performance of the Jetson TX2 in Modes 0 to 4 is listed in Tables 2, 3, 4, 5 and 6, respectively. In these tables, Nor. denotes the full process, in which QRD and matrix inversion are included, and Opt. denotes a simplified process in which the results of QRD and matrix inversion are recorded and reused directly for batch processing. Groups denotes the number of batches processed.
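The Opt. variant works because the design matrix X depends only on the sample positions, which are the same for every interferogram in a batch, so its factorization can be computed once. The C++ sketch below illustrates this under stated assumptions: a polynomial trend model and, instead of storing \(R^{-1}\) explicitly, the equivalent form \(X\hat{\beta }=QRR^{-1}Q^Ty=QQ^Ty\), so that only Q is cached. This swapped-in formulation is for illustration and is not necessarily the one used in the Opt. experiments; the names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, m x n

// Orthonormal basis Q (m x n) of the polynomial design matrix, computed once by
// modified Gram-Schmidt. X depends only on the sample grid, not on the data.
Matrix trend_basis(std::size_t m, std::size_t n) {
    Matrix Q(m, std::vector<double>(n));
    for (std::size_t i = 0; i < m; ++i) {
        const double t = m > 1 ? -1.0 + 2.0 * i / (m - 1) : 0.0;
        double p = 1.0;
        for (std::size_t j = 0; j < n; ++j) { Q[i][j] = p; p *= t; }
    }
    for (std::size_t j = 0; j < n; ++j) {
        double norm = 0.0;
        for (std::size_t i = 0; i < m; ++i) norm += Q[i][j] * Q[i][j];
        norm = std::sqrt(norm);
        for (std::size_t i = 0; i < m; ++i) Q[i][j] /= norm;
        for (std::size_t k = j + 1; k < n; ++k) {
            double dot = 0.0;
            for (std::size_t i = 0; i < m; ++i) dot += Q[i][j] * Q[i][k];
            for (std::size_t i = 0; i < m; ++i) Q[i][k] -= dot * Q[i][j];
        }
    }
    return Q;
}

// Detrends every interferogram in the batch with the cached Q:
// trend = Q (Q^T y), so no factorization is repeated per signal.
void detrend_batch(const Matrix& Q, std::vector<std::vector<double>>& batch) {
    const std::size_t m = Q.size(), n = Q[0].size();
    for (auto& y : batch) {  // signals are independent, so this loop parallelizes
        std::vector<double> c(n, 0.0);  // c = Q^T y
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) c[j] += Q[i][j] * y[i];
        for (std::size_t i = 0; i < m; ++i)  // y -= Q c
            for (std::size_t j = 0; j < n; ++j) y[i] -= Q[i][j] * c[j];
    }
}
```

Per interferogram, this costs about 2mn multiply-adds instead of a fresh \(O(mn^2)\) factorization.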

Table 2. Performance in Mode 0
Table 3. Performance in Mode 1
Table 4. Performance in Mode 2
Table 5. Performance in Mode 3
Table 6. Performance in Mode 4

As these tables show, Mode 0 performs best because its working frequencies are the highest of the five modes. However, the board may consume less power in the other modes.

4.3 Application

For real scanning data collected by the LASIS (Fig. 4), each image is \(256\times 1024\) pixels. Each interference sequence has 128 samples, with 16 single-sided zero-crossing samples, and the wavelength range is 450 to 900 nm. A frame of a real scene contains about 200,000 interference fringes to process, and the development board reconstructs the spectra within about 47 ms. The reconstructed spectral cube is shown in Fig. 5.
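As a rough worked figure, assuming each of the roughly 200,000 fringes corresponds to one 128-sample interference sequence, the reported 47 ms per frame corresponds to

$$\begin{aligned} \frac{47\ \text {ms}}{200\,000}\approx 0.235\ \mu \text {s per sequence},\qquad \frac{200\,000\times 128\ \text {samples}}{0.047\ \text {s}}\approx 5.4\times 10^{8}\ \text {samples/s}. \end{aligned}$$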

Fig. 4. The actual scanning scene

Fig. 5. The spectral cube

5 Conclusions

In this paper, the interferogram-processing pipeline for spectrum reconstruction has been explored on the embedded NVIDIA Jetson TX2. Spectrum reconstruction on the development board is successful. For batch processing, the GPU delivers a clear performance improvement over the ARM CPUs of the board. The processing pipeline we designed has been tested on our board. In spectrum reconstruction, we use parallel QRD and matrix inversion algorithms for detrending, and an improved Mertz method for fast phase correction. These parallel algorithms also perform well, and the larger the data volume, the greater the performance gain.

As a high-performance processing pipeline on an embedded system, it provides fast and effective interferogram processing that can meet practical requirements.