STDIN: Spatio-temporal distilled interpolation for electron microscope images
Introduction
When imaging with an electron microscope (EM), the temporal information contained in consecutive images allows the z-axis resolution to be increased and thus the quality of volume reconstruction to be improved (see Fig. 1). For example, images acquired at a 4 nm z-axis resolution can be interpolated to achieve a 2 nm resolution. High-resolution serial slices that capture fine z-axis motion dynamics [1] yield more refined biological structures, which facilitates downstream intelligent data-analysis tasks in biology. Furthermore, slices with significant surface imperfections can be restored through EM image interpolation. Together, these points illustrate the practical importance of EM interpolation in microscopic imaging.
Serial-section imaging records the continuity of biological tissues along the z-axis, while video streams reflect the temporal progression of events. Compared with general video streams, biological serial slices differ fundamentally in appearance, including grayscale distribution, more prominent edges, and content patterns. These distinctions stem from the SEM imaging system and affect the applicability of common motion estimation and compensation algorithms. Fig. 2 illustrates the distinctions between EM and natural images. First, natural images, whether grayscale or sRGB, rarely exhibit sharp object edges; in contrast, grayscale EM images contain clearly visible tissue edges. Second, the content patterns of natural and EM images are dramatically different: EM images are dominated by simple, repetitive patterns, such as membrane structures, mitochondria, and vesicles, whereas natural images contain a more diverse range of textures and materials. Lastly, as highlighted by the colored boxes, the motion trend in natural images is traceable and observable, while in EM images a z-axis resolution of only 10 nm causes massive and chaotic deformations. In addition, the EM imaging system is more complex and less stable than an optical CCD, resulting in a low signal-to-noise ratio and unstable image quality. In summary, EM images exhibit more complex deformation patterns. Consequently, a single offset per pixel, as predicted by flow estimators [5], [6], [7], [8], is inadequate for describing the complicated motion in EM images.
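To make this limitation concrete, here is a minimal NumPy sketch (not the paper's code) of backward warping with a single offset per pixel, the core operation of flow-based interpolation. Nearest-neighbour sampling replaces the bilinear sampling used in practice; function and variable names are illustrative.

```python
import numpy as np

def warp_single_offset(img: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `img` (H, W) using one (dy, dx) offset per pixel.
    Each output pixel is fetched from exactly one source location,
    which is why a single flow vector cannot express the many-to-one
    deformations seen in EM serial sections."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return img[src_y, src_x]

# A uniform shift: every pixel copies its left neighbour (edges clamped).
img = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 1] = -1.0  # dx = -1: sample from the column to the left
warped = warp_single_offset(img, flow)
```

However expressive the flow network, each output location still receives exactly one sampled value, which is the modeling bottleneck the text describes.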
In recent years, deep convolutional neural networks for video frame interpolation have shown promising results. Early studies [9], [10] estimate a spatially adaptive convolution kernel per pixel and employ separable strategies to reduce model capacity. However, constructing motion with adaptive kernels is also one of the reasons these methods perform inadequately in complex scenarios. Deep optical flow [5], [6], [7], [8] computes the motion relationships between frames with high accuracy, and subsequent studies [11], [12], [13], [14], [3] interpolate video frames using deep optical flow estimation, producing visually convincing results. However, deep optical flow predicts a single offset for each location and warps the pixel at that point according to the predicted offset. Due to this limited modeling capability, flow-based interpolation algorithms perform poorly on complicated EM images, producing severe smoothing and artifacts. For motion estimation, deformable convolution [15], [16] provides an alternative solution, calculating multiple offsets for each position. Numerous studies [17], [18] employ deformable convolution rather than deep flow estimation to achieve implicit temporal alignment. Recently, [4] integrated deformable convolution and pyramid features [19], [20] into frame interpolation and proposed a module for interpolating pyramid temporal features. However, that feature temporal interpolation module captures only the temporal context without incorporating joint spatial correlation, leading to degraded intermediate features. Furthermore, the lack of spatial rectification accentuates the temporal mismatch when large motions occur. The ConvLSTM [21], used to handle large movements, introduces a large number of model parameters as well as slow runtime.
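The key difference of deformable sampling can be reduced to a toy NumPy illustration: instead of one offset per position, K offsets are sampled and aggregated with learned weights. This is a deliberate simplification (nearest-neighbour sampling, no learned kernels); a real deformable convolution uses bilinear sampling and per-offset modulation.

```python
import numpy as np

def deformable_sample(feat, offsets, weights):
    """Aggregate K sampled values per output position, one per offset.
    feat:    (H, W) feature map
    offsets: (H, W, K, 2) per-position (dy, dx) offsets
    weights: (H, W, K) per-offset aggregation weights
    Multiple offsets let each location gather evidence from several
    candidate source positions, unlike single-offset flow warping."""
    h, w = feat.shape
    k = offsets.shape[2]
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros_like(feat, dtype=float)
    for i in range(k):
        sy = np.clip(np.rint(ys + offsets[..., i, 0]).astype(int), 0, h - 1)
        sx = np.clip(np.rint(xs + offsets[..., i, 1]).astype(int), 0, w - 1)
        out += weights[..., i] * feat[sy, sx]
    return out

# With K = 2 zero offsets and weights 0.5 each, the output equals the input.
feat = np.arange(16, dtype=float).reshape(4, 4)
offsets = np.zeros((4, 4, 2, 2))
weights = np.full((4, 4, 2), 0.5)
out = deformable_sample(feat, offsets, weights)
```

In the learned setting both `offsets` and `weights` are predicted by a network, which is what gives deformable alignment its extra modeling capacity over flow.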
To address the aforementioned shortcomings, this study proposes an efficient spatio-temporal distilled interpolation network (STDIN) for EM images. The feature spatio-temporal ensemble (STE) module handles the dynamic background and accurately predicts the interpolated features. Specifically, the STE captures pyramid temporal features and calculates spatial correlation coefficients; the final interpolated features are then synthesized via region sampling. Although intermediate features with this dual-domain embedding perform well in motion processing, the interpolated features still exhibit multiple mismatches under severe anisotropy and large motions. Moreover, when the interpolated features are used as a reference for retrieving the relevant input features, incorrect prediction of the intermediate features is exacerbated by background noise and fluctuations in image quality. Therefore, this paper proposes a lightweight, stackable feedback distillation block (SFDB) to purify the intermediate features and minimize the temporal mismatch caused by large deformations. The SFDB adapts its feedback distillation to the input features. Moreover, this study finds that the feedback distillation correction is stackable: the resulting intermediate features become more precise as the number of stacked modules increases.
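The "stackable" property can be caricatured as iterative residual refinement: each stacked block re-estimates the error of the current intermediate feature against a reference derived from the inputs and removes part of it, so the residual shrinks monotonically with depth. The following is a toy NumPy sketch under that assumption, not the SFDB itself (names and the fixed `rate` are hypothetical).

```python
import numpy as np

def feedback_refine(feature, reference, steps=3, rate=0.5):
    """Toy analogue of stacked feedback distillation: every stacked
    step nudges the intermediate feature toward the input-derived
    reference, so stacking more blocks tightens the estimate."""
    errs = []
    for _ in range(steps):
        residual = feature - reference
        feature = feature - rate * residual
        errs.append(float(np.abs(feature - reference).mean()))
    return feature, errs

ref = np.ones((4, 4))
feat = np.full((4, 4), 3.0)
refined, errs = feedback_refine(feat, ref, steps=3)
# errs decreases step by step, mirroring the claim that accuracy
# improves as the number of stacked modules increases.
```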
The contributions of this paper are summarized as follows:
- 1.
This work extends video frame interpolation to electron microscopy and proposes a simple but effective framework for interpolating EM images. The approach incorporates spatio-temporal ensemble sampling and feedback distillation, and the interpolated frames it generates from EM images are more precise than those produced by previous interpolation algorithms.
- 2.
A spatio-temporal ensemble module that combines temporal context with spatially correlated information is presented. Moreover, a novel feedback distillation module is introduced, which enables acquiring the best-aligned intermediate features under the supervision of the input images.
- 3.
Extensive experiments demonstrate that this approach achieves state-of-the-art performance on EM benchmark datasets and outperforms the best recent frame interpolation algorithms.
Section snippets
Video Frame Interpolation
Video frame interpolation (VFI) is a technique that uses input frames to predict non-existent intermediate frames. [22] first introduced general convolutional neural networks (CNNs) [23], [24] into video frame interpolation. However, severe artifacts and blur are unavoidable when CNNs directly synthesize interpolated frames. In this context, [25] proposed deep voxel flow to warp the input frames based on trilinear sampling, which produces little blur but performs insufficiently in scenes with
Proposed Method
Given two input EM frames that are continuous along the z-axis, our goal is to synthesize the corresponding intermediate frame. To accurately extract the deformation field from complex EM images and cope with unstable image quality, we propose a novel spatio-temporal distilled interpolation framework, which progressively aggregates temporal content and spatially related information. We first encode the input feature maps using a feature extractor with a
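This snippet is truncated in the preview, but the multi-scale encoding it begins to describe follows the standard feature-pyramid pattern: the input is repeatedly downsampled into coarser feature maps. A minimal NumPy sketch of that pattern (the actual extractor is a learned CNN; 2x2 average pooling stands in for strided convolutions):

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling (assumes even spatial dimensions)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(x, levels=3):
    """Return feature maps at progressively coarser scales, finest first.
    Coarse levels see large deformations cheaply; fine levels keep detail."""
    pyr = [x]
    for _ in range(levels - 1):
        pyr.append(avg_pool2(pyr[-1]))
    return pyr

pyr = build_pyramid(np.zeros((16, 16)), levels=3)
# pyramid shapes: (16, 16), (8, 8), (4, 4)
```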
Implementation Details
For training, a triplet of EM image patches is randomly cropped; the two odd-indexed frames are used as inputs while the corresponding intermediate frame serves as supervision. For data augmentation, this study randomly rotates the patches, flips them horizontally, and arbitrarily reverses their temporal order. A Pyramid, Cascading and Deformable (PCD) architecture from [17] is used to perform temporal deformable alignment
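The augmentation described above can be sketched in NumPy as follows. The sketch assumes (reasonably, but it is an assumption) that the spatial transforms are applied identically to all three frames, while temporal reversal only swaps the two inputs; function names are illustrative.

```python
import numpy as np

def augment_triplet(f0, ft, f1, rng):
    """Random 90-degree rotation, horizontal flip, and temporal reversal
    for an (input, target, input) triplet. Spatial transforms must be
    identical across the three frames so supervision stays aligned;
    reversal keeps the middle (target) frame in place."""
    k = int(rng.integers(4))                      # rotation by k * 90 degrees
    frames = [np.rot90(f, k) for f in (f0, ft, f1)]
    if rng.random() < 0.5:                        # horizontal flip
        frames = [np.fliplr(f) for f in frames]
    if rng.random() < 0.5:                        # reverse temporal order
        frames = [frames[2], frames[1], frames[0]]
    return frames

rng = np.random.default_rng(0)
f0 = np.arange(16.0).reshape(4, 4)
ft = f0 + 16
f1 = f0 + 32
frames = augment_triplet(f0, ft, f1, rng)
```

All three transforms are permutations of pixels, so the target frame keeps exactly its original values, only rearranged.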
Evaluation from a Biology Perspective
As stated in Section 1, the objective of EM slice interpolation is to increase z-axis resolution, decrease anisotropy, and hence improve volume reconstruction. With the rapid evolution of biomedical segmentation, large-scale volume reconstruction [44] can now rebuild biological tissues of interest, such as membranes, mitochondria, and synapses. Consequently, the proposed method is evaluated from a biological perspective through biomedical segmentation. More specifically, the membrane
Conclusion
This study develops a framework for interpolating EM images with complex deformations and unstable quality. This framework comprises two primary modules: one for spatio-temporal fusion and the other for feedback distillation. The spatio-temporal ensemble module estimates spatial correlation coefficients and samples similar textures based on temporal features to maintain edge continuity. Due to the inherent mismatches of temporal features, a stackable feedback distillation module is proposed for
CRediT authorship contribution statement
Zejin Wang: Methodology, Visualization, Formal analysis, Writing - original draft. Guodong Sun: Data curation, Writing - review & editing. Guoqing Li: Data curation, Writing - review & editing, Supervision. Lijun Shen: Writing - review & editing, Supervision. Lina Zhang: Visualization, Data curation. Hua Han: Resources, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Science and Technology Innovation 2030 Major Program (2021ZD0204503, 2021ZD0204500), the Strategic Priority Research Program of Chinese Academy of Science (No. XDB32030208 to H.H.), International Partnership Program of Chinese Academy of Science (No. 153D31KYSB20170059 to H.H.), Program of Beijing Municipal Science & Technology Commission (No. Z201100008420004 to H.H.), National Natural Science Foundation of China (No. 32171461 to H.H.), and the Strategic
References (47)
- Video interpolation using optical flow and Laplacian smoothness, Neurocomputing, 2017.
- Enhanced FIB-SEM systems for large-volume 3D imaging, eLife, 2017.
- Video enhancement with task-oriented flow, International Journal of Computer Vision, 2019.
- Depth-aware video frame interpolation.
- X. Xiang, Y. Tian, Y. Zhang, Y. Fu, J.P. Allebach, C. Xu, Zooming Slow-Mo: Fast and accurate one-stage space-time video...
- FlowNet: Learning optical flow with convolutional networks.
- FlowNet 2.0: Evolution of optical flow estimation with deep networks.
- Optical flow estimation using a spatial pyramid network.
- D. Sun, X. Yang, M.-Y. Liu, J. Kautz, PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume, in:...
- Video frame interpolation via adaptive convolution.
- Video frame interpolation via adaptive separable convolution.
- Context-aware synthesis for video frame interpolation.
- Super SloMo: High quality estimation of multiple intermediate frames for video interpolation.
- MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Deformable convolutional networks.
- Deformable ConvNets v2: More deformable, better results.
- Very deep convolutional networks for large-scale image recognition, ICLR.
- Feature pyramid networks for object detection.
- Learning image matching by simply watching video.
- Deep residual learning for image recognition.
Zejin Wang received the B.S. degree in School of Electrical and Mechanical Engineering from Wuhan University of Technology, Wuhan, in 2018. He is currently pursuing his Ph.D. degree in the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include image restoration, video interpolation and self-supervised learning.
Guodong Sun received the B.S. degree in automation from North China Electric Power University, Beijing, China, in 2019. Now he is pursuing his master’s degree in the Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include instance segmentation and active learning.
Guoqing Li received the B.S. degree from Beijing Jiaotong University, Beijing, China, in 2007 and the Ph.D. degree in National Space Science Center, Chinese Academy of Sciences, Beijing, China, in 2012. He is the image algorithms senior engineer in Hermes-Microvision Ltd. in 2013–2015. Now he is an Assistant Research Fellow with Institute of Automation, Chinese Academy of Sciences. His research interests include electron microscope imaging system, image and video processing and structure analysis and modeling of brain.
Lijun Shen received the B.S. degree from Northwestern University, Xi’an, China, in 2004, the M.S. degree from Inner Mongolia University of Technology, Hohhot, China, in 2010 and the Ph.D. degree from Macau University of Science and Technology, Macau, China, in 2021. He is currently an Assistant Research Fellow with Institute of Automation, Chinese Academy of Sciences. His research interests include massive data management and distributed computing.
Lina Zhang received her M.S. degree from China University of Geosciences (Beijing), major in materials engineering, in 2017. She is currently a microanalysis engineer, engaged in the acquisition of high-throughput microscope images in the field of brain science and material science.
Hua Han received the B.S. degree from Xi’an Jiaotong University, Xi’an, China, in 1996, the M.S. degree from Chinese Ship Research and Development Academy, Beijing, China, in 1999 and the Ph.D. degree from Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2004. He is currently a Professor with the Institute of Automation, Chinese Academy of Sciences, a member of CAS Center for Excellence in Brain Science and Intelligence Technology, and a Professor with the Future technological college, University of Chinese Academy of Sciences. His research interests include image processing, computational neuroscience and pattern recognition.