Extended-depth-of-field object detection with wavefront coding imaging system
Introduction
In recent years, deep learning has made substantial advances in many areas of computer vision, such as image classification [1], image inpainting [2], semantic segmentation [3], and motion deblurring [4]. Among application-specific tasks, object detection [5] has broad prospects for engineering use. It is an integrated framework that combines classification, localization, and detection. Traditional methods exploit handcrafted features, such as SIFT [6], HOG [7], the heat kernel [8], shape models [9], and structural features [10]. However, their results usually depend heavily on the specific dataset. The use of convolutional neural networks (CNNs) has significantly improved accuracy and robustness [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22].
Perhaps the first comparatively successful work with CNN features was the OverFeat [11] network, which used a sliding window for detection. Girshick et al. then proposed R-CNN [12], which combines region proposals with CNNs. SPP-net [13] removed the restriction of a fixed input image size through spatial pyramid pooling. Fast-RCNN [14] introduced ROI pooling, which avoids repeated computation. Faster RCNN [15] removed the dependence on selective search [16] by generating proposals with a region proposal network (RPN) before region detection. All of these methods improve accuracy through a detection scheme with more than one step. The other branch of detection pays more attention to real-time performance and uses a one-step end-to-end pipeline. YOLO [17] was the first architecture to predict bounding boxes and classes in a single evaluation. SSD [18] adopted multilayer CNN features for more accurate localization. YOLO V2 [19] introduced auxiliary anchors into the grid cells to improve accuracy while maintaining running speed. In addition, a growing number of schemes [20], [21], [22], [23] have been proposed to increase the mean average precision (mAP) through carefully designed tricks and deeper networks.
Until now, most research has focused on the challenges caused by occlusion, deformation, and small object size. However, the images collected in real scenarios depend heavily on the performance of the imaging system. Among the factors that influence image quality, a fundamental one is the defocus value. A defocused image is blurred because the light rays converge at a plane that deviates from the imaging detector. This blur can also greatly degrade detection performance. Fig. 1 shows the detection results of YOLO V2 with the score threshold set to 0.2. As shown in Fig. 1(a), the best focus plane generally gives the best performance. At a plane that deviates by one wavelength from the best focus position, shown in Fig. 1(b), the dog is no longer detected, and the bounding boxes of the car and the bicycle are located less precisely than before.
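The loss of sharpness with defocus can be illustrated numerically. The following is a minimal sketch, not the simulation used in this paper: it assumes an ideal circular pupil, interprets the defocus as a wavefront error of `w20_waves` wavelengths at the pupil edge, and uses hypothetical helper names (`defocus_psf`, `strehl_ratio`, `blur`) and grid sizes chosen only for illustration.

```python
import numpy as np

def defocus_psf(w20_waves, n=256, pupil_radius=64):
    """Incoherent PSF of a circular pupil with w20_waves of defocus at its edge."""
    y, x = np.indices((n, n)) - n // 2
    rho2 = (x**2 + y**2) / pupil_radius**2       # squared normalised pupil radius
    aperture = rho2 <= 1.0
    pupil = aperture * np.exp(2j * np.pi * w20_waves * rho2)
    psf = np.abs(np.fft.fft2(pupil)) ** 2        # on-axis sample is at index [0, 0]
    return psf / psf.sum()                       # unit energy

def strehl_ratio(w20_waves):
    """On-axis intensity relative to the in-focus system."""
    return defocus_psf(w20_waves)[0, 0] / defocus_psf(0.0)[0, 0]

def blur(image, psf):
    """Circular convolution of an image with a PSF of the same shape."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf)))

if __name__ == "__main__":
    for w20 in (0.0, 0.5, 1.0):
        print(f"W20 = {w20:.1f} waves, Strehl ratio = {strehl_ratio(w20):.3f}")
    # Blur a synthetic bright square with the one-wave defocus PSF.
    scene = np.zeros((256, 256))
    scene[96:160, 96:160] = 1.0
    blurred = blur(scene, defocus_psf(1.0))
    print("peak value after blur:", blurred.max())
```

For an ideal circular pupil the on-axis intensity follows [sin(πW20)/(πW20)]², so it vanishes at one wave of defocus, which is consistent with the degraded detections described for Fig. 1(b) under the stated interpretation of the defocus value.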
To overcome the effects of defocus, researchers have tried to increase the depth of field (DOF) by employing an apodizer. However, this reduces the optical intensity too much and severely degrades the image resolution. In 1995, Dowski et al. [24] proposed wavefront coding (WFC), which inserts a phase mask (PM) at the pupil plane. Parallel light rays passing through the optical system then no longer converge to a point but to a speckle. The optical transfer function (OTF) and point spread function (PSF) remain nearly invariant over a large range of defocus. The intermediate images captured by the detector require subsequent processing to be sharpened. This technique has proved successful in extending the DOF and suppressing various kinds of aberration.
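The defocus invariance of a coded pupil can be checked with a short numerical experiment. The sketch below is an illustration under stated assumptions rather than the model used in this paper: it takes a one-dimensional rectangular pupil with a cubic phase profile αx³ (the classic cubic mask), writes defocus as a quadratic phase ψx², and uses the illustrative values α = 20π and ψ = 2π (one wave of defocus at the pupil edge). The function names are hypothetical.

```python
import numpy as np

def mtf_1d(alpha, psi, n=512, pad=4):
    """MTF of the 1-D pupil exp(i*(alpha*x**3 + psi*x**2)) on |x| <= 1."""
    x = np.linspace(-1.0, 1.0, n)
    pupil = np.exp(1j * (alpha * x**3 + psi * x**2))
    # Zero-padded FFT gives the amplitude response; its squared modulus is the PSF.
    psf = np.abs(np.fft.fft(pupil, n=pad * n)) ** 2
    # The OTF is the Fourier transform of the PSF, normalised to 1 at zero frequency.
    otf = np.fft.ifft(psf)
    return np.abs(otf) / np.abs(otf[0])

def relative_change(alpha, psi):
    """Relative L2 change of the MTF between best focus and defocus psi."""
    focused = mtf_1d(alpha, 0.0)
    defocused = mtf_1d(alpha, psi)
    return np.linalg.norm(defocused - focused) / np.linalg.norm(focused)

PSI = 2 * np.pi       # assumed: one wave of defocus at the pupil edge
ALPHA = 20 * np.pi    # assumed cubic mask strength, illustrative only

change_conventional = relative_change(0.0, PSI)
change_cubic = relative_change(ALPHA, PSI)

if __name__ == "__main__":
    print(f"conventional pupil: relative MTF change = {change_conventional:.3f}")
    print(f"cubic phase mask:   relative MTF change = {change_cubic:.3f}")
```

In this setup the MTF of the coded pupil changes far less with defocus than that of the clear pupil, at the cost of a lower MTF overall. That lower contrast is why the intermediate image still needs digital restoration.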
Over the past 20 years, most of the literature on this topic has proposed new kinds of PMs [25], [26], [27], [28] to further extend the DOF. Attempts have also been made to enhance image quality with two PMs [29], [30] or by rotating one PM [31]. According to their profile, PMs fall into two categories: rotationally symmetric and asymmetric. Theoretical calculations show that the OTF and PSF of rotationally symmetric PMs are more sensitive to defocus than those of asymmetric PMs. Although a rotationally symmetric PM does not extend the DOF as well as an asymmetric one, its machining and assembly precision can be well guaranteed with existing processes.
Instead of placing a single phase mask at the pupil plane, recent studies [32], [33] demonstrate that first- or higher-order spherical aberration can be used to extend the DOF. This technique, called spherical coding, is easy to design because of its rotational symmetry. Inspired by this, we propose a new wavefront coding imaging system that contains no specially designed PM. Several elements in the system are combined to achieve equally blurred imaging over the assigned DOF region. We call this method lens-combined modulated wavefront coding (LM-WFC).
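The idea behind spherical coding can be sketched with the on-axis intensity of a circular pupil. The example below is a simplified illustration and not the LM-WFC design itself: it assumes a pupil phase of 2π(W040ρ⁴ + W20ρ²), an arbitrary spherical aberration of W040 = 3 waves, and a hypothetical function name. It compares the on-axis intensity over a range of defocus values W20 with and without the spherical term.

```python
import numpy as np

def axial_intensity(w20_waves, w040_waves=0.0, samples=4096):
    """On-axis intensity of a circular pupil, normalised to 1 without aberration."""
    # u = rho**2 is uniformly distributed over the area of a circular pupil.
    u = (np.arange(samples) + 0.5) / samples
    phase = 2 * np.pi * (w040_waves * u**2 + w20_waves * u)
    return np.abs(np.exp(1j * phase).mean()) ** 2

# Defocus range (in waves) over which the two systems are compared.
defocus = np.linspace(-5.0, -1.0, 41)
conventional = np.array([axial_intensity(d) for d in defocus])
coded = np.array([axial_intensity(d, w040_waves=3.0) for d in defocus])

if __name__ == "__main__":
    print(f"conventional: max on-axis intensity over range = {conventional.max():.3f}")
    print(f"with W040 = 3 waves: min over range = {coded.min():.3f}")
```

With the spherical term, the on-axis intensity stays at a moderate, slowly varying level across the whole range, whereas the aberration-free pupil drops to near zero away from best focus. The extended range sits on one side of the paraxial focus because the sign of W20 must oppose that of W040. The peak intensity is lower than at the ideal focus, which matches the notion of equally blurred intermediate images that are restored afterwards.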
This paper first introduces the basic principle of defocus and analyzes the theory of wavefront coding. A simulation then shows how the defocus value influences detection precision, and the mAPs at defocus positions for traditional optical imaging and WFC are compared. The results indicate that WFC improves the accuracy rate. The LM-WFC method is then described in detail. Based on the proposed method, an LM-WFC system is designed and manufactured as a lens. Comparison experiments show the improvement in detection results obtained by applying WFC.
Theory of WFC
WFC is an optical-digital hybrid imaging system (Fig. 2). Its two-step imaging process is what enables it to extend the DOF. With a PM inserted, the light rays no longer converge to a point but spread as a uniform thin beam near the imaging plane. Over a large range of defocus, the detector obtains sampled intermediate images that are insensitive to defocus. Different kinds of phase mask modulate the intermediate images differently. In this study, we take the classic cubic phase mask (CPM) as
Detection results of traditional imaging and WFC in the case of defocus
Among the state-of-the-art object detection methods, we select Faster RCNN [15] as a representative to show the influence of defocus. Faster RCNN is a single, unified network that links the RPN and the detection network. With the help of anchors, the RPN accelerates the region proposal step. Anchors of various scales and aspect ratios enable it to handle variations in the targets. ROI pooling helps to select scale-invariant features during the detection step. It
Design of LM-WFC system
In this section, we propose the LM-WFC technique, which improves the DOF and other parameters. Because WFC gives up the sharp conjugate image and captures a blurred intermediate image instead, the aberration tolerance of the imaging system is looser. In theory, it is therefore possible to increase the aperture and the DOF at the same time by making rational use of the aberration introduced by the larger aperture.
Unlike traditional WFC, there is not any single
Experiment results
To demonstrate how extending the DOF helps improve detection, we manufactured the designed systems, as shown in Fig. 8. The two systems are tied together so that they share the same field of view, as shown in Fig. 8(c). We adopt the fast image deconvolution algorithm with hyper-Laplacian priors [38] to restore the intermediate images of the LM-WFC system. Faster RCNN [15] is then used to detect the targets in the final images, and the results are presented in Fig. 7. In the first
Conclusion
To reduce the impact of defocus on detection results in traditional imaging, we propose applying the WFC technique. The simulation results show that images obtained with WFC outperform those from traditional imaging in detection over a large range of defocus. We also propose a new LM-WFC method that enlarges the DOF and the aperture simultaneously. Starting from the theory of traditional WFC, the design principle of the LM-WFC system is described. Based on that, an
Declaration of competing interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in the manuscript entitled “Extended-depth-of-field object detection with wavefront coding imaging system”.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC nos. 11774031 and 61705010) and Beijing Science and Technology Project (No. Z181100005918002).
References (38)
- et al. Graph characteristics from the heat kernel trace. Pattern Recognit. (2009)
- K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556....
- et al. Context encoders: feature learning by inpainting
- et al. Fully convolutional networks for semantic segmentation
- et al. Learning a convolutional neural network for non-uniform motion blur removal
- The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. (2010)
- Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (2004)
- et al. Histograms of oriented gradients for human detection
- et al. Object detection via structural feature selection and shape model. IEEE Trans. Image Process. (2013)
- et al. VHR object detection based on structural feature extraction and query expansion. IEEE Trans. Geosci. Remote Sens. (2014)