Intelligent visual localization of wireless capsule endoscopes enhanced by color information
Introduction
Wireless capsule endoscopy (WCE) is an established modality aimed at non-invasive imaging of the whole gastrointestinal (GI) tract. Capsule endoscopes (CEs) are swallowable devices, the size of a large vitamin pill, equipped with a color video camera, light-emitting diodes (LEDs) and a wireless image transmitter. A CE travels passively through the GI tract, carried by peristaltic motion and gravity, i.e., its motion cannot be controlled. Many research efforts proposing active, robotic CEs have appeared during the current decade. Such technology is expected to allow for more thorough examinations, easier and much more accurate lesion localization, and drug infusion [1] in the appropriate regions. However, such devices are still under development or exist only as research prototypes. Meanwhile, traditional WCE has gained a lot of attention and is considered the first-line diagnostic tool for screening of small bowel diseases [2].
During its approx. 12 h recording of the GI tract, a CE captures tens of thousands of images. Reviewing such a large volume of recorded content requires significant human effort, since the reviewer's attention must remain undivided for about 45–90 min [3]. This turns the review into a tiring reading session, leaving room for errors and affecting the diagnostic yield of WCE [4]. To overcome these substantial drawbacks, several methods for automatic detection of lesions have been proposed [5]. These methods typically achieve lower error rates than human readers; however, they are still not capable of providing accurate localization of the CE within the GI tract. The latter is highly desirable for maximizing the success of any subsequent medical or surgical intervention. In an effort to approximate the CE's position, radio frequency (RF) sensor arrays are mounted externally to the human body to receive the signals transmitted by the CE and estimate its position by triangulation. This provides a coarse estimate that is still insufficient for practical applications, due to the large localization error [6]. Such an approach can only provide approximate transit time estimation with respect to anatomic landmarks [5], and localization in the 2D or 3D abdominal space with respect to the abdominal quadrant. Other localization approaches include magnetic localization, magnetic resonance, ultrasound and positron emission-based imaging [6], [7].
It is, however, feasible to exploit the visual content of the images collected by the CE for localization, without the need for any other internal and/or external equipment, using only computer vision algorithms that analyze raw WCE video frames [8]. In this paper we propose a novel, intelligent approach for estimating the location of the CE within the GI tract using a neural network (NN). We provide visual odometry (VO), i.e., we use sequential WCE video frames to estimate the actual distance travelled by the CE within the GI tract. The novelties of our approach are: a) it is adaptive, i.e., it does not require any prior knowledge of the geometric model of the CE or its intrinsic parameters (e.g., focal length, optical center), so it can be used with all commercially available CE models; b) it exploits color information, considering both the intensity of the light reflected from the lumen tissues, as emitted by the LEDs of the CE, and the chromatic components of the luminal images. The proposed methodology constitutes a monocular, non-parametric VO approach based on salient image points extracted by the SIFT algorithm [9] and tracked by a scheme that comprises the well-known KLT tracker [10] and the RANSAC algorithm [11]. The proposed enhancement rests on the idea that light intensity becomes lower for points that lie deeper in the lumen, while the chromatic components, remaining practically unaffected by variations of light intensity, enable more accurate detection of the correspondences between consecutive frames that are necessary for accurate tracking of the motion of the CE. We believe that both novelties are equally important since a) CE manufacturers typically do not provide technical details about their devices; and b) depth and motion patterns of the CE are better captured.
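The color-separation idea above can be illustrated with a minimal sketch. Note this is an illustrative assumption, not the paper's exact formulation: here intensity is the mean of the RGB channels and chroma is the normalized chromaticity, which is invariant to a uniform scaling of the illumination (e.g., a patch lying deeper in the lumen).

```python
import numpy as np

def intensity_and_chromaticity(rgb, eps=1e-8):
    """Split an RGB image (floats in [0, 1]) into a scalar intensity channel
    and normalized chromaticity coordinates (invariant to uniform
    illumination scaling). Illustrative helper, not the paper's model."""
    s = rgb.sum(axis=-1, keepdims=True)
    intensity = s / 3.0
    chroma = rgb / (s + eps)            # each pixel's (r, g, b) sums to ~1
    return intensity.squeeze(-1), chroma

# A uniformly dimmed patch, e.g., tissue deeper in the lumen:
patch = np.full((4, 4, 3), [0.8, 0.4, 0.3])
dim = 0.5 * patch                       # half the illumination
i1, c1 = intensity_and_chromaticity(patch)
i2, c2 = intensity_and_chromaticity(dim)
# intensity halves, while chromaticity is essentially unchanged
```

Under this split, the intensity channel carries the depth cue (darker means deeper in the lumen), while the chromaticity stays stable across illumination changes and can support frame-to-frame matching.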
The remainder of this paper is structured as follows: Section 2 provides a brief overview of related work in the broader field of VO and camera calibration, focusing on WCE applications and other relevant research applicable in the context of WCE. Section 3 presents in detail the proposed intelligent approach to VO for CEs. Section 4 briefly reviews the robotic-assisted experimental setup adopted for the evaluation of the proposed methodology [17], a traditional, baseline geometric VO approach that requires a camera calibration step, and the measurements collected using the adaptive NN approach. The experimental results are then discussed in section 5 in comparison with the results of other state-of-the-art approaches. Finally, conclusions are drawn and directions for future work are suggested in section 6.
Section snippets
Related work
Visual odometry is an area within the broader field of computer vision. Typically it focuses on tasks such as robot navigation: its goal is to allow a robot to navigate within a known or unknown environment by estimating its position and orientation solely from the visual information (i.e., consecutive video frames) gathered typically by one (monocular odometry) or two (stereo odometry) cameras. In some cases, omnidirectional cameras are also used. The properties of the camera(s) used are
Method
The proposed methodology adopts a feature extraction scheme that uses Lowe's Scale-Invariant Feature Transform (SIFT) features [9], which are then matched using the RANdom SAmple Consensus (RANSAC) algorithm [11] and/or tracked using the Kanade-Lucas-Tomasi (KLT) tracker [10]. In this section we present details on all the aforementioned algorithms, as well as the proposed intelligent and adaptive VO scheme.
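The role RANSAC plays in such a pipeline, rejecting mismatched feature correspondences before motion estimation, can be sketched with a generic NumPy example. This is not the authors' implementation: for simplicity it hypothesizes a pure 2D translation from a single random correspondence, whereas the paper's scheme operates on SIFT/KLT tracks.

```python
import numpy as np

def ransac_translation(src, dst, n_iters=200, thresh=2.0, rng=None):
    """Estimate a 2D translation between matched point sets with RANSAC:
    repeatedly hypothesize from one random correspondence, count points
    within `thresh` pixels of agreement, and refit on the best inlier set."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                        # minimal-sample hypothesis
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)   # refit
    return t, best_inliers

# 40 keypoints shifted by (3, -2) px with noise; 5 gross mismatches added.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, (40, 2))
dst = src + np.array([3.0, -2.0]) + rng.normal(0, 0.3, (40, 2))
dst[:5] += rng.uniform(20, 40, (5, 2))             # corrupt five matches
t, inliers = ransac_translation(src, dst, rng=1)
```

The corrupted matches disagree with the dominant motion and are excluded from the final fit, which is the property the tracking scheme relies on.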
Experimental setup
In the context of this work and for the evaluation of the adaptive VO approach discussed in section 3.4, we opted for the robotic-assisted experimental setup originally presented in Ref. [17], which approximates a bowel and the movement of the CE within it. The reason for selecting such a setup instead of using real-life WCE videos was the need to construct an accurate ground truth for the location of the CE. Thus, we were able to acquire accurate measurements of the distance
Discussion
We proposed and validated a novel intelligent VO approach for CE localization with an ex-vivo robotic experiment enabling the validation of CE motion estimation in physical units. The results showed that it provides improved performance over the current geometric VO approach. The errors reported in the literature using geometric VO approaches range between 2.7 and 7.2 cm under different experimental frameworks [17], [18]. Other CE localization methods have also been proposed, reporting a
Conclusions
We proposed a novel intelligent approach for VO within the GI tract, using solely visual information extracted from the video captured by the CE. The contributions of this work can be summarized as follows: a) we use the pixel appearance of the SIFT key-points, i.e., both their intensity and chroma, to derive depth information from the video; b) we use KLT tracking in combination with the RANSAC algorithm to eliminate outlier feature correspondences; c) we use an MLP designed for monocular VO
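The MLP contribution summarized above can be illustrated in miniature. The sketch below is purely hypothetical and is not the paper's trained network or data: a one-hidden-layer regressor, trained by plain gradient descent, learns a made-up nonlinear mapping from mean image-plane displacement to travelled distance, which is the kind of adaptive, model-free mapping an MLP-based VO can represent.

```python
import numpy as np

# Hypothetical training data (NOT from the paper): travelled distance grows
# nonlinearly with the mean image-plane displacement of tracked keypoints.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, (200, 1))        # mean keypoint displacement (px)
y = 0.4 * x + 0.05 * x ** 2             # travelled distance (cm)

# One hidden layer of tanh units, trained with full-batch gradient descent
# on the mean-squared error.
W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)
lr, losses = 1e-3, []
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)            # forward pass
    pred = h @ W2 + b2
    err = pred - y
    losses.append(float((err ** 2).mean()))
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)      # backpropagation
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

Because the network learns the mapping from data, no geometric camera model or intrinsic calibration is required, which is the property that makes the approach applicable across CE models.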
Acknowledgments
We would like to thank Prof. Gastone Ciuti and Eng. Federico Bianchi, Sant'Anna School of Advanced Studies, Pisa, Italy, for the acquisition of the videos of the robotic experiment and the provision of the related photographic material. We would also like to acknowledge the contribution of Dr. Alexandros Karargyris, IBM Research, USA, for the design of the experimental setup, and Prof. Ervin Toth, Department of Gastroenterology, Skåne University Hospital, Malmö, Lund University, Sweden, for
References (40)
- et al., Wireless capsule endoscopy, Gastrointest. Endosc. (2013)
- et al., Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Netw. (1990)
- et al., Comparative assessment of feature extraction methods for visual odometry in wireless capsule endoscopy, Comput. Biol. Med. (2015)
- et al., Wireless endoscopy in 2020: will it still be a capsule?, World J. Gastroenterol. WJG (2015)
- et al., Optimizing lesion detection in small-bowel capsule endoscopy: from present problems to future solutions, Expert Rev. Gastroenterol. Hepatol. (2015)
- et al., Detection of lesions during capsule endoscopy: physician performance is disappointing, Am. J. Gastroenterol. (2012)
- et al., Software for enhanced video capsule endoscopy: challenges for essential progress, Nat. Rev. Gastroenterol. Hepatol. (2015)
- et al., Design, implementation, and fundamental limits of image and RF based wireless capsule endoscopy hybrid localization, IEEE Trans. Mob. Comput. (2016)
- et al., An effective localization method for robotic endoscopic capsules using multiple positron emission markers, IEEE Trans. Robotics (2014)
- et al., A review of localization systems for robotic endoscopic capsules, IEEE Trans. Biomed. Eng. (2012)
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis.
- Detection and Tracking of Point Features
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM
- Flexible camera calibration by viewing a plane from unknown orientations
- A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses, IEEE Trans. Pattern Anal. Mach. Intell.
- Image distortion correction for wireless capsule endoscope
- Capsule endoscope localization based on visual features
- Video-based measurements for wireless capsule endoscope tracking, Meas. Sci. Technol.
- Robotic validation of visual odometry for wireless capsule endoscopy
- A video-based speed estimation technique for localizing the wireless capsule endoscope inside gastrointestinal tract