Computers in Biology and Medicine

Volume 89, 1 October 2017, Pages 429-440

Intelligent visual localization of wireless capsule endoscopes enhanced by color information

https://doi.org/10.1016/j.compbiomed.2017.08.029

Highlights

  • A visual localization method for wireless capsule endoscopy is proposed.

  • A multi-layer perceptron is used to solve the visual odometry problem.

  • The use of color features enhances the visual localization performance.

  • Validation is performed with a robotic-assisted experimental setup.

  • Better localization performance than geometric visual odometry is reported.

Abstract

Wireless capsule endoscopy (WCE) is performed with a miniature swallowable endoscope enabling the visualization of the whole gastrointestinal (GI) tract. One of the most challenging problems in WCE is the localization of the capsule endoscope (CE) within the GI lumen. Contemporary radiation-free localization approaches are mainly based on external sensors and transit time estimation techniques, which offer low localization accuracy in practice. Recent advances towards the solution of this problem include localization approaches based solely on visual information from the CE camera. In this paper we present a novel visual localization approach based on an intelligent artificial neural network architecture, which implements a generic visual odometry (VO) framework capable of estimating the motion of the CE in physical units. Unlike conventional geometric VO approaches, the proposed one is adaptive to the CE used; therefore, it does not require any prior knowledge about its geometric model or its intrinsic parameters. Furthermore, it exploits color as a cue to increase localization accuracy and robustness. Experiments were performed using a robotic-assisted setup providing ground truth information about the actual location of the CE. The lowest average localization error achieved is 2.70 ± 1.62 cm, which is significantly lower than the error obtained with the geometric approach. This result constitutes a promising step towards the in-vivo application of VO, which will open new horizons for accurate local treatment, including drug infusion and surgical interventions.

Introduction

Wireless capsule endoscopy (WCE) is an established modality aiming at non-invasive imaging of the whole gastrointestinal (GI) tract. Capsule endoscopes (CEs) are equipped with a color video camera, light emitting diodes (LEDs) and a wireless image transmitter. CEs are swallowable devices, the size of a large vitamin pill. A CE travels passively through the GI tract by taking advantage of both peristaltic motion and gravity, i.e., its motion cannot be controlled. Many research efforts proposing active, robotic CEs have appeared during the current decade. It is expected that such technology would allow for more thorough examinations, easier and much more accurate lesion localization, and also drug infusion [1] in the appropriate regions. However, such devices are still under development or exist only as research prototypes. Meanwhile, traditional WCE has gained a lot of attention and is considered the first-line diagnostic tool for screening of small bowel diseases [2].

During its approx. 12 h recording of the GI tract, a CE captures tens of thousands of images. Reviewing such a large volume of recorded content requires significant human effort, since the reviewer's attention should remain undivided for about 45–90 min [3]. This turns the review into a tiring reading experience and allows for potential errors, with consequences for the diagnostic yield of WCE [4]. To overcome these substantial drawbacks, several methods for the automatic detection of lesions have been proposed [5]. These methods typically achieve lower error rates than human readers; however, they are still not capable of providing accurate localization of the CE within the GI tract. The latter is highly desirable for maximizing the success of any subsequent medical or surgical intervention. Nevertheless, in an effort to approximate the CE's position, radio frequency (RF) sensor arrays are mounted externally on the human body to receive the signals transmitted by the CE and estimate its position by triangulation. This provides a coarse estimation that is still insufficient for practical applications, due to the large localization error [6]. This approach can only provide approximate transit time estimation with respect to anatomic landmarks [5], and localization in the 2D or 3D abdominal space with respect to the abdominal quadrant. Other localization approaches include magnetic localization, magnetic resonance, ultrasound and positron emission-based imaging [6], [7].

It is, however, feasible to exploit the visual content of the images collected by the CE for localization purposes, without the need for any other internal and/or external equipment, by using only computer vision algorithms that analyze raw WCE video frames [8]. In this paper we propose a novel, intelligent approach for estimating the location of the CE within the GI tract using a neural network (NN). We are able to provide visual odometry (VO), i.e., we use sequential WCE video frames to estimate the actual distance travelled by the CE within the GI tract. The novelties of our approach are: a) it is adaptive, i.e., it does not require any prior knowledge about the geometric model of the CE and its intrinsic parameters (e.g., focal length, optical center), thus it can be used with all commercially-available CE models; b) it exploits color information, considering both the intensity of the light reflected from the lumen tissues, as emitted from the light emitting diodes (LEDs) of the CE, and the chromatic components of the luminal images. The proposed methodology constitutes a monocular non-parametric VO approach and is based on salient image points extracted by the SIFT [9] algorithm and tracked by a scheme that comprises the well-known KLT tracker [10] and the RANSAC algorithm [11]. The proposed enhancement is based on the idea that light intensity becomes lower for points that are deeper in the lumen, while the chromatic components, as they remain practically unaffected by variations of light intensity, enable more accurate detection of correspondences between consecutive frames, which are necessary for accurate tracking of the motion of the CE. We believe that both novelties are equally important since a) typically, CE manufacturers do not provide any information regarding the technical details of their devices; and b) the depth and motion patterns of the CE are better captured.
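
As an illustration of this color cue, the sketch below shows one way SIFT key-points can be detected and their intensity and chroma sampled in the CIE-Lab color space. It is a minimal outline assuming OpenCV (cv2) and 8-bit BGR frames, not the authors' code; the function name keypoint_appearance is ours.

```python
# Minimal sketch (assumes OpenCV >= 4.4 and 8-bit BGR frames; illustrative only).
import cv2
import numpy as np

def keypoint_appearance(frame_bgr):
    """Detect SIFT key-points and return their (x, y) positions together
    with per-point luminance L and chroma (a, b) in CIE-Lab space."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)

    # Chroma (a, b) stays largely invariant to illumination changes, while
    # luminance L drops for tissue lying deeper in the lumen (a depth cue).
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    pts = np.array([kp.pt for kp in keypoints], dtype=np.float32)
    rows = pts[:, 1].astype(int).clip(0, lab.shape[0] - 1)
    cols = pts[:, 0].astype(int).clip(0, lab.shape[1] - 1)
    L, a, b = lab[rows, cols, 0], lab[rows, cols, 1], lab[rows, cols, 2]
    return pts, np.stack([L, a, b], axis=1)
```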

The remainder of this paper is structured as follows: section 2 provides a brief overview of related work in the broader field of VO and camera calibration, focusing on WCE applications or other relevant research applicable in the context of WCE. In section 3 we present the proposed intelligent approach to VO for CEs in detail. In section 4 we briefly review the robotic-assisted experimental setup adopted for the evaluation of the proposed methodology [17], a traditional, baseline geometric VO approach that requires a camera calibration step, and the measurements that have been collected using the adaptive NN approach. The experimental results are then discussed in section 5 in comparison with the results of other state-of-the-art approaches. Finally, conclusions are drawn and directions for future work are suggested in section 6.

Section snippets

Related work

Visual odometry is an area within the broader field of computer vision, typically focusing on tasks such as robot navigation. Its goal is to allow a robot to estimate its position and orientation while navigating a known or unknown environment, based solely on the visual information (i.e., consecutive video frames) gathered typically by one (monocular odometry) or two (stereo odometry) cameras. In some cases, omnidirectional cameras are also used. The properties of the camera(s) used are

Method

The proposed methodology adopts a feature extraction scheme that uses Lowe's Scale Invariant Feature Transform (SIFT) features [9], which are then matched using the RANdom SAmple Consensus (RANSAC) algorithm [11] and/or tracked using the Kanade-Lucas-Tomasi (KLT) tracker [10]. In this section we present details on all the aforementioned algorithms. Moreover, we present the proposed intelligent and adaptive VO scheme in detail.
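
For concreteness, the following minimal sketch, again assuming OpenCV and not taken from the paper, tracks key-points between consecutive grayscale frames with the pyramidal KLT tracker and rejects outlier correspondences with RANSAC through a fundamental-matrix fit; track_and_filter and its parameter values are illustrative choices.

```python
# Minimal sketch (assumes OpenCV; window size and thresholds are illustrative).
import cv2
import numpy as np

def track_and_filter(prev_gray, next_gray, prev_pts):
    """prev_pts: (N, 1, 2) float32 key-point positions in the previous frame."""
    # Pyramidal Lucas-Kanade (KLT) tracking of the key-points.
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    p0, p1 = prev_pts[ok], next_pts[ok]

    # RANSAC keeps only correspondences consistent with epipolar geometry.
    _F, mask = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 1.0, 0.99)
    if mask is None:  # too few points to fit a model
        return p0, p1
    inliers = mask.ravel().astype(bool)
    return p0[inliers], p1[inliers]
```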

Experimental setup

In the context of this work and for the evaluation of the adaptive VO approach that has been discussed in section 3.4, we opted for the robotic-assisted experimental setup originally presented in Ref. [17], which approximates a bowel and the movement of the CE within it. The reason for selecting such a setup instead of using real-life WCE videos was the need for constructing an accurate ground truth for the location of the CE. Thus, we were able to acquire accurate measurements of the distance

Discussion

We proposed and validated a novel intelligent VO approach for CE localization with an ex-vivo robotic experiment enabling the validation of CE motion estimation in physical units. The results showed that it provides improved performance over the current geometric VO approach. The errors reported in the literature using geometric VO approaches range between 2.7 and 7.2 cm under different experimental frameworks [17], [18]. Other CE localization methods have also been proposed reporting a

Conclusions

We proposed a novel intelligent approach for VO within the GI tract, using solely visual information extracted from the video captured by the CE. The contributions of this work can be summarized as follows: a) we use the pixel appearance of the SIFT key-points, i.e., both their intensity and chroma, to derive depth information from the video; b) we use KLT tracking in combination with the RANSAC algorithm in order to eliminate outlier feature correspondences; c) we use an MLP designed for monocular VO
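
Because the snippet above omits the MLP details, the sketch below illustrates only the general idea under stated assumptions: a generic multi-layer perceptron (scikit-learn's MLPRegressor) regressing the physical displacement of the CE from hypothetical per-frame features, namely the mean image-plane displacement of the tracked points together with their mean Lab appearance. The feature set, network topology and the placeholders X and y are assumptions, not the paper's configuration.

```python
# Illustrative sketch only; not the authors' architecture or feature set.
import numpy as np
from sklearn.neural_network import MLPRegressor

def frame_features(p0, p1, lab_appearance):
    """p0, p1: matched key-point sets from the KLT/RANSAC step;
    lab_appearance: (N, 3) L, a, b values sampled at p0 (hypothetical input)."""
    disp = (p1 - p0).reshape(-1, 2)
    return np.concatenate([disp.mean(axis=0),            # mean (dx, dy) motion
                           lab_appearance.mean(axis=0)])  # mean (L, a, b) cue

# X: one feature vector per frame pair; y: ground-truth displacement in cm
# from the robotic setup. Both are placeholders to be supplied by the user.
model = MLPRegressor(hidden_layer_sizes=(32, 16), activation='relu',
                     max_iter=2000, random_state=0)
# model.fit(X, y); distance_cm = model.predict(X_new).sum()  # cumulative path
```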

Acknowledgments

We would like to thank Prof. Gastone Ciuti and Eng. Federico Bianchi, Sant’Anna School of Advanced Studies, Pisa, Italy, for the acquisition of the videos of the robotic experiment and the provision of the related photographic material. We would also like to acknowledge the contribution of Dr. Alexandros Karargyris, IBM Research, USA, for the design of the experimental setup, and Prof. Ervin Toth, Department of Gastroenterology, Skåne University Hospital, Malmö, Lund University, Sweden, for

References (40)

  • A. Wang et al., Wireless capsule endoscopy, Gastrointest. Endosc. (2013)
  • K. Hornik et al., Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Netw. (1990)
  • E. Spyrou et al., Comparative assessment of feature extraction methods for visual odometry in wireless capsule endoscopy, Comput. Biol. Med. (2015)
  • A. Koulaouzidis et al., Wireless endoscopy in 2020: will it still be a capsule?, World J. Gastroenterol. (2015)
  • A. Koulaouzidis et al., Optimizing lesion detection in small-bowel capsule endoscopy: from present problems to future solutions, Expert Rev. Gastroenterol. Hepatol. (2015)
  • Y. Zheng et al., Detection of lesions during capsule endoscopy: physician performance is disappointing, Am. J. Gastroenterol. (2012)
  • D.K. Iakovidis et al., Software for enhanced video capsule endoscopy: challenges for essential progress, Nat. Rev. Gastroenterol. Hepatol. (2015)
  • Y. Geng et al., Design, implementation, and fundamental limits of image and RF based wireless capsule endoscopy hybrid localization, IEEE Trans. Mob. Comput. (2016)
  • T.D. Than et al., An effective localization method for robotic endoscopic capsules using multiple positron emission markers, IEEE Trans. Robotics (2014)
  • T.D. Than et al., A review of localization systems for robotic endoscopic capsules, IEEE Trans. Biomed. Eng. (2012)
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • C. Tomasi et al., Detection and Tracking of Point Features (1991)
  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (1981)
  • Z. Zhang, Flexible camera calibration by viewing a plane from unknown orientations
  • J. Kannala et al., A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • C. Hu et al., Image distortion correction for wireless capsule endoscope
  • D.K. Iakovidis et al., Capsule endoscope localization based on visual features
  • E. Spyrou et al., Video-based measurements for wireless capsule endoscope tracking, Meas. Sci. Technol. (2013)
  • D.K. Iakovidis et al., Robotic validation of visual odometry for wireless capsule endoscopy
  • G. Bao et al., A video-based speed estimation technique for localizing the wireless capsule endoscope inside gastrointestinal tract