1 Introduction

This study aims to leverage deep learning (DL) and DCNNs [1,2,3,4,5,6,8,19,20] for prescreening of Lyme disease [9,10,11,12,13,14,15]. Lyme disease is the most common vector-borne disease in the United States, with over 300,000 new cases annually. Borrelia burgdorferi is the causative bacterial agent of Lyme disease, and it is transmitted into the skin of the affected individual through the bite of an infected tick. Infection progresses through three stages, advancing from skin-limited disease to disseminated disease affecting the nervous, cardiac, and rheumatologic systems. In the majority of cases, the initial skin infection is manifested by a round or oval red skin lesion called erythema migrans (EM), which is a direct result of bacterial infection of the skin and marks the first stage of Lyme disease. Treatment with oral antibiotics is highly effective in early, uncomplicated cases. Therefore, recognition of EM is crucial to early diagnosis and treatment and, ultimately, to prevention of potentially devastating long-term complications.

Erythema migrans typically occurs 1 to 3 weeks after the initial tick bite and expands centrifugally by as much as a centimeter per day. Classically, the lesion also displays central clearing as it expands, leading to the hallmark bull’s-eye rash of Lyme disease. However, many individuals do not display this finding, and the majority of individuals are unable to recall a tick bite, making early diagnosis challenging. EM usually persists for weeks, during which its visual recognition is the primary basis for the clinical diagnosis of early Lyme disease. Following this early period, untreated EM usually disappears or progresses to disseminated disease through spread of the infection via the bloodstream. Diagnosis of early Lyme disease is usually made based on clinical signs and symptoms and history of potential exposure to ticks, owing to the lack of reliable serologic blood testing early in the disease course [9, 10]. Blood tests are insensitive during the early phase of infection and are not recommended because of the high false-negative rate at the time of initial EM presentation; only 25 to 40% of patients will have positive results during the acute phase of infection. Direct detection of bacteria in blood or biopsy samples can be performed, but such testing is generally unavailable in non-research settings and is not practical due to the time required for results [11].

The clinical diagnosis of early Lyme disease and EM remains a challenge because EM may take on a variety of appearances besides the characteristic ring-within-a-ring, or bull’s-eye, rash. The majority (80%) of EM lesions in the US lack the central clearing [13] of the stereotypical bull’s-eye lesion and appear uniformly red or bluish red (Fig. 1). Thus, they are often mistaken for a spider bite or bruise. A small percentage (4–8%) of skin lesions have a small central blister, which may lead to the incorrect diagnosis of shingles (herpes zoster) [14]. Approximately 20% of patients have multiple skin lesions arising from the spread of infection through the bloodstream, which often have an atypical appearance. Atypical skin lesions are often misdiagnosed, which results in delayed diagnosis and treatment and increases the risk of long-term complications.

Fig. 1. Examples of EM with atypical (top) and classic bull’s-eye (bottom) presentations (sources: left: https://commons.wikimedia.org/wiki/Category:Erythema_migrans; right: JHU).

Previous studies have shown that the general population does not correctly identify EM skin lesions that lack the classic bull’s-eye appearance, misidentifying this condition approximately 80% of the time. As 80% of skin lesions do not have the bull’s-eye appearance [15], approximately 64% of all EM lesions (80% of 80%) may be misdiagnosed by patients. Machine-based prescreening of skin lesions associated with Lyme disease has the potential to identify a high percentage of both typical and atypical lesions, thereby decreasing the incidence of misdiagnosis of early Lyme disease.

Prior to 2012 and the demonstration of a significant improvement in object recognition performance on ImageNet through the use of DCNNs (AlexNet [4]), object classification in computer vision was largely based on applying traditional classifiers to hand-engineered image features [18]. DCNNs have since replaced these approaches for both computer vision and medical imaging tasks (e.g. [1, 2]), and recently they have been used successfully for a number of medical imaging diagnostics, including identifying skin cancer [12]. To the best of our knowledge, however, Lyme disease detection from skin lesions has thus far been addressed only with classical ML approaches [16].

This study aims to expand on the prior state of the art with the following novel and salient contributions: (a) we develop a novel, carefully clinician-annotated dataset called Lyme1600, which includes over 1600 images with several types of fine-grained annotations for skin lesions, mostly focused on EM but also including other confuser lesions and clear/unaffected cases; this dataset is over an order of magnitude larger than prior non-public datasets (such as the 143-image dataset of [16]); and (b) we develop a baseline DCNN approach that achieves a significant performance improvement over the prior state of the art and demonstrates substantial agreement with human clinician annotations. We make the DCNN model for this classifier publicly available; it can potentially be used by others for fine-tuning and transfer learning to address classification of other types of skin conditions, including skin cancer lesions.

2 Methods

Problem Statement:

We pose the problem as a 2-class classification problem: images are classified as showing EM (Lyme disease) vs. showing no skin lesion or another skin condition, including confounding skin lesions. The main confusers considered in this second class include cases of herpes zoster (HZ), also known as shingles. HZ was used as the principal confuser on the rationale that the main application envisioned here is a pre-screening tool, possibly implemented as a smartphone application, that could help individuals self-identify and screen lesions suspicious for Lyme disease. An acute-onset rash, such as HZ, might prompt an individual to suspect Lyme disease and seek medical attention; this application is targeted toward such individuals, for whom it would provide a means of disambiguation.

Data

Because no annotated, publicly available dataset exists for the study of machine prescreening of Lyme disease and EM, and because there is a paucity of clinical images with the consent and approval required for use in this research, an image dataset was created using publicly available images extracted from the web. This strategy was motivated by a recent study [12] on skin cancer in which online images, after careful annotation, were also successfully leveraged for generating DL classification models of referable skin cancers. The online images of skin lesions leveraged in this study principally include EM, herpes zoster, other non-Lyme skin lesions, and normal skin. Such images were mined from online sources, after which clinicians (J.A., A.R., and E.N.) were tasked with carefully annotating them based on the visual appearance and the estimated size of the skin lesions. Clinicians first performed a whole-image classification using a high-level labeling of the pathology, followed by a fine-grained annotation that included the specific type of EM present (e.g. simple vs. diffuse). Additional curation steps included machine-based removal of full or near duplicates, followed by human assessment for remaining duplicates and removal of inappropriate images. Following this, a subset of images was selected to include images with a moderate to high probability of depicting EM or herpes zoster (and other confounding skin lesions); images with a low probability of an EM or HZ diagnosis were excluded from the dataset. Finally, a 2-class partitioning of those images into affected (C0) and unaffected (C1) classes was performed (Table 1).
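The machine-based near-duplicate removal step can be approximated with perceptual hashing, as in the minimal sketch below; the use of the ImageHash library, the hash type, and the distance threshold are illustrative assumptions rather than details reported here.

```python
from pathlib import Path
from PIL import Image
import imagehash

# Hypothetical near-duplicate filter: an image whose perceptual hash is within
# `max_distance` bits of an already-kept image is treated as a duplicate.
def deduplicate(image_dir, max_distance=5):
    kept, kept_hashes = [], []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        h = imagehash.phash(Image.open(path))
        if all(h - prev > max_distance for prev in kept_hashes):
            kept_hashes.append(h)
            kept.append(path)
    return kept  # surviving images still undergo human review for duplicates
```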

Table 1. Class balance and dataset characteristics

DL Approach:

Recent advances in DL performance have been realized through a number of factors, including the development of large labeled datasets, the availability of markedly increased computational power via graphics processing units, and various algorithmic improvements. DCNNs, used here, form feature representations at increasing levels of abstraction via multiple layers of processing [1, 2] and solve discriminative problems (e.g. classification). Here, a DCNN takes a skin image as input and outputs probabilities that the image belongs to one of several specific classes of pathologies (EM vs. no EM here). Our study uses the ResNet50 [8] DCNN architecture. ResNet was originally conceived as a means of producing deeper networks and includes specific design patterns, such as bottleneck blocks and skip connections, that make the outputs of upstream layers directly available to downstream layers. Our implementation used the Keras and TensorFlow frameworks. We used transfer learning and fine-tuned the original ResNet50 weights on the skin classification problem addressed herein. We trained with stochastic gradient descent with Nesterov momentum of 0.9 and an initial learning rate of 1E-3. The training scheme used an early-stopping approach, which terminates training after 10 epochs with no improvement of the validation accuracy. We used a categorical cross-entropy loss function. Dynamic learning rate scheduling was also used, in which the learning rate was multiplied by 0.5 when the training loss did not improve for 10 epochs. A batch size of 32 was used. Data augmentation included horizontal flipping, blurring, sharpening, and changes to saturation, brightness, contrast, and color balance. We are making the DCNN model, with trained weights, available at https://github.com/neil454/lyme-1600-model.
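A minimal sketch of this fine-tuning setup, assuming a standard Keras/TensorFlow workflow, is shown below; the input size, classification head, and choice of built-in augmentation layers are our illustrative assumptions rather than the exact published configuration (blurring, sharpening, and color-balance changes would require an external augmentation library).

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 2            # EM vs. no EM
IMG_SHAPE = (224, 224, 3)  # assumed input size

# Light augmentation stand-in (active only during training).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomContrast(0.2),
])

# ImageNet-pretrained ResNet50 backbone with a new 2-class softmax head.
# NOTE: ResNet50-specific pixel preprocessing is omitted for brevity.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=IMG_SHAPE)
inputs = layers.Input(shape=IMG_SHAPE)
x = augment(inputs)
x = base(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = models.Model(inputs, outputs)

model.compile(
    optimizer=optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

cbs = [
    # stop after 10 epochs with no improvement in validation accuracy
    callbacks.EarlyStopping(monitor="val_accuracy", patience=10,
                            restore_best_weights=True),
    # halve the learning rate when the training loss plateaus for 10 epochs
    callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5, patience=10),
]

# model.fit(train_ds, validation_data=val_ds, batch_size=32, callbacks=cbs)
```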

N-Fold Validation:

The datasets were further subdivided into training and testing subsets. We used K-fold cross-validation with K = 5, where four folds were employed for training and one fold was used for testing (with rotation of the folds over 5 runs). One training fold was further subdivided equally into two parts, one of which was used for validation and stopping conditions. In sum, the train/validation/test partition was 70%/10%/20%, respectively.
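One way to realize this partitioning, assuming scikit-learn and stratification by class label (the function and variable names below are hypothetical), is:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical 5-fold partitioning yielding a 70%/10%/20% train/validation/test
# split per run, as described above.
def five_fold_splits(labels, seed=0):
    labels = np.asarray(labels)
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(np.zeros(len(labels)), labels):
        # carve half of one fold (12.5% of the 80% training portion, i.e. 10%
        # overall) out of the training indices for validation / early stopping
        train_idx, val_idx = train_test_split(
            train_idx, test_size=0.125, stratify=labels[train_idx],
            random_state=seed)
        yield train_idx, val_idx, test_idx
```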

Performance Metrics:

The performance metrics used in this study included accuracy, F1, sensitivity, specificity, PPV (positive predictive value), NPV (negative predictive value), and the kappa score, which discounts chance agreement [7]. Since any classifier trades off sensitivity against specificity, we compared methods using ROC (receiver operating characteristic) curves, which plot detection probability (sensitivity) vs. false alarm rate (100% − specificity), and computed the AUC (area under the curve).
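These metrics can be computed from per-image predicted probabilities as sketched below; the use of scikit-learn and the function and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, cohen_kappa_score,
                             confusion_matrix, roc_curve, roc_auc_score)

# Illustrative computation of the reported metrics from predicted probabilities
# of the positive (EM) class; y_true holds 0/1 ground-truth labels.
def report_metrics(y_true, y_prob, threshold=0.5):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
        "sensitivity": tp / (tp + fn),   # detection probability
        "specificity": tn / (tn + fp),   # 1 - false alarm rate
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "kappa": cohen_kappa_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_prob),
    }
    fpr, tpr, _ = roc_curve(y_true, y_prob)  # points for the ROC plot
    return metrics, (fpr, tpr)
```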

3 Results

Results of experiments are shown for applying the above method to the data partitioned using 5-fold cross-validation. Table 1 shows the class partitions, Table 2 the resulting metrics, and Fig. 2 the resulting ROC curve. The results show a promising accuracy of 93.04%. The ROC curve shows that one can operate at 90% sensitivity and above while maintaining specificity in the 75% to 85% range, a tradeoff that suggests potential for deployment as a pre-screener. A kappa score of 0.7549 also demonstrates substantial agreement with the human-annotated gold standard.
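The threshold corresponding to such an operating point can be read directly off the ROC curve; the helper below is an illustrative sketch (the function and variable names are our own, reusing y_true and y_prob from the metrics sketch above) rather than part of the published pipeline.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical helper for choosing a pre-screener operating point: find the
# first threshold reaching the target sensitivity and report the specificity
# obtained there.
def operating_point(y_true, y_prob, target_sensitivity=0.90):
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    idx = np.argmax(tpr >= target_sensitivity)  # first ROC point meeting the target
    return thresholds[idx], tpr[idx], 1.0 - fpr[idx]  # threshold, sensitivity, specificity
```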

Table 2. Performance metrics for five-fold cross validation
Fig. 2. ROC curve for the proposed pre-screener.

4 Discussion

Datasets of EM rashes with annotations for research or teaching purposes are not currently widely available. Only one large study of EM rash characteristics in the United States has been conducted, dating from 2002. Physician review of images in that dataset reported an unexpected diversity in the appearance of EM lesions, with only 10% of lesions having the classic central clearing and ring-within-a-ring target appearance [17]. The photos of EM lesions from that study were not analyzed further using computerized approaches. To our knowledge, only one other study of computer-assisted detection of EM has been reported in the literature [16]. That study used machine learning methods including boosting, SVM, naïve Bayes, and neural networks (but not DL) applied to hand-designed image features and was tested on a smaller dataset of 143 EM rash images; reported accuracies ranged from 69.23% to 80.42%. These results attest to the difficulty of discerning among the varied presentations of EM lesions. By comparison, our results, obtained on a much larger dataset of images taken ‘in the wild’, show notable performance improvements.

Because publicly available labeled datasets for EM ML studies are lacking, the use of photographs from online image banks was necessary in this study to obtain an adequate number of images, particularly for a less common condition such as erythema migrans. In doing so, our work followed the approach of a recently published high-impact study investigating detection of skin cancer using DCNNs [12], which also exploited online images to produce a curated training dataset and corresponding model. While our dataset is still being extended with new types of confounding pathologies and lesions, such as tinea corporis, our goal is to release it once procurement of all examples of confounding lesions and all annotation are complete. In the meantime, we are making the classification model available online.

One limitation of the current dataset is that individuals with dark skin are underrepresented. In addition, certain characteristics inherent to online images, such as variability in viewpoint/angle, lighting, and photo resolution, made the problem more challenging. At annotation time, the inability to inspect the skin lesion at different angles or magnifications in order to estimate its size was an issue in some cases. However, images for which there was significant ambiguity or uncertainty in diagnosis due to these factors were excluded. We were also limited in our ability to verify diagnoses through corroborating clinical and laboratory data. This limitation is mitigated by the fact that the diagnosis of both EM and the principal confuser considered here, HZ, is primarily clinical; that is, the diagnosis of these conditions relies primarily on visual inspection and suspicion. There is no universally accepted “gold standard” diagnostic test for Lyme disease, given the variable reliability of serologic testing and the impracticality of culture identification of the organism in the clinical setting. Meanwhile, the gold standard for diagnosis of herpes zoster consists of PCR or culture detection of varicella zoster virus from skin lesions, but this is usually not performed given the characteristic clinical appearance and symptoms associated with the rash.

In sum, considering all of the elements above, our study substantially advances the state of the art in automated Lyme prescreening with DL models that hold significant promise for clinical deployment as pre-screeners. Such an application would be of great utility given the challenges of diagnosing Lyme disease at an early stage, when treatment is effective and can prevent the otherwise serious long-term complications associated with advanced Lyme disease. Based on our results, an application using DL is likely more sensitive than patient self-assessment and may even be more accurate than diagnosis by a general non-specialist physician, who would ordinarily serve as the screening gatekeeper for acute-onset rashes such as EM. Given the frequent under-diagnosis of EM, automated detection would be beneficial by increasing the number of patients who seek further medical assessment for EM rashes and by minimizing the number of cases that go unevaluated and undiagnosed, with an expected positive effect on patient morbidity. Future work will involve studying multi-class problems, such as separately identifying HZ and the other confounding classes, which may also lead to improved performance on the 2-class EM problem.

5 Conclusion

We make several contributions to automated EM and Lyme disease detection: we develop the first large, carefully clinician-annotated dataset for the study of ML-based diagnostics of Lyme disease, including affected, confuser, and control images. We also propose a pre-screener for EM using DCNNs that shows substantial agreement with expert human clinician gold-standard annotations, and we make this model publicly available.