Logarithmic simulated annealing for X-ray diagnosis

https://doi.org/10.1016/S0933-3657(00)00112-3

Abstract

We present a new stochastic learning algorithm and first results of computational experiments on fragments of liver CT images. The algorithm is designed to compute a depth-three threshold circuit, where the first layer is calculated by an extension of the Perceptron algorithm by a special type of simulated annealing. The fragments of CT images are of size 119×119 with eight-bit grey levels. From 348 positive (focal liver tumours) and 348 negative examples a number of hypotheses of the type w1x1+⋯+wnxn≥ϑ were calculated for n=14161. The threshold functions at levels two and three were determined by computational experiments. The circuit was tested on various sets of 50+50 additional positive and negative examples. For depth-three circuits, we obtained a correct classification of about 97%. The input to the algorithm is derived from the DICOM standard representation of CT images. The simulated annealing procedure employs a logarithmic cooling schedule c(k)=Γ/ln(k+2), where Γ is a parameter that depends on the underlying configuration space. In our experiments, the parameter Γ is chosen according to estimations of the maximum escape depth from local minima of the associated energy landscape.

Introduction

Since the seminal paper by Asada et al. [5], there has been rapidly growing interest in new, unconventional types of medical knowledge-based systems which are designed as artificial neural networks, trained by examples (“positive” and “negative”) related to a specific diagnostic problem. So far, research has concentrated on digital X-ray-based medical diagnosis [13], [15], [22], [33], [34], [35], although there are applications in other medical branches, for instance, in electrocardiographic measurement and clinical laboratories, see [12], [24].

In [23], the detection of microcalcifications by neural networks was studied. After training on a total of almost 3000 examples, a classification rate of ≈88% was achieved on several hundred test examples.

The paper [30] introduces the assignment of fractal dimensions to tumour structures. The fractal dimensions are assigned to contours which have been extracted by commonly used filtering operations. In fact, these contours represent polygonal structures within a binary image. For example, the fractal dimensions D1=1.13 and D2=1.40 are assigned to the boundary and the interior, respectively, of a glioblastoma.

A high classification rate of nearly 98% is reported in [26], where the Wisconsin breast cancer diagnosis (WBCD) database of 683 cases is taken for learning and testing. The approach is based on feature extraction from image data and uses nine visually assessed characteristics for learning and testing. Among the characteristics are the uniformity of cell size, the uniformity of cell shape, and the clump thickness.

In the present paper, we utilize an extension of the Perceptron algorithm by a simulated annealing-based search strategy [11], [21] for the automated detection of focal liver tumours. The only input to the algorithm is the image data, without any preprocessing. Since focal liver tumour detection is not part of screening procedures like the detection of microcalcifications [16], [19], [23], [26], [28], a certain effort is required to collect the image material. To our knowledge, results on neural network applications to focal liver tumour detection are not available in the literature. Therefore, we could not include comparisons to related, previous work in our paper.

During the last decade, research on the classical Perceptron algorithm has been revitalized by a number of papers, see, e.g. [6], [7], [9], [14], [17], [31]. Research on this type of classification algorithm has a long history and goes along with the efforts to find fast and reliable algorithms that solve systems of linear inequalities l_j(z)=a_j·z+b_j≥0, j=1,…,m. Agmon [2] proposed in 1954 a simple iteration procedure that starts with an arbitrary initial vector z_0. When z_i does not represent a solution of the system, z_{i+1} is taken as the orthogonal projection of z_i onto the farthest hyperplane corresponding to a violated linear inequality: z_{i+1}=z_i+t·a_{j0}, where t=−l_{j0}(z_i)/|a_{j0}|² and j0 maximizes −l_j(z_i)/|a_j|² among the violated inequalities.
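To make the projection step concrete, the following is a minimal Python sketch of Agmon's iteration (NumPy-based; the function name and interface are our own, not from [2]):

    import numpy as np

    def agmon_relaxation(A, b, z0, max_iter=10000, tol=1e-12):
        """One possible rendering of Agmon's iteration for a system of
        linear inequalities l_j(z) = a_j.z + b_j >= 0 (a_j are rows of A)."""
        z = z0.astype(float).copy()
        norms_sq = (A ** 2).sum(axis=1)            # |a_j|^2 for each row
        for _ in range(max_iter):
            l = A @ z + b                          # l_j(z) for all j
            violated = l < -tol
            if not violated.any():
                return z                           # all inequalities satisfied
            # j0 maximizes -l_j(z)/|a_j|^2 among the violated inequalities
            ratios = np.where(violated, -l / norms_sq, -np.inf)
            j0 = int(np.argmax(ratios))
            t = -l[j0] / norms_sq[j0]
            z = z + t * A[j0]                      # orthogonal projection onto that hyperplane
        return z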

Basically the same method is known as the classical Perceptron algorithm [29]. If the set of points can be separated by a linear function, the following convergence property can be proved for the Perceptron algorithm [25]: let S denote the set of positive and negative input vectors and w* be a unit vector solution to the separation problem, i.e. w*·x>0 for all [x,+]∈S and w*·x<0 for all [x,−]∈S. Then the Perceptron algorithm converges in at most 1/σ² iterations, where σ ≔ min_{[x,η]∈S} |w*·x|, η∈{+,−}. The parameter σ has the interpretation of cos(w*,x) for the angle between w* and x, and the value of σ can be exponentially small in terms of the dimension n.
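For instance, if σ is of order 2^−n, the bound permits up to 1/σ²=4^n iterations, so the guaranteed convergence can be exponentially slow in the dimension even though each single iteration is cheap.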

In general, however, the much simpler Perceptron algorithm performs well even if the sample set is not consistent with any weight vector w of a linear threshold function (see, e.g. [19], [33]). When the sample set is linearly separable, Baum [6] proved that under modest assumptions the Perceptron algorithm is likely to find a highly accurate approximation of a solution vector w in polynomial time.

Variants of the Perceptron algorithm on sample sets that are inconsistent with linear separation are presented in [7], [8], [9], [14]. For example, if the (average) inconsistency with linear separation is small relative to σ, then with high probability the Perceptron algorithm will achieve a good classification of samples in polynomial time [8], [9].

Our simulated annealing procedure employs a logarithmic cooling schedule c(k)=Γ/ln(k+2), i.e. the “temperature” decreases at each step. With the modified Perceptron algorithm, we performed computational experiments on fragments of liver CT images. The fragments are of size 119×119 with eight-bit grey levels. From 348 positive (with focal liver tumours) and 348 negative examples we calculated independently s=5,…,17 hypotheses of the type THF = [w1x1+⋯+wnxn≥ϑ] for n=14161. Then, we performed tests on various sets of 50 positive and 50 negative examples that were not presented to the algorithm in the learning phase. The tests were performed on threshold circuits of depth two and depth three, where in both cases the first layer consists of functions THF. For depth two with 11 functions THF we obtained ≈91% correct classification. For depth-three circuits with three subcircuits of depth two we achieved about 97% correct classification on the different sets of 50+50 test examples.
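To illustrate how the learned hypotheses are combined, here is a minimal sketch of evaluating such a circuit, assuming simple majority votes at levels two and three (the paper determined these threshold functions experimentally, so the majority gates below are our assumption):

    import numpy as np

    def thf(w, theta, x):
        """First-layer hypothesis THF: w_1*x_1 + ... + w_n*x_n >= theta."""
        return float(np.dot(w, x)) >= theta

    def depth_three_circuit(subcircuits, x):
        """Evaluate a depth-three circuit on an image vector x of length
        n = 14161. 'subcircuits' is a list of depth-two subcircuits, each
        given as a list of (w, theta) pairs for its first-layer THFs."""
        level_two = []
        for hypotheses in subcircuits:                     # e.g. three subcircuits
            votes = sum(thf(w, theta, x) for (w, theta) in hypotheses)
            level_two.append(2 * votes > len(hypotheses))  # assumed majority gate
        return 2 * sum(level_two) > len(subcircuits)       # assumed majority gate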

The input to our algorithm was derived from the DICOM standard representation of CT images [20].
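As a modern illustration of this input format (the pydicom reader and all names below are our choice, not the paper's tooling), a 119×119 fragment with eight-bit grey levels can be extracted from a CT slice as follows:

    import numpy as np
    import pydicom

    def fragment_from_ct(path, top, left, size=119):
        """Read a CT slice from a DICOM file and cut out a size x size
        fragment, rescaled to 8-bit grey levels as described above."""
        ds = pydicom.dcmread(path)
        img = ds.pixel_array.astype(float)
        frag = img[top:top + size, left:left + size]
        span = max(frag.max() - frag.min(), 1.0)           # avoid division by zero
        frag = 255.0 * (frag - frag.min()) / span
        return frag.astype(np.uint8).ravel()               # n = 119*119 = 14161 inputs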

The choice of the crucial parameter Γ is based on estimations of the maximum escape depth from local minima of the associated energy landscape. The estimations of Γ were obtained by preliminary computational experiments on CT images. We used this method before in [32], where logarithmic simulated annealing was applied to job shop scheduling.


Basic definitions

The simulated annealing-based extension comes into play when the number of misclassified examples for the new hypothesis is larger than that for the previous one. If this is the case, a random decision is made according to the rules of simulated annealing procedures. When the new hypothesis is rejected, a random choice is made among the misclassified examples for the calculation of the next hypothesis.

To describe our extension of the Perceptron algorithm in more detail, we have to define the
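Although the snippet breaks off here, the acceptance rule described above admits a minimal Metropolis-style sketch under the logarithmic schedule (the interface and names are ours):

    import math
    import random

    def accept(delta_errors, k, gamma, rng=random.random):
        """Decide whether to accept a new hypothesis at step k, where
        delta_errors is the increase in the number of misclassified
        examples relative to the previous hypothesis."""
        if delta_errors <= 0:
            return True                          # improvements are always accepted
        c_k = gamma / math.log(k + 2)            # the schedule c(k) = Gamma/ln(k+2)
        return rng() < math.exp(-delta_errors / c_k)

When the new hypothesis is rejected, the next update example is drawn uniformly at random from the currently misclassified examples, as stated above.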

The logarithmic cooling schedule

We are focusing on a special type of inhomogeneous Markov chain where the value c(k) changes in accordance with c(k)=Γ/ln(k+2), k=0,1,….
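For instance, with Γ=1 the schedule yields c(0)=1/ln 2≈1.44, c(98)=1/ln 100≈0.22 and c(10⁶)≈0.07: the “temperature” decreases at every step, but only logarithmically slowly.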

The choice of c(k) is motivated by Hajek’s Theorem [18] on logarithmic cooling schedules for inhomogeneous Markov chains. To explain Hajek’s result, we first need to introduce some parameters characterizing local minima of the objective function:

Definition 1

A configuration f′∈F is said to be reachable at height h from f∈F if ∃f0,f1,…,fr∈F (f0=f ∧ fr=f′) such that G[fu,fu+1]>0 for u=0,1,…,r−1, and the value of the objective function at each fu does not exceed h.
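For orientation, Hajek's theorem [18] can then be summarized as follows (a standard formulation, where D denotes the maximum depth of a local minimum that is not a global minimum):

    lim_{k→∞} Pr{f(k) is a global minimum} = 1  ⟺  Σ_{k≥0} exp(−D/c(k)) = ∞  ⟺  Γ ≥ D,

since exp(−D/c(k)) = (k+2)^{−D/Γ} under the schedule c(k)=Γ/ln(k+2), and this series diverges exactly when D/Γ ≤ 1.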

Computational experiments

In most applications, simulated annealing-based heuristics are designed for homogeneous Markov chains, where the convergence to the Boltzmann distribution at fixed temperatures is important for the performance of the algorithm, see [1]. We utilized the general framework of inhomogeneous Markov chains described in [4] for the design of a pattern classification heuristic. In particular, we paid attention to the choice of the parameter Γ, which is crucial to the quality of solutions as well as to
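The paper states only that Γ was estimated from the maximum escape depth observed in preliminary runs on CT images; one conceivable sketch of such an estimate is given below, with all details (names, walk length, neighbourhood interface) being our assumptions:

    import random

    def estimate_gamma(energy, neighbours, starts, steps=1000, rng=random):
        """Crude estimate of the maximum escape depth from local minima:
        random walks record how far the objective rises above the best
        value seen since the last improvement; the largest such barrier
        is returned as a candidate value for Gamma."""
        depth = 0.0
        for f in starts:
            best = energy(f)
            for _ in range(steps):
                f = rng.choice(neighbours(f))       # random neighbouring configuration
                cur = energy(f)
                if cur < best:
                    best = cur                      # a deeper point was reached
                else:
                    depth = max(depth, cur - best)  # barrier climbed so far
        return depth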

Concluding remarks

We performed computational experiments with an extension of the Perceptron algorithm by a simulated annealing-based heuristic that employs the logarithmic cooling schedule c(k)=Γ/ln(k+2), where Γ is a parameter of the underlying configuration space. The experiments were performed on fragments of liver CT images. The image data are the only input to the algorithm, i.e. no feature extraction or preprocessing is performed. The fragments are of size 119×119 with eight-bit grey levels. From 348

Acknowledgements

The authors would like to thank Eike Hein and Daniela Melzer for preparing the image material. The research has been partially supported by the Strategic Research Programme at The Chinese University of Hong Kong under Grant No. SRP 9505, by a Hong Kong Government RGC Earmarked Grant Ref. No. CUHK 4010/98E, and by the AIF Research Programme under Grant No. FKV 0352401N7.

References (35)

  • E.B. Baum

    The perceptron algorithm is fast for nonmalicious distributions

    Neural Comput.

    (1990)
  • A. Blum et al.

    A polynomial-time algorithm for learning noisy linear threshold functions

    Algorithmica

    (1998)
  • T. Bylander, Learning linear threshold functions in the presence of classification noise. In: Proceedings of the 7th ACM...
  • T. Bylander

    Learning linear threshold approximations using perceptrons

    Neural Comput.

    (1995)
  • V. Černý

    A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm (preprint: Institute of Physics and Biophysics, Comenius University, Bratislava, 1982)

    J. Optim. Theory Appl.

    (1985)
  • W.K. Chan et al.

    An expert system for the detection of cervical cancer cells using knowledge-based image analyser

    Artif. Intell. Med.

    (1996)
  • L.P. Clarke, Computer-assisted diagnosis: advanced adaptive filters, wavelets and neural networks for image compression,...
1 On leave from IBM T.J. Watson Research Center, P.O. Box 210, Yorktown Heights, NY, USA.
