Abstract
Population imaging studies have opened new opportunities for the comprehensive characterization of disease phenotypes by providing large-scale databases. A major challenge is the ability to process this vast amount of data automatically and accurately, and hence to develop streamlined image analysis pipelines that are robust to varying image quality. This requires a generic and fully unsupervised quality assessment technique. However, existing methods are designed for specific types of artefacts and cannot detect incidental unforeseen artefacts. Furthermore, they require manual annotations, a task that is demanding, prone to error, and in some cases ambiguous. In this study, we propose a generic unsupervised approach to simultaneously detect and localize artefacts. We learn the normal image properties from a large dataset by introducing a new image representation approach based on an optimal coverage of images with a learned visual dictionary. Artefacts are then detected and localized as outliers. We tested our method on a femoral DXA dataset with 1300 scans. The sensitivity and specificity are 81.82% and 94.12%, respectively.
1 Introduction
Over recent decades, large-scale databases containing prominent imaging components in addition to omics, demographics and other metadata have increasingly underpinned national population imaging cohorts [1, 7, 10], biomedical research (cf. [10]), and pharmaceutical clinical trials (cf. [6]). Large imaging databases provide new opportunities for the comprehensive characterization of the population or disease phenotype but equally pose new challenges. Collecting and processing large databases requires automatic, streamlined image analysis pipelines that are robust to varying image quality. In contrast, most image analysis methods developed in the last three decades were designed for relatively small datasets where visual quality checking and pruning is still feasible, and were thereby never exposed to heterogeneous image quality. On the other hand, visual quality assurance in multi-centre large-scale cohorts with several imaging sequences per subject is simply infeasible. Despite efforts to enforce strict imaging protocols to ensure consistently high-quality images, various incidental artefacts are inevitable and have a significant impact on analysis techniques [8]. Detecting this small fraction of inadequate data requires either an enormous amount of expert manual labour or resorting to sampling schemes that are prone to miss inadequate data. Hence, automatic image quality control in population imaging studies is an emerging challenge for the medical image computing community.
In the multimedia community, image quality assessment (IQA) has been explored extensively over the last two decades [5]. However, existing algorithms are not directly applicable to biomedical images, mainly for three reasons. Firstly, multimedia IQA algorithms generally quantify image quality in relation to human visual perception rather than being task-oriented, and are therefore not appropriate for detecting medical image artefacts, as pointed out by Xie et al. [12]. For a description of such artefacts, see, for instance, Jacobson et al. [3] for dual energy X-ray absorptiometry (DXA). Secondly, in multimedia, only a limited number of artefacts, such as blurring, noise, and JPEG compression, are normally of interest. This is in stark contrast with the much more diverse and difficult-to-catalogue artefacts in medical imagery. Furthermore, most of the literature on imaging artefacts has focused on their origins in the physics of image acquisition [11] or on their qualitative description (cf. [3]), neither of which helps to develop automated techniques for their detection. Although some research in the medical community addresses specific artefact types automatically, such as eddy currents and head motion in diffusion MRI [8], or blurring and uneven illumination in dermoscopy [12], these methods are not general purpose and cannot detect unforeseen incidental artefacts in large-scale databases. Thirdly, artefacts in multimedia are usually easier to model mathematically, which can be exploited to synthetically generate annotations over arbitrarily large image sets, including artefacts with varying degrees of severity. In contrast, manual quality annotations in large-scale medical image databases are rare, and their creation would require extensive and tedious visual assessment, which is laborious and error-prone. A completely unsupervised framework is therefore desirable.
In this work, we propose a general purpose and fully unsupervised approach to detect images with artefacts in large-scale medical image datasets. Furthermore, our approach also localizes the artefact within the corresponding image. Our work relies on only three main assumptions. Firstly, artefacts are local in nature and can be observed in image patches of an appropriate size, albeit the extent of such patches could, in the extreme case, be the full image. Secondly, the image database is large enough to capture the statistics of the normal images and of the artefacts of interest. Thirdly, the incidence of artefacts in the image database is small enough that artefacts always remain outliers in the statistical distribution of the database. Under these assumptions, we propose a novel image coverage technique based on a visual dictionary of image patches to represent each image with a set of visual words localized in the image domain. Artefact detection is based on the similarity between each representative visual word and the corresponding raw patch from the image. The proposed method will facilitate large-scale, high-throughput quality control, and will be applicable to different imaging modalities without the need for manual annotation.
2 Proposed Method
The basic idea is to build a model representing the normal characteristics of the image population and detect the artefacts as deviations from this model. Based on the assumptions above, the proposed method works on image patches and comprises three main constituents: a robust learning of a dictionary of patches, an optimal coverage of the images with the dictionary words, and an assessment of the similarity between each covered patch and the corresponding dictionary word. This assessment allows us to detect outliers, identifying both images with artefacts and their locations in the image.
The use of a visual dictionary has been proposed for image quality assessment in [13] and explored later as the so-called bag of visual words (BOVW) representation. In the BOVW approach, each image is modelled as a distribution over the visual words by normalizing the histogram of occurrence counts for each visual word. In this study, however, we propose a novel image representation with a subset of visual words that provides an optimal image coverage, which is a new concept and differs from the typical usage of visual codebooks. The proposed optimal image coverage allows the visual words to be positioned optimally without prefixed positions. This reduces the number of required words for a good representation, which maximizes the generalizability of the model. Artefact detection is based on the similarity between the visual words and the corresponding image patches (Fig. 1). An image patch with a small dissimilarity score can be interpreted as artefact-free only if the dictionary has already excluded all words with artefacts. Hence, being robust to outliers is crucial in the dictionary learning step and thereby common clustering algorithms like k-means are not appropriate. We propose a robust dictionary learning based on the fixed-width clustering algorithm, introduced for outlier detection in network systems [2].
2.1 Robust Dictionary Learning
The objective is to learn a dictionary \(\mathcal {W} = \lbrace w^1, \cdots , w^N\rbrace \) with N visual words from a large pool of patches, capturing the normal shape and appearance variations in the image class while excluding outlier patches. An outlier patch is expected to lie in a sparse region of the feature space (raw intensity values here), having few or no neighbours within a typical distance r. Observe that outlier patches detected in this step cannot be used directly to identify image artefacts: since images are not coregistered and patches are extracted from fixed locations, some proportion of outliers will be due to misalignments and not necessarily represent an image artefact.
The proposed robust dictionary learning is as follows. Each image is divided into overlapping square patches of size k for 2D images, or cubic patches for 3D images, with an overlap of size \(k^\prime \) between neighbouring patches. The fixed-width clustering algorithm is then applied as follows. All the patches are shuffled randomly and the first patch initializes the first visual word. For each subsequent patch, the Euclidean distance of the patch to each visual word is computed. If a distance is less than r, the patch is added to the corresponding cluster and its centroid is recalculated as the average of all its members. Otherwise, the patch becomes the centroid of a new cluster. Observe that each patch may contribute to more than one cluster. Finally, clusters with only one member are considered outliers and removed from the dictionary.
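The clustering step above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation; the function and variable names are ours, and patches are assumed to be flattened into row vectors:

```python
import numpy as np

def fixed_width_clustering(patches, r, rng=None):
    """Fixed-width clustering over flattened patches (n_patches, patch_dim).

    Returns the centroids of clusters with more than one member; singleton
    clusters are treated as outliers and excluded from the dictionary."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(len(patches))      # patches are shuffled randomly
    centroids, sums, counts = [], [], []
    for idx in order:
        p = patches[idx].astype(float)
        assigned = False
        for c in range(len(centroids)):
            if np.linalg.norm(p - centroids[c]) < r:
                # add the patch to this cluster and recompute the centroid
                sums[c] += p
                counts[c] += 1
                centroids[c] = sums[c] / counts[c]
                assigned = True                # a patch may join several clusters
        if not assigned:
            centroids.append(p.copy())         # patch becomes a new centroid
            sums.append(p.copy())
            counts.append(1)
    return np.array([c for c, n in zip(centroids, counts) if n > 1])
```

Note that, unlike k-means, the number of clusters is not fixed in advance: it is determined by the radius r, and outliers are discarded simply because they never attract a second member.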
2.2 Optimal Image Coverage and Word Selection
A coverage of an image I is a selection of visual words placed at different locations in the image so that all pixels are covered at least once. Let us consider that the image I has P pixels and each visual word can be deployed at L locations indexed by \(\ell \in [1,L]\), where \(L \le P\) depends on the resolution with which the image is swept. The binary mask \(m_\ell \) represents the word location \(\ell \) in the image, \(d_\ell ^n\) denotes the word \(w^n\) placed at location \(\ell \) with appropriate zero-padding, and the binary variable \(z_\ell ^n\) encodes whether \(d_\ell ^n\) is selected to cover the image or not. Thus, the binary vector \(\mathbf {z}= \left( z_1^1, \cdots , z_L^N \right) \) would represent a coverage of the image if

$$\begin{aligned} \sum _{n=1}^{N} \sum _{\ell =1}^{L} z_\ell ^n \, m_\ell \ge \mathbf {1}, \end{aligned}$$
(1)

where the left-hand side is an integer vector counting how many times each pixel is covered in the image.
We define the image coverage error as the \(L_2\)-norm of the difference between each selected visual word and the corresponding image patch,

$$\begin{aligned} E(\mathbf {z}) = \sum _{\ell =1}^{L} \sum _{n=1}^{N} z_\ell ^n \left\Vert d_\ell ^n - m_\ell \circ I \right\Vert _2. \end{aligned}$$
(2)

Here, \(m_\ell \circ I\) denotes the component-wise product between the binary mask \(m_\ell \) and the image I. The optimal image coverage is defined by the minimization of the coverage error, \(\mathbf {z}={{\mathrm{argmin}}}_\mathbf {z}E\), subject to the constraint in Eq. 1.
Let us denote by \(z_\ell ^* = \sum _n z_\ell ^n\) the number of visual words placed at location \(\ell \). If two words, \(w^{n_1}\) and \(w^{n_2}\), are used at the same location \(\ell \) (\(z_\ell ^{n_1}=z_\ell ^{n_2}=1\)), then the coverage error will always be larger than using just one of them, without any effect on the constraint. Hence, the optimal solution will place at each location \(\ell \) either one single visual word (\(z_\ell ^*=1\)) or none (\(z_\ell ^*=0\)). Therefore, the optimization can be split into two independent problems. First, for each \(\ell \), the locally optimal visual word \(w^{n(\ell )}\) is selected by minimizing the local error,

$$\begin{aligned} e_\ell = \min _{n} \left\Vert d_\ell ^n - m_\ell \circ I \right\Vert _2, \qquad n(\ell ) = {{\mathrm{argmin}}}_{n} \left\Vert d_\ell ^n - m_\ell \circ I \right\Vert _2. \end{aligned}$$
(3)

Then, the optimal locations, \(\mathbf {z}^* = (z^*_1, \cdots , z^*_L)\), are selected by minimizing the total coverage error,

$$\begin{aligned} \mathbf {z}^* = {{\mathrm{argmin}}}_{\mathbf {z}^*} \sum _{\ell =1}^{L} z^*_\ell \, e_\ell \quad \text {subject to} \quad \sum _{\ell =1}^{L} z^*_\ell \, m_\ell \ge \mathbf {1}. \end{aligned}$$
(4)
Equation 4 can be efficiently solved using linear integer programming packages such as Matlab optimization toolbox (Mathworks Inc., Cambridge, MA).
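The two-step optimization can be sketched as follows, using a 1-D signal rather than a 2-D image for brevity, and `scipy.optimize.milp` as an open-source stand-in for the Matlab toolbox; the function and variable names are illustrative, not the authors' code:

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def optimal_coverage(signal, words, locations):
    """Cover a 1-D signal with dictionary words of length k.

    Step 1: pick the best-fitting word and its error at each location.
    Step 2: select locations by linear integer programming so that every
    sample is covered at least once with minimal total error."""
    P, k = len(signal), words.shape[1]
    errs, best = [], []
    for l in locations:
        d = np.linalg.norm(words - signal[l:l + k], axis=1)
        best.append(int(np.argmin(d)))         # locally optimal word n(l)
        errs.append(float(np.min(d)))          # local error e_l
    # Constraint matrix: row p counts how often sample p is covered
    A = np.zeros((P, len(locations)))
    for j, l in enumerate(locations):
        A[l:l + k, j] = 1.0                    # binary mask m_l
    res = milp(c=np.array(errs),
               constraints=LinearConstraint(A, lb=np.ones(P)),
               integrality=np.ones(len(locations)),
               bounds=Bounds(0, 1))
    chosen = [j for j, z in enumerate(res.x) if z > 0.5]
    return [(locations[j], best[j], errs[j]) for j in chosen]
```

For instance, with the words [0,0,0,0] and [1,1,1,1], the signal [0,0,0,0,1,1,1,1], and candidate locations 0, 2, and 4, the solver selects locations 0 and 4 with zero error, since adding location 2 would only increase the objective without improving the coverage.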
2.3 Artefact Detection
For a given image, a dissimilarity score is computed between each representative visual word and its corresponding raw patch. Any image patch with an associated score above an optimal threshold will identify the location of an artefact in the given image. Observe that since matching of the words is local and the best fitting locations are found after an optimal coverage without forcing a priori known locations, images do not need to be previously registered.
Dissimilarity score: The local properties of an image can be described by the set of its derivatives, known as the local jet [9]. For a given image I and a scale \(\sigma \), the local jet of order N at point \(\mathbf {x}\) is defined as

$$\begin{aligned} J^N[I](\mathbf {x}; \sigma ) = \left\lbrace L_{i_1, i_2, \cdots , i_n}(\mathbf {x}; \sigma ) \mid n = 0, \ldots , N \right\rbrace , \end{aligned}$$

where the \(n^{th}\)-order derivative tensors are computed by the convolution

$$\begin{aligned} L_{i_1, i_2, \cdots , i_n}(\mathbf {x}; \sigma ) = \left( I * G^{(\sigma )}_{i_1, i_2, \cdots , i_n} \right) (\mathbf {x}), \end{aligned}$$
with the corresponding derivatives of the Gaussian kernel \(G^{(\sigma )}_{i_1, i_2, \cdots , i_n}(\mathbf {x})\), and \(i_k = 1,\ldots , D\), for D-dimensional images. For each derivative order, a complete set of independent invariants under orthogonal transformations can be extracted [9]. For 2D images and second order, for example, this complete set is formed by the intensity, the magnitude of gradients \(\sqrt{\sum _i L_i^2}\), the Laplacian \(\sum _i L_{ii}\), the Hessian norm \(\sum _{i,j} L_{ij} L_{ji}\), and the second derivative along the gradient \(\sum _{i,j} L_i L_{ij} L_j\). Multiresolution description can be obtained by changing the scale \(\sigma \) in a convenient set of scale-space representations. For each invariant feature, the Euclidean distance between the visual word and the corresponding image patch is used as the dissimilarity metric.
Optimum threshold: The optimum threshold for each dissimilarity score is computed as follows. For each image in the database, the maximum score among all the representative visual words is computed. The optimum threshold is then selected as \(q_3 + \nu \, (q_3 - q_1)\), where \(\nu = 1.5\), and \(q_1\) and \(q_3\) are the first and third quartiles of these maxima, respectively. An image is then artefact-free only if all its representative visual words have a dissimilarity score below the optimum threshold with respect to all the considered features.
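This thresholding rule is the standard Tukey upper fence over the per-image score maxima; a minimal sketch (function names are ours):

```python
import numpy as np

def iqr_threshold(max_scores, nu=1.5):
    """Upper Tukey fence q3 + nu*(q3 - q1) over the per-image maxima of a
    dissimilarity score; images exceeding it are flagged as artefacts."""
    q1, q3 = np.percentile(max_scores, [25, 75])
    return q3 + nu * (q3 - q1)

def is_artefact_free(scores_per_feature, thresholds):
    """An image is artefact-free only if every representative word's score
    is below the optimum threshold for every considered feature."""
    return all(max(s) < t for s, t in zip(scores_per_feature, thresholds))
```

Because the fence is built from quartiles, it is itself robust to the outliers it is meant to detect, in keeping with the unsupervised setting.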
3 Experiments and Results
Data specifications: We have tested the proposed algorithm on a dataset of hip DXA scans collected using a Hologic QDR4500 Acclaim densitometer (Hologic Inc., Bedford, MA, USA) during a previous pharmaceutical clinical trial [6]. DXA is the standard imaging technique for bone densitometry. Current software packages for DXA analysis require manual interaction, so each scan must be analysed individually; quality control is therefore assumed to be performed online during the analysis step, with reasonable accuracy. Contrary to common perception, a survey of members of the International Society for Clinical Densitometry (ISCD) indicated that errors in DXA acquisition are not rare [4]. Although qualitative descriptions of DXA artefacts are available [3], no quantitative classification or manually-crafted features have been introduced so far. Thus, no prior knowledge of the types of artefacts can be assumed for the DXA modality, which makes automatic DXA quality control an interesting challenge. A population cohort of 5592 women was included in the clinical trial [6], from which 1300 scans were available for this study.
Experimental set-up: To evaluate the method, 300 scans were randomly selected for manual annotation. We learned the visual dictionary and the set of optimum thresholds on the remaining 1000 scans. Observe that since the learning step is also unsupervised, we could have trained on the whole dataset and then validated the method on a subset. However, to make the evaluation more rigorous, we split the dataset into disjoint training and test sets.
Performance measures: Sensitivity and specificity are reported on the test data against a priori manual annotations. Sensitivity is the proportion of images with artefacts that are correctly detected by the algorithm; specificity is the proportion of normal images that are correctly labelled as artefact-free. The localization capacity of the algorithm is measured as the number of patches that correctly localize an artefact divided by the total number of detected patches, over the images truly detected as artefacts. We refer to this as the localization rate.
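These two image-level measures amount to the usual true-positive and true-negative rates; a minimal sketch (names are ours) over boolean per-image labels:

```python
def performance(has_artefact, flagged):
    """Sensitivity = detected artefact images / all artefact images;
    specificity = images correctly labelled artefact-free / all normal images.
    Both inputs are boolean sequences over the test images."""
    tp = sum(t and f for t, f in zip(has_artefact, flagged))
    fn = sum(t and not f for t, f in zip(has_artefact, flagged))
    tn = sum(not t and not f for t, f in zip(has_artefact, flagged))
    fp = sum(not t and f for t, f in zip(has_artefact, flagged))
    return tp / (tp + fn), tn / (tn + fp)
```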
Parameter selection: The patch size and the radius r are the two parameters of the proposed method, and both are data-dependent. The radius r is selected automatically by estimating a typical small distance between patches within the same image: for each image, all pairwise distances between its patches are computed, and the \(\frac{1}{n}\)-quantile of these distances is taken, where n is the total number of patches extracted from the image. The parameter r is then selected as the median of these quantiles over the image dataset. The patch size can be estimated from the size of the effect to be measured; for example, in femoral DXA scans, the diameter of the femoral stem is approximately 64 pixels. We tested patches of size 32 and 64 with an 8-pixel overlap and observed no difference in sensitivity, so we present results with patches of size 64. In total, 24830 patches were extracted from the 1000 training images, the radius was estimated as \(r = 3.5\), and the resulting dictionary contained 1146 visual words.
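The radius-selection procedure above can be sketched directly (a simplified illustration; names are ours, and each image's patches are assumed flattened into rows of one array):

```python
import numpy as np
from scipy.spatial.distance import pdist

def estimate_radius(patches_per_image):
    """Automatic selection of r: per image, take the (1/n)-quantile of all
    pairwise patch distances (n = number of patches in that image), then
    return the median of these quantiles over the dataset."""
    quantiles = []
    for patches in patches_per_image:          # each: (n_patches, patch_dim)
        n = len(patches)
        quantiles.append(np.quantile(pdist(patches), 1.0 / n))
    return float(np.median(quantiles))
```

Taking a low quantile per image targets a typically small within-image distance, and the median across images keeps the estimate robust to images that happen to contain artefacts.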
We tested invariant features up to the second order; however, the second-order features did not provide additional information. Hence, only intensity and gradient magnitude were used as features. The gradient magnitude of each image patch or visual word is normalized to unit Euclidean norm. A single-scale analysis with \(\sigma = 0.2\) was used. The optimum thresholds were 0.37 and 4.86 for the gradient magnitude and intensity, respectively.
Results: Eleven of the 300 test images were manually annotated as containing artefacts, nine of which were detected by the proposed algorithm. Sensitivity and specificity are 81.82% and 94.12%, respectively, and the localization rate is 96%. Figure 2 shows examples of normal images and artefacts. Only 2 of the 11 artefact images were misclassified as normal; both contain movement artefacts that cause a subtle vertical displacement in the image. The algorithm successfully localized the other types of artefacts, including the presence of an external object (the key-shaped object in Fig. 2).
4 Conclusion
In this paper, we proposed a completely unsupervised and generic framework for automatic quality control in large-scale medical image datasets. Based on the assumption that artefacts constitute a small proportion of the dataset, a framework based on an optimal coverage of images with a visual dictionary was introduced to detect and localize image artefacts as outliers to the normal image class. The method's computational complexity is linear in the number of input images, providing good scalability to large datasets. We tested the method on 1300 femoral DXA scans and obtained good sensitivity and specificity.
References
Bamberg, F., Kauczor, H.U., Weckbach, S., Schlett, C.L., Forsting, M., Ladd, S.C., Greiser, K.H., Weber, M.A., Schulz-Menger, J., Niendorf, T., et al.: Whole-body MR imaging in the german national cohort: rationale, design, and technical background. Radiology 277(1), 206–220 (2015)
Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S.: A geometric framework for unsupervised anomaly detection. In: Barbará, D., Jajodia, S. (eds.) Applications of Data Mining in Computer Security. AISC, pp. 77–101. Springer, New York (2002)
Jacobson, J.A., Jamadar, D.A., Hayes, C.W.: Dual X-ray absorptiometry: recognizing image artifacts and pathology. Am. J. Roentgenol. 174(6), 1699–1705 (2000)
Lewiecki, E.M., Binkley, N., Petak, S.M.: DXA quality matters. J. Clin. Densitometry 9(4), 388–392 (2006)
Manap, R.A., Shao, L.: Non-distortion-specific no-reference image quality assessment: a survey. Inf. Sci. 301, 141–160 (2015)
McCloskey, E.V., Beneton, M., Charlesworth, D., Kayan, K., de Takats, D., Dey, A., Orgee, J., Ashford, R., Forster, M., Cliffe, J., et al.: Clodronate reduces the incidence of fractures in community-dwelling elderly women unselected for osteoporosis: results of a double-blind, placebo-controlled randomized study. J. Bone Miner. Res. 22(1), 135–141 (2007)
Petersen, S.E., Matthews, P.M., Bamberg, F., Bluemke, D.A., Francis, J.M., Friedrich, M.G., Leeson, P., Nagel, E., Plein, S., Rademakers, F.E., et al.: Imaging in population science: cardiovascular magnetic resonance in 100,000 participants of UK Biobank-rationale, challenges and approaches. J. Cardiovasc. Magn. Reson. 15(1), 46 (2013)
Roalf, D.R., Quarmley, M., Elliott, M.A., Satterthwaite, T.D., Vandekar, S.N., Ruparel, K., Gennatas, E.D., Calkins, M.E., Moore, T.M., Hopson, R., et al.: The impact of quality assurance assessment on diffusion tensor imaging outcomes in a large-scale population-based cohort. NeuroImage 125, 903–919 (2016)
Schmid, C., Mohr, R.: Local grayvalue invariants for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 19(5), 530–534 (1997)
Weiner, M.W., Veitch, D.P., Aisen, P.S., Beckett, L.A., Cairns, N.J., Cedarbaum, J., Donohue, M.C., Green, R.C., Harvey, D., Jack, C.R., et al.: Impact of the alzheimer’s disease neuroimaging initiative, 2004 to 2014. Alzheimer’s & Dementia 11(7), 865–884 (2015)
Willis, C.E., Weiser, J.C., Leckie, R.G., Romlein, J.R., Norton, G.S.: Optimization and quality control of computed radiography. In: Medical Imaging 1994, pp. 178–185. International Society for Optics and Photonics (1994)
Xie, F., Lu, Y., Bovik, A.C., Jiang, Z., Meng, R.: Application-driven no-reference quality assessment for dermoscopy images with multiple distortions. IEEE Trans. Biomed. Eng. 63(6), 1248–1256 (2016). IEEE
Ye, P., Doermann, D.: No-reference image quality assessment based on visual codebook. In: 18th IEEE International Conference on Image Processing (ICIP), pp. 3089–3092. IEEE (2011)
Acknowledgement
M Farzi was funded through a PhD Fellowship from the United Kingdom Medical Research Council-Arthritis Research-UK Centre for Integrated research into Musculoskeletal Ageing (CIMA).
© 2016 Springer International Publishing AG
Farzi, M., Pozo, J.M., McCloskey, E.V., Wilkinson, J.M., Frangi, A.F. (2016). Automatic Quality Control for Population Imaging: A Generic Unsupervised Approach. In: Ourselin, S., Joskowicz, L., Sabuncu, M., Unal, G., Wells, W. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science(), vol 9901. Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_34