Case study
Processing of rock core microtomography images: Using seven different machine learning algorithms
Introduction
Numerous researchers have recently determined petrophysical properties numerically from X-ray microtomographic images. This digital rock physics (DRP) approach has allowed physical phenomena that cannot yet be measured in the laboratory to be simulated. DRP models can be used to determine realistic distributions of multi-component fluids, such as occur during imbibition and in Haines jump mechanisms (Berg et al., 2013), and to determine effective transport properties, such as the permeability tensor (Khan et al., 2012). These capabilities, coupled with the advanced computational algorithms that are available to interpret images, visualize three-dimensional (3D) images, characterize structures, and determine physical properties from images, have allowed the numerical DRP laboratory approach to be used to study the properties of real heterogeneous geomaterials (Andrä et al., 2013a, 2013b).
Several important processing steps are required before a virtual rock-physics laboratory approach can be used. The first step is to perform a computed tomography (CT) scan of the selected rock sample at a high spatial (and possibly also temporal) resolution. Accurate phase segmentation, which can be complicated for a strongly heterogeneous material, is then required to allow an appropriate digital rock model to be built (Fusseis et al., 2014). When fluid transport is modeled at the pore scale, the segmentation problem reduces to quantifying the binary solid–void phase distribution (i.e., a binarization problem). However, Leu et al. (2014) recently performed a sensitivity study in which they showed that even a small bias in the accuracy of the binarization may lead to a significant error in the calculated permeability. Binarization is an essential prerequisite of DRP studies, but there are few accurate and fast binarization algorithms that are not biased by manual (subjective) interventions by the user. Choosing an appropriate scheme to binarize an image is key to characterizing a porous space with a good degree of accuracy and therefore to decreasing the uncertainties involved in determining the geometries of pore networks.
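The sensitivity of a binarized result to the chosen threshold can be illustrated with a minimal sketch. It is not the procedure used in this study or by Leu et al. (2014): the synthetic volume, the greyscale means (60 for pores, 160 for solid), the noise level, and the function names `porosity` and `synthetic_volume` are all assumptions made for illustration, and porosity is used here in place of permeability because it can be computed by simple voxel counting.

```python
import random

def porosity(volume, threshold):
    """Fraction of voxels with a greyscale value below `threshold` (treated as void)."""
    total = 0
    void = 0
    for slab in volume:
        for row in slab:
            for value in row:
                total += 1
                if value < threshold:
                    void += 1
    return void / total

def synthetic_volume(size=20, pore_fraction=0.2, seed=0):
    """Hypothetical test volume: dark pores and bright solid with Gaussian noise."""
    rng = random.Random(seed)
    volume = []
    for _ in range(size):
        slab = []
        for _ in range(size):
            row = []
            for _ in range(size):
                # Assumed phase means: 60 for pore voxels, 160 for solid voxels.
                mean = 60.0 if rng.random() < pore_fraction else 160.0
                row.append(rng.gauss(mean, 25.0))
            slab.append(row)
        volume.append(slab)
    return volume

volume = synthetic_volume()
# Binarize the same volume with three nearby global thresholds.
porosities = {t: porosity(volume, t) for t in (100, 110, 120)}
for t, phi in porosities.items():
    print(f"threshold={t}: porosity={phi:.3f}")
```

Because the two greyscale populations overlap, shifting the threshold by a few grey levels changes the void fraction, which is the kind of bias a manually chosen threshold can introduce.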
In general, an X-ray CT (XCT) image, or tomogram, consists of a cubic array of reconstructed linear X-ray attenuation coefficient values (also known as pixel values) that have to be quantified by analyzing the image. Analyzing the image involves four main tasks, namely filtering, segmentation, classification, and interpretation or modeling. In segmentation, similar pixel values are clustered into distinct groups or classes using unsupervised learning techniques. In classification, by contrast, a set of predefined features or classes (known as training data) is used to sort similar pixel values out of an unknown dataset (testing data) using supervised learning techniques. These tasks are not independent of each other; the classification and interpretation tasks determine which of the many available filtering and segmentation routines should be used. The accuracy of the segmentation process clearly determines the reliability of the resulting DRP model. Advanced segmentation routines can be performed when the sinograms are modified (Jovanović et al., 2013), or segmentation can be performed using clustering analysis, which is an unsupervised classification technique in which no manually specified sample regions need to be defined, or discriminant analysis, which is a supervised classification technique (Jain et al., 1999). Cortina-Januchs et al. (2011) used a novel segmentation and classification technique based on a combination of clustering analysis and an artificial neural network (ANN). Their approach offers advantages when used on large datasets, such as those with high spatial resolutions (e.g., sub-micrometer resolutions). Three different clustering algorithms (k-means, fuzzy c-means (FCM), and self-organized maps (SOM)) were used to segment the pixels in the tomographic images into groups of similar intensities. An ANN classification routine was then used; this routine was highly modular and flexible and recognized patterns efficiently (e.g., accurately differentiating between solids and voids). Up to 97% of the pore spaces in the soils that were tested were correctly classified from the images that were acquired.
In this paper we propose a method that modifies and improves on the one used by Cortina-Januchs et al. (2011). The particular improvements are that the detection (segmentation) of pore space in our method is performed using 3D greyscale intensities, and that seven discrete machine learning algorithms are now used for the quantitative intercomparison process. It should be noted that all the investigated methods are global, i.e., only greyscale information is processed and neighborhood information (e.g., connectivity, regularity, or local gradients) is ignored.
A flowchart of the method is shown in Fig. 1. A comparative case study of unsupervised learning classifiers (k-means, FCM, and SOM), supervised learning classifiers (feed-forward ANN (FFANN) and least square support vector machines (LS-SVMs)), and ensemble classifiers (boosting and bagging) was performed. For unsupervised classification, the initial centroid values, membership function, topology, and distance function had to be set in advance. Supervised classification, by contrast, required the user to determine representative areas for each class in order to obtain a priori knowledge about the class statistics. Our goal was to identify the advanced learning scheme that was best at segmenting the pore space and most accurate at determining the porosity.
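The role of user-selected representative areas in supervised classification can be sketched as follows. This is a deliberately simple nearest-class-mean classifier, used here as a stand-in and not as the FFANN or LS-SVM classifiers compared in the study; the class names, the intensity values, and the functions `train_class_means` and `classify` are hypothetical.

```python
def train_class_means(training_samples):
    """Map each class name to the mean greyscale value of its user-selected pixels."""
    return {name: sum(pixels) / len(pixels) for name, pixels in training_samples.items()}

def classify(values, class_means):
    """Assign each value to the class whose mean intensity is closest."""
    return [
        min(class_means, key=lambda name: abs(v - class_means[name]))
        for v in values
    ]

# Assumed training data: pixels picked by the user from representative areas.
training_samples = {
    "pore": [10, 14, 18, 22],
    "solid": [180, 195, 205, 220],
}
class_means = train_class_means(training_samples)

# Unknown (testing) pixels are sorted into the predefined classes.
unknown = [16, 30, 120, 199, 90]
predicted = classify(unknown, class_means)
print(class_means)
print(predicted)
```

The class statistics (here only the means) come entirely from the areas the user marks, which is why the choice of representative areas matters for any supervised scheme.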
Section snippets
Rock sample
An andesite rock sample, shown in Fig. 2, was used in the study. The sample was collected from Tongariro National Park, New Zealand. The sample had a porphyritic texture with large plagioclase crystals (up to 3 mm in diameter), pyroxene in a cryptocrystalline matrix, and isolated vesicles up to 6 mm in diameter. X-ray diffraction analysis confirmed that the sample contained 85% plagioclase and 15% pyroxene. The sample had an average grain density of 2.75 g cm−3, measured using an AccuPyc II 1340
k-Means
The k-means clustering algorithm proposed by MacQueen (1967) is one of the simplest unsupervised learning algorithms commonly used to address clustering problems. The procedure involves dividing the dataset into k clusters by initializing k centroids and then iteratively refining the clusters as described below.
Each datapoint in the dataset is assigned to its closest centroid. Each centroid is then updated to the mean of its constituent datapoints. The algorithm
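The assignment and update steps can be sketched for one-dimensional greyscale intensities as follows. This is a minimal illustration, not the implementation used in the study: the function name `kmeans_1d`, the example intensities, the initial centroids, and the stopping rule (stop when the centroids no longer change) are assumptions.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Cluster scalar intensities with k-means, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each datapoint goes to its closest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its datapoints.
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Assumed stopping rule: centroids unchanged between iterations.
        if updated == centroids:
            break
        centroids = updated
    labels = [
        min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        for v in values
    ]
    return centroids, labels

# Hypothetical greyscale values: dark (pore-like) and bright (solid-like) pixels.
intensities = [12, 15, 20, 18, 200, 210, 190, 205, 14, 198]
centroids, labels = kmeans_1d(intensities, centroids=[0.0, 255.0])
print(centroids)
print(labels)
```

With k = 2 the cluster with the lower centroid can be read as the void phase, so the fraction of datapoints carrying that label gives a porosity estimate for the example data.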
Classification and feature extraction
The intention of the classification process is to categorize every pixel in a digital image, with each class of pixel being based on a specific feature. The categorized data can then be used to retrieve useful information. In this study, the categorized data were used to compute the porosity and to assist in determining the pore size distribution.
For segmentation using unsupervised techniques, a set of ten representative images was used to develop a feature vector (FV). For classification using the FFANN, five of the ten images were used to develop an FV
Conclusions and outlook
A unique insight into the petrophysical properties of an andesite rock sample was gained by using XCT to scan the rock sample non-destructively and then using machine learning algorithms (MLA) and mathematical models to segment the pore spaces and solid phases. The abilities of the MLA to segment different phases were compared using qualitative visual inspections and by calculating porosities and plotting the volume fractions of the pore, mineral, and matrix phases against each other. We found that the abilities of the
Acknowledgments
This work is partly supported by the DFG in the framework of the Excellence Initiative, Darmstadt Graduate School of Excellence Energy Science and Engineering (GSC 1070).
References (30)
- et al., Digital rock physics benchmarks-Part I: imaging and segmentation, Comput. Geosci. (2013)
- et al., Digital rock physics benchmarks-Part II: computing effective properties, Comput. Geosci. (2013)
- et al., Evaluation of random forest and adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery, Remote Sens. Environ. (2008)
- et al., Image processing with neural networks a review, Pattern Recognit. (2002)
- et al., A brief guide to synchrotron radiation-based microtomography in (structural) geology and rock mechanics, J. Struct. Geol. (2014)
- Data clustering: 50 years beyond K-means, Pattern Recognit. Lett. (2010)
- et al., Simultaneous segmentation and beam-hardening correction in computed microtomography of rock cores, Comput. Geosci. (2013)
- et al., Automated classification of lung bronchovascular anatomy in CT using AdaBoost, Med. Image Anal. (2007)
- et al., Real-time 3D imaging of Haines jumps in porous media flow, Proc. Natl. Acad. Sci. (2013)
- et al., Convergence theory for fuzzy c-means: counter examples and repairs, IEEE Trans. Syst. Man. Cybern. (1987)
- Bagging predictors, Mach. Learn.
- Efficient implementation of the fuzzy c-means clustering algorithms, IEEE Trans. Pattern Anal. Mach. Intell.
- Detection of pore space in CT soil images using artificial neural networks, Biogeosciences
- Ensemble methods in machine learning
- A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern.