Elsevier

Computers & Geosciences

Volume 86, January 2016, Pages 120-128
Case study
Processing of rock core microtomography images: Using seven different machine learning algorithms

https://doi.org/10.1016/j.cageo.2015.10.013

Highlights

  • Testing of machine learning algorithms to process X-ray CT rock images.

  • Unsupervised, supervised, and ensemble clustering techniques were applied.

  • k-Means technique is the fastest in terms of CPU performance.

Abstract

The abilities of machine learning algorithms to process X-ray microtomographic rock images were determined. The study focused on the use of unsupervised, supervised, and ensemble clustering techniques to segment X-ray computer microtomography rock images and to estimate the pore spaces and pore size diameters in the rocks. The unsupervised k-means technique gave the fastest processing time and the supervised least squares support vector machine technique gave the slowest processing time. Multiphase assemblages of solid phases (minerals and finely grained minerals) and the pore phase were found on visual inspection of the images. In general, the accuracy in terms of porosity values and pore size distribution was found to be strongly affected by the feature vectors selected. The average relative porosity of 15.92±1.77% retrieved from all seven machine learning algorithms is in very good agreement with the experimental result of 17±2% obtained using a gas pycnometer. Of the supervised techniques, the least squares support vector machine technique is superior to the feed-forward artificial neural network because of its ability to identify a generalized pattern. Of the ensemble classification techniques, the boosting technique converged faster than the bagging technique. The k-means technique outperformed the fuzzy c-means and self-organized maps techniques in terms of accuracy and speed.

Introduction

Numerous researchers have recently numerically determined petrophysical properties from X-ray microtomographic images. This digital rock physics (DRP) approach using rock images has allowed physical phenomena that cannot yet be measured in the laboratory to be simulated. DRP models can be used to determine realistic distributions of multi-component fluids, such as occur during imbibition and in Haines jump mechanisms (Berg et al., 2013), and to determine effective transport properties, such as the permeability tensor (Khan et al., 2012). These capabilities, coupled with the advanced computational algorithms that are available to interpret images, visualize three-dimensional (3D) images, characterize structures, and determine physical properties from images, have allowed the numerical DRP laboratory approach to be used to study the properties of real heterogeneous geomaterials (Andrä et al., 2013a, Andrä et al., 2013b).

Several important processing steps are required to allow a virtual rock-physics laboratory approach to be used. The first step is to perform a computer tomography (CT) scan of the selected rock sample at a high spatial (and eventually also temporal) resolution. The next step is accurate phase segmentation, which can be complicated for a strongly heterogeneous material, so that an appropriate digital rock model can eventually be built (Fusseis et al., 2014). When modeling fluid transport at the pore scale, the segmentation problem reduces to quantifying the binary solid–void phase distribution (i.e., a binarization problem). However, Leu et al. (2014) recently performed a sensitivity study in which they showed that even a small bias in the binarization may lead to a significant error in the calculated permeability. Binarization is an essential prerequisite of DRP studies, but there are few accurate and fast binarization algorithms that are not biased by manual (subjective) interventions by the user. Choosing an appropriate scheme to binarize an image is key to characterizing a porous space accurately and therefore to decreasing the uncertainties involved in determining the geometries of pore networks.

In general, an X-ray CT (XCT) image, or tomogram, consists of a cubic array of reconstructed linear X-ray attenuation coefficient values (also known as pixel values) that have to be quantified by analyzing the image. Analyzing the image involves four main tasks, namely filtering the image, segmentation, classification, and interpretation or modeling. In segmentation, similar pixel values are clustered into distinct groups or classes using unsupervised learning techniques. In classification, by contrast, supervised learning techniques use a set of predefined features or classes (the training data) to sort similar pixel values out of an unknown dataset (the testing data). These tasks are not independent of each other: the classification and interpretation tasks determine which of the many available filtering and segmentation routines should be used. The accuracy of the segmentation process clearly determines the reliability of the resulting DRP model. Advanced segmentation routines can be performed when the sinograms are modified (Jovanović et al., 2013), or segmentation can be performed using clustering analysis, which is an unsupervised classification technique in which no manually specified sample regions need to be defined, or discriminant analysis, which is a supervised classification technique (Jain et al., 1999). Cortina-Januchs et al. (2011) used a novel segmentation and classification technique based on a combination of clustering analysis and an artificial neural network (ANN). Their approach offers advantages when used on large datasets, such as those with high spatial resolutions (e.g., sub-micrometer resolutions). They used three different clustering algorithms (k-means, fuzzy c-means (FCM), and self-organized maps (SOM)) to segment the pixels in the tomographic images into groups of similar intensities. An ANN classification routine was then applied; this routine was highly modular and flexible and recognized patterns efficiently (e.g., accurately differentiating between solids and voids). Up to 97% of the pore spaces in the soils that were tested were correctly classified from the images that were acquired.

In this paper we propose a method that modifies and improves on the one used by Cortina-Januchs et al. (2011). In particular, the detection (segmentation) of pore space in our method is performed using 3D grayscale intensities, and three distinct classes of machine learning algorithms (unsupervised, supervised, and ensemble) are used for the quantitative intercomparison. Note that all the investigated methods are global, i.e., only grayscale information is processed and neighborhood information (e.g., connectivity, regularity, or local gradients) is ignored.

A flowchart of the method is shown in Fig. 1. A comparative case study of unsupervised learning classifiers (k-means, FCM, and SOM), supervised learning classifiers (feed-forward artificial neural network (FFANN) and least squares support vector machines (LS-SVMs)), and ensemble classifiers (boosting and bagging) was performed. For unsupervised classification, the initial centroid values, membership function, topology, and distance function had to be set in advance. Supervised classification, in contrast, required the user to determine representative areas for each class in order to obtain a priori knowledge about the class statistics. Our goal was to identify the advanced learning scheme that was best at segmenting the pore space and most accurate at determining the porosity.

Section snippets

Rock sample

An andesite rock sample, shown in Fig. 2, was used in the study. The sample was collected from Tongariro National Park, New Zealand. The sample had a porphyritic texture with large plagioclase crystals (up to 3 mm in diameter), pyroxene in a cryptocrystalline matrix, and isolated vesicles up to 6 mm in diameter. X-ray diffraction analysis confirmed that the sample contained 85% plagioclase and 15% pyroxene. The sample had an average grain density of 2.75 g cm−3, measured using an AccuPyc II 1340

k-Means

The k-means clustering algorithm proposed by MacQueen (1967) is one of the simplest unsupervised learning algorithms commonly used to address clustering problems. The procedure involves dividing the dataset into k clusters by initializing k centroids and then iteratively refining the clusters as described below.

Each datapoint in the dataset is assigned to its closest centroid. Each centroid Cj is then iteratively updated to the mean of its constituent datapoints. The algorithm
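The assignment and update steps described above can be illustrated with a minimal sketch. It assumes scalar gray values as the only feature and uses only the Python standard library; the function name `kmeans_1d`, the toy intensities, and the initial centroids are hypothetical and are not taken from the study.

```python
from statistics import mean


def kmeans_1d(values, centroids, max_iter=100):
    """Cluster scalar gray values around k centroids (illustrative sketch)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value is assigned to its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else old)
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 8-bit gray values: dark (pore-like), bright, and intermediate.
gray = [12, 15, 10, 14, 200, 210, 190, 205, 110, 120]
labels, centroids = kmeans_1d(gray, centroids=[0, 128, 255])
print(labels)     # [0, 0, 0, 0, 2, 2, 2, 2, 1, 1]
print(centroids)  # [12.75, 115, 201.25]
```

On this toy input the value 190 is first assigned to the middle cluster and moves to the bright cluster after one centroid update, which shows why the two steps are iterated until the centroids stop changing.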

Classification and feature extraction

The intention of the classification process is to categorize every pixel in a digital image, with each class of pixel being based on a specific feature. The categorized data can then be used to retrieve useful information. In this study, the categorized data were used to compute porosity and to help determine the pore size distribution.
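As a minimal sketch of how porosity could be read off categorized data, the example below counts the fraction of pixels assigned to the pore class. The function name `porosity_percent`, the label coding (0 = pore, 1 = matrix, 2 = mineral), and the toy labels are assumptions made for illustration only; the study's actual procedure is not reproduced here.

```python
def porosity_percent(labels, pore_label):
    """Return the percentage of pixels that carry the pore label."""
    flat = list(labels)
    if not flat:
        raise ValueError("empty label set")
    pore_pixels = sum(1 for lab in flat if lab == pore_label)
    return 100.0 * pore_pixels / len(flat)


# Hypothetical segmentation of ten pixels: 0 = pore, 1 = matrix, 2 = mineral.
segmented = [0, 0, 1, 1, 1, 2, 2, 1, 2, 1]
print(porosity_percent(segmented, pore_label=0))  # 20.0
```

For a 3D tomogram the same ratio would be taken over all voxels of the flattened label volume.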

For segmentation using unsupervised techniques, a set of ten representative images was used to develop a feature vector (FV). For classification using the FFANN, five of the ten images were used to develop a FV

Conclusions and outlook

A unique insight into the petrophysical properties of an andesite rock sample was gained by using XCT to scan the rock sample non-destructively; machine learning algorithms (MLAs) and mathematical models were then used to segment the pore spaces and solid phases. The abilities of the MLAs to segment different phases were compared using qualitative visual inspections and by calculating porosities and plotting the volume fractions of the pore, mineral, and matrix phases against each other. We found that the abilities of the

Acknowledgments

This work is partly supported by the DFG in the framework of the Excellence Initiative, Darmstadt Graduate School of Excellence Energy Science and Engineering (GSC 1070).

References (30)

  • L. Breiman. Bagging predictors. Mach. Learn. (1996)

  • R.L. Cannon et al. Efficient implementation of the fuzzy c-means clustering algorithms. IEEE Trans. Pattern Anal. Mach. Intell. (1986)

  • M.G. Cortina-Januchs et al. Detection of pore space in CT soil images using artificial neural networks. Biogeosciences (2011)

  • T.G. Dietterich. Ensemble methods in machine learning. (2000)

  • J.C. Dunn. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybern. (1973)