Forming a new small sample deep learning model to predict total organic carbon content by combining unsupervised learning with semisupervised learning
Introduction
Deep learning is a class of machine learning methods based on learning representations of data [1], [2], [3] and constitutes a new field in machine learning research. Deep learning combines low-level features to form more abstract, high-level representations (attribute categories or features) and thereby discovers distributed feature representations of data. The motivation behind deep learning is to build a neural network that simulates how the human brain analyzes and learns, mimicking its mechanisms to interpret data such as images, sound, and text. The most powerful aspect of deep learning is that its network structure is composed of multiple hidden layers, which are well suited for extracting abstract and hierarchical features that can aid in prediction. The deep feature extraction ability of deep learning algorithms is far superior to that of other types of algorithms and has great potential for improving accuracy [4], [5]. Using extracted features rather than the original input can reduce the errors caused by data redundancy and random noise, strengthen the correlation between the input and output, and improve the robustness and accuracy of the model. Classical deep learning prediction models include supervised models and semisupervised models. The most commonly used supervised model is the convolutional neural network (CNN) and its variants [6]. The most commonly used semisupervised models are deep Boltzmann machines (DBMs) [7], deep belief networks (DBNs) [8], and stacked sparse autoencoders [9]. Among these models, CNNs use a large number of samples to adjust the model parameters, and the network architecture of CNNs is suitable for image recognition. DBMs, DBNs, and autoencoders (AEs) use a large amount of unlabeled data for pretraining and then use a certain amount of labeled data to fine-tune the model parameters.
For big data problems, it is generally accepted that, provided the network is fully trained, a larger number of network layers leads to higher prediction accuracy. Notably, a deep residual network with 152 layers was reported to significantly improve image recognition accuracy [10].
Although deep learning has achieved great success with big data, current mainstream deep learning models are not effective for small sample problems, because deep learning in its current form is a data-hungry technology. Many real-world problems, including those in the medical field, oil and gas exploration, and security, suffer from an insufficient amount of labeled data (although the amount of unlabeled data in these areas is large). Prediction problems in such fields are called small sample problems. The application of deep learning in these fields is still being explored. Although a few models have been proposed, they are essentially applicable only to classification; moreover, they have not been applied to genuinely small sample fields and are still used in fields such as image and speech recognition [11], [12], [13]. In particular, deep learning applications have rarely been reported in the petroleum exploration field, where deep learning remains in its infancy.
In this paper, a small sample deep learning algorithm is developed using the small sample well logging interpretation problem as the application. Oil exploration targets rocks that lie several kilometers underground, and the deep subsurface cannot be directly observed. Retrieving rock samples from formations a few kilometers deep and testing them is very costly. Therefore, well logging is used to measure the physical properties of the subsurface rock, and the well logging curves are calibrated using a small number of rock samples taken from several exploration wells. The calibrated logs are then used to evaluate the rock parameters along the entire depth of the wells. Well logging is very cost-effective and is currently the mainstream practice.
With the continuous exploration and development of conventional reservoirs, the remaining reserves are decreasing, and thus the exploration and development of shale gas remain important [14], [15], [16]. The total organic carbon (TOC) content reflects the hydrocarbon potential of shale reservoirs. Accurately obtaining the TOC content of a shale reservoir can better guide reservoir fracturing [17], [18], [19], [20]. Therefore, evaluation of the TOC content is of great significance for the exploration of shale gas reservoirs and requires further study. In this paper, we use TOC content prediction as an example to study the small sample deep learning algorithm.
Two-dimensional and three-dimensional logging techniques, such as electrical imaging logging, ultrasonic imaging logging, and nuclear magnetic resonance logging, are prohibitively expensive and yield fewer measurements; consequently, research on calculating TOC from conventional logs is meaningful. Many methods exist for predicting reservoir TOC from conventional log curves. Based on a survey of the literature, we divide these methods into 3 categories: (1) fitting a relationship between TOC and a single log curve (such as the density log curve or the natural gamma log curve) [21], [22], [23], [24]; (2) predicting TOC from the difference between the acoustic curve response and the resistivity curve response [25], [26], [27], [28], [29]; and (3) using a machine learning algorithm to determine the relationship between TOC and the responses of various conventional log curves and then performing the prediction. Among these 3 categories, machine learning methods attempt to determine the functional relationship between all of the individual curves and TOC and thereby provide the greatest increase in model accuracy. In this paper, we discuss TOC prediction using machine learning algorithms exclusively.
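As a point of reference for the second category, the following minimal sketch computes TOC from the separation between the acoustic and resistivity curves. It assumes the widely used ΔlogR formulation of Passey et al., which the text above does not name explicitly; the function names, the baseline values, and the level of organic metamorphism (LOM) used in the example are illustrative assumptions, not values from this study.

```python
import math


def delta_log_r(rt, dt, rt_base, dt_base):
    """Sonic/resistivity separation (assumed Passey-style formulation).

    rt, rt_base: resistivity and its baseline in a non-source interval (ohm-m)
    dt, dt_base: sonic transit time and its baseline (microseconds per foot)
    """
    return math.log10(rt / rt_base) + 0.02 * (dt - dt_base)


def toc_from_delta_log_r(dlogr, lom):
    """TOC (wt%) from the separation and the level of organic metamorphism."""
    return dlogr * 10 ** (2.297 - 0.1688 * lom)


# Hypothetical readings for one depth sample.
separation = delta_log_r(rt=20.0, dt=100.0, rt_base=2.0, dt_base=80.0)
toc = toc_from_delta_log_r(separation, lom=10.0)
print(f"delta log R = {separation:.2f}, TOC = {toc:.2f} wt%")
```

Both the baselines and the LOM must be chosen by the interpreter for each well, which is one reason the machine learning category tries to learn the mapping from several curves directly.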
Many scholars have studied TOC prediction using machine learning methods. For example, Huang and Williamson [30] proposed the use of a back propagation neural network (BPNN) to determine the relationship between conventional log curve responses and the TOC and thereby evaluate reservoir TOC. Kadkhodaie et al. [31] proposed the use of fuzzy neural network technology and a genetic algorithm to evaluate the TOC of a reservoir. Khoshnoodkia [32] combined the TOC values calculated by a fuzzy neural network technique with the TOC values calculated by the logR method. Sfidari et al. [33] combined hierarchical clustering with an artificial neural network for TOC calculation; the combined algorithm showed better results than artificial neural networks alone and improved the TOC prediction accuracy. Tan et al. [34] introduced a method for calculating the TOC of shale reservoirs using a radial basis function neural network (RBFNN) algorithm. Then, Tan et al. [35] studied whether support vector regression (SVR) techniques are helpful for TOC prediction problems. In [36], Ouadfeul and Aliouane proposed a multilayer perceptron neural network with the Levenberg-Marquardt training algorithm for calculating the TOC of shale gas reservoirs. Shi et al. [37] introduced the advantages of extreme learning machine (ELM) algorithms for TOC prediction problems and compared these algorithms with other algorithms such as artificial neural networks. Zhu et al. [38] proposed a technique to predict the TOC of source rocks based on an improved rainforest fuzzy neural network model; the prediction results showed that their method predicted the TOC better than a BP neural network algorithm. In 2017, Liu et al. proposed a method for predicting the TOC of shale oil and gas reservoirs using the discrete process neural network [39] and developed a method for predicting the TOC using a ridgelet process based on a neural network [40]. Mahmoud et al. [41] used artificial neural networks to successfully improve the TOC prediction ability. Yu et al. [42] proposed a method to evaluate the TOC using Gaussian process regression (GPR). Tahmasebi et al. [43] proposed a hybrid machine learning method to evaluate the TOC; their evaluation suggested that the prediction improves when a suitable kernel function is selected. In addition to the above methods, many other methods have incorporated various machine learning algorithms to predict the reservoir TOC. Nevertheless, the study of deep neural networks for logging interpretation is still in its infancy, and no relevant studies have been reported apart from our brief evaluation of the reliability of deep learning for calculating reservoir permeability [44].
As discussed above, deep learning has greater potential than shallow learning and may aid in solving small sample problems, but traditional deep learning methods cannot be used directly because they are not designed for small samples. Therefore, it is necessary to study small sample deep networks. Based on this reasoning, the present study takes the characteristics of small sample problems into account and proposes, for the first time, a small sample integrated deep learning algorithm that combines unsupervised learning with semisupervised learning (the latter comprising an unsupervised learning stage and a supervised learning stage). The novel algorithm draws on unsupervised feature extraction, semisupervised training of a multihidden layer neural network, and integrated learning to enhance network diversity, and it is further tailored to the practical needs of small sample problems. From the perspective of sample usage, the algorithm makes full use of both unlabeled and labeled samples, which allows it to solve small sample problems. The algorithm is used for the quantitative prediction of TOC, which yields a deep learning algorithm that can also be used for well logging interpretation problems, and improved results are obtained. The results of this algorithm show that small samples and deep learning are not opposing concepts.
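The way unlabeled and labeled samples can be used together may be easier to see in a toy example. The sketch below is not the authors' model: it swaps in a plain ELM autoencoder (without the sparsity term) for the unsupervised stage, a ridge regressor for the supervised stage in place of a multihidden layer network, and simple averaging over randomly initialized members for the integration step. All function names, sizes, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def elm_ae_encoder(X_all, n_hidden, rng, reg=1e-3):
    """Fit an ELM autoencoder on all inputs (labels unused); return an encoder."""
    W = rng.standard_normal((X_all.shape[1], n_hidden))  # random, never trained
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X_all @ W + b)
    # Output weights that reconstruct X_all from H (ridge least squares).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X_all)
    # ELM-AE convention: the transposed output weights act as the feature map.
    return lambda X: np.tanh(X @ beta.T)


def fit_ridge(F, y, reg=1e-2):
    """Ridge regression with a bias column; stands in for the supervised stage."""
    A = np.hstack([F, np.ones((F.shape[0], 1))])
    w = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ y)
    return lambda Fn: np.hstack([Fn, np.ones((Fn.shape[0], 1))]) @ w


def fit_ensemble(X_unlabeled, X_labeled, y_labeled,
                 n_members=5, n_hidden=16, seed=0):
    """Average several members that differ only in their random initialization."""
    X_all = np.vstack([X_unlabeled, X_labeled])  # features learned from everything
    members = []
    for k in range(n_members):
        rng = np.random.default_rng(seed + k)
        encode = elm_ae_encoder(X_all, n_hidden, rng)
        predict = fit_ridge(encode(X_labeled), y_labeled)  # labels used only here
        members.append((encode, predict))
    return lambda X: np.mean([p(e(X)) for e, p in members], axis=0)


# Synthetic stand-in for log curves: many unlabeled depths, few core-calibrated ones.
data_rng = np.random.default_rng(42)
X_unlabeled = data_rng.standard_normal((500, 6))
X_labeled = data_rng.standard_normal((30, 6))
y_labeled = (X_labeled[:, 0] - 0.5 * X_labeled[:, 1]
             + 0.1 * data_rng.standard_normal(30))

model = fit_ensemble(X_unlabeled, X_labeled, y_labeled)
pred = model(X_labeled)
print(pred.shape)
```

The point of the design is that the 500 unlabeled rows shape the feature map even though only 30 rows carry targets, and that averaging members with different random weights adds diversity at low cost.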
The remainder of this article is organized as follows. Section 2 introduces the background and model structure of the stacked extreme learning machine sparse autoencoder (SELM-SAE) algorithm, the background and model structure of the DBM algorithm, and the integration of the models based on the integrated algorithms. Section 3 introduces the sources and limits of the data used in this study, the model parameter determination, and a comparison of the prediction performance among existing methods. A detailed discussion of the results is provided in Section 4. Section 5 summarizes the essential contributions of this study and presents several concluding observations.
Section snippets
Extreme learning machine sparse autoencoder (ELM-SAE)
The ELM is a rapid supervised learning algorithm that was proposed by Huang Guangbin in 2004 [45]. Since the introduction of this algorithm, it has received a great deal of attention and has been rapidly developed. The ELM-SAE, proposed in 2016 [46], is a feature extraction algorithm similar to the restricted Boltzmann machine (RBM) and the autoencoder (AE). This algorithm is based on the 2013 ELM-AE [47], which enables the unsupervised training of models using unlabeled samples. According to
Data sets and Quantitative Metrics Descriptions
To evaluate the prediction results of the algorithm, we preferred to utilize a data set originating from at least one area of the relatively rich shale gas reserves in the Sichuan Basin of southwestern China. The Lower Silurian Longmaxi Formation, the Upper Ordovician Wufeng Formation and the Lower Cambrian Qiongzhusi Formation contain a large amount of dark gray mudstone and black carbonaceous mudstone. Dark gray silty mudstone and black carbonaceous mudstone are used as reservoirs. The
Discussion
Since the advent of deep learning algorithms, favorable results have been achieved for many engineering problems with a large number of samples. Moreover, deep learning theory can be employed to construct multihidden layer models that extract more abstract features and thereby improve the generalizability of the model. Deep learning may therefore also be able to solve small sample problems, but few studies have pursued this to date, and the problem has received little attention. In the context of
Conclusions
In this study, we investigate the use of small sample deep learning algorithms to solve small sample prediction problems. In particular, we propose a new framework called the IDLM as a deep learning algorithm for well logging interpretation problems and apply this framework to TOC calculations for shale reservoirs. To the best of our knowledge, this is the first time that small sample deep learning has been systematically studied and applied to small samples and large unlabeled samples for well
Declaration of Competing Interest
No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.asoc.2019.105596.
Acknowledgments
This project was supported by the National Natural Science Foundation of China (Nos. 41404084, 41504094), the Natural Science Foundation of Hubei Province (Nos. 2013CFB396), Open Fund of Key Laboratory of Exploration Technologies for Oil and Gas Resources (Yangtze University), Ministry of Education (Nos. K2015-06, 2016-09, K2017-01, K2018-11), National Science and Technology Major Project (Nos. 2017ZX05032003-005), China National Petroleum Corporation Major Projects (Nos. 2013E-38-09), Yangtze
References (67)
- et al. Deep correspondence restricted Boltzmann machine for cross-modal retrieval. Neurocomputing (2015)
- et al. A sparse-response deep belief network based on rate distortion theory. Pattern Recognit. (2014)
- et al. Geological characteristics and resource potential of shale gas in China. Petrol. Explor. Dev. (2010)
- et al. Log interpretations and the application of core testing technology in the shale-gas: Taking the exploration and development of the Sichuan Basin as an example. Acta Petrol. Sin. (2011)
- et al. Artificial neural network modelling as an aid to source rock characterization. Mar. Pet. Geol. (1996)
- et al. A committee machine with intelligent systems for estimation of total organic carbon content from petrophysical data: An example from Kangan and Dalan reservoirs in South Pars Gas Field, Iran. Comput. Geosci. (2009)
- et al. TOC determination of Gadvan Formation in South Pars Gas field, using artificial intelligent systems and geochemical data. J. Petrol. Sci. Eng. (2011)
- et al. Comparison of intelligent and statistical clustering approaches to predicting total organic carbon using intelligent systems. J. Petrol. Sci. Eng. (2012)
- et al. Support-vector-regression machine technology for total organic carbon content prediction from wireline logs in organic shale: A comparative study. J. Nat. Gas Sci. Eng. (2015)
- et al. Application of extreme learning machine and neural networks in total organic carbon content prediction in organic shale with wire line logs. J. Nat. Gas Sci. Eng. (2016)
- Total organic carbon content prediction of shale reservoirs based on discrete process neural network. J. China Univ. Petrol.
- Determination of the total organic carbon (TOC) based on conventional well logs using artificial neural network. Int. J. Coal Geol.
- Data mining and machine learning for identifying sweet spots in shale reservoirs. Expert Syst. Appl.
- Extreme learning machine: Theory and applications. Neurocomputing
- Network anomaly detection with the restricted Boltzmann machine. Neurocomputing
- Formation and enrichment mode of Jiaoshiba shale gas field, Sichuan Basin. Petrol. Explor. Dev.
- The shale characteristics and shale gas exploration prospects of the Lower Silurian Longmaxi shale, Sichuan Basin, South China. J. Nat. Gas Sci. Eng.
- Efficient leave-one-out cross-validation of kernel fisher discriminant classifiers. Pattern Recognit.
- Optimization method based extreme learning machine for classification. Neurocomputing
- Deep learning. Nature
- Deep learning: methods and applications. Found. Trends Signal Process.
- Why does unsupervised pretraining help deep learning? J. Mach. Learn. Res.
- Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans. Geosci. Remote Sens.
- Deep fully convolutional network-based spatial distribution prediction for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens.
- ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Comput. Sci.
- Stacked denoise autoencoder based feature extraction and classification for hyperspectral images. J. Sensors
- Breakthrough and prospect of shale gas exploration and development in China. Nat. Gas Ind.
- Progress and prospects of shale gas exploration and development in China. Acta Petrol. Sin.
- Method and advance of shale gas formation evaluation by means of well logging. Geol. Bull. China