Data mining and machine learning for identifying sweet spots in shale reservoirs

doi:10.1016/j.eswa.2017.07.015

Expert Systems with Applications

Volume 88, 1 December 2017, Pages 435-447

https://doi.org/10.1016/j.eswa.2017.07.015

Highlights

• An automatic procedure that can use most of the available information.
• The proposed method can be updated rapidly when new data are available.
• Data can be integrated with different scales and lengths.
• Sweet-spot location can be identified using all available data.

Abstract

Owing to its complex structure, production from a shale-gas formation requires more drilling than production from traditional reservoirs. Modeling such reservoirs and predicting their production also require very extensive datasets. Both are very costly. In-situ measurements, such as well logging, are among the most indispensable tools for providing a considerable amount of information and data for such unconventional reservoirs. Production from shale reservoirs involves so-called fracking, i.e., injection of water and chemicals into the formation in order to open up flow paths for the hydrocarbons. The measurements and any other types of data are used for making critical decisions regarding the development of a potential shale reservoir, as it requires an initial investment of hundreds of millions of dollars. The questions that must be addressed include: can the region under study be used economically for producing hydrocarbons? If the answer to the first question is affirmative, then where are the best places to carry out hydro-fracking? Through the answers to such questions one identifies the sweet spots of shale reservoirs, which are the regions that contain high total organic carbon (TOC) and brittle rocks that can be fractured. In this paper, two methods from data mining and machine learning are used to help identify such regions. The first method is based on a stepwise algorithm that determines the best combination of the variables (well-log data) to predict the target parameters. However, in order to incorporate more training and use the available datasets efficiently, a hybrid machine-learning algorithm is also presented that models more accurately the complex spatial correlations between the input and target parameters. Statistical comparisons between the estimated variables and the available data are then made, which indicate very good agreement between the two.
The proposed model can be used effectively to estimate the probability of targeting the sweet spots. Because input and parameter selection is automatic, the algorithm does not require any further adjustment and can continuously evaluate the target parameters as more data become available. Furthermore, the method is able to optimally identify the necessary logs that must be run, which significantly reduces data acquisition operations.

Introduction

Due to recent progress in multistage hydraulic fracturing, horizontal drilling, and advanced recovery methods, shales, recognized as unconventional reservoirs, have become a promising source of energy. They are formed from fine-grained, organic-rich matter and were previously considered source and seal rocks that, due to gas production and high pressures in conventional reservoirs, were traditionally called the trouble zones. Such regions were usually ignored, which explains why no comprehensive data are available for them. Thus, their characterization is still a massive task (Tahmasebi et al., 2015a, Tahmasebi et al., 2015b, Tahmasebi et al., 2016b, Tahmasebi and Sahimi, 2015). Furthermore, shales exhibit highly variable structures and complexities from basin to basin, and even within small fields. They host very small pores, have low matrix permeability, and are heterogeneous at both the laboratory and field scales. Given such difficulties, and since new methods for characterizing shale reservoirs are still being developed, applying the characterization and modeling methods of traditional reservoirs to shales is of great importance. In particular, accurate characterization of such reservoirs entails integrating various types of information, including petrophysical, geochemical, geomechanical, and reservoir data (Tahmasebi and Sahimi, 2016a, Tahmasebi and Sahimi, 2016b, Tahmasebi et al., 2016a, Tahmasebi et al., 2017, Tahmasebi et al., 2016c).

Regardless of the reservoir type, efficient drilling may be thought of as targeting the most productive zones of a reservoir with maximum exposure. For shale reservoirs, this concept corresponds to areas with high total organic carbon (TOC) and high fracability, i.e., brittleness, which calls for comprehensive characterization of such complex formations. High TOC and fracable index (FI) reflect a high-quality shale-gas reservoir. The role of the TOC is clear, as it is one of the main factors for identifying an economical shale reservoir. Higher TOC, ranging from 2% to 10%, represents richer organic content and, consequently, higher potential for gas production. Since natural gas is trapped in both organic and inorganic matter, the TOC denotes the entire organic carbon content, and is a direct measure of the volume and maturity of the reservoir.
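The two criteria above, high TOC and high FI, can be combined into a simple screening rule. The following is a minimal sketch, not the procedure used in this paper: the function name `flag_sweet_spots`, the FI cutoff of 0.5, the assumption that FI is scaled to [0, 1], and the sample values are all hypothetical. Only the 2% TOC lower bound is taken from the text.

```python
import numpy as np

def flag_sweet_spots(toc_pct, fi, toc_min=2.0, fi_min=0.5):
    """Return a boolean mask of samples with TOC >= toc_min and FI >= fi_min.

    toc_min = 2.0 follows the 2% lower bound quoted in the text;
    fi_min = 0.5 is an arbitrary, hypothetical cutoff on an FI assumed in [0, 1].
    """
    toc_pct = np.asarray(toc_pct, dtype=float)
    fi = np.asarray(fi, dtype=float)
    return (toc_pct >= toc_min) & (fi >= fi_min)

# Hypothetical depth samples (not data from the paper)
toc = [0.8, 2.5, 4.1, 6.0, 1.9]   # TOC in weight percent
fi = [0.7, 0.3, 0.6, 0.9, 0.8]    # fracable index, assumed normalized
mask = flag_sweet_spots(toc, fi)
print(mask.tolist())  # [False, False, True, True, False]
```

A hard threshold is only the crudest form of screening; it ignores the uncertainty in the predicted TOC and FI.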

The FI influences the flow of hydrocarbons in a shale reservoir and any future fracking in it. Thus, identifying the layers in a reservoir with high FI is of great importance. The FI controls a shale reservoir's production, since it strongly influences the production of individual wells. It also provides very useful insight into where and how new wells should be placed and spaced. In fact, unlike conventional reservoirs, which depend on long-range connectivity of the permeable zones, the performance of shale reservoirs and of future fracking operations in them is controlled by optimal well locations and spacing. Thus, separating the brittle and ductile zones of rock is a key aspect of successful characterization of shale-gas reservoirs. Moreover, brittle shale has a high potential for being naturally fractured and, consequently, responds well to fracking treatments.

Past successful experience indicates that characterization of shale reservoirs needs accurate identification of the so-called sweet spots, i.e., the zones that present the best production or the potential for high production, and the potential fracable zones, which are critical to maximizing production and future recovery. The placement of most of the wells is closely linked with the sweet spots, as well as with the fracable zones for hydraulic fracturing. For example, the TOC represents the ability of a shale reservoir to store and produce hydrocarbons. Fracability is controlled mainly by mineralogy and elastic properties, such as the Young's and bulk moduli and the Poisson's ratio (Sullivan Glaser et al., 2013). Therefore, identification of the sweet spots is of great importance to shale reservoirs. Such spots are characterized by high kerogen content, low water saturation, high permeability, high Young's modulus, and low Poisson's ratio.
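To illustrate how elastic properties can be turned into a brittleness-type indicator, the sketch below uses a widely used normalization (often attributed to Rickman et al.) that averages a rescaled Young's modulus and an inverted, rescaled Poisson's ratio. This is an assumption made for illustration only; the source does not state that the FI in this paper is computed this way, and the function name and sample values are hypothetical.

```python
import numpy as np

def brittleness_index(young, poisson, young_range=None, poisson_range=None):
    """Average of normalized Young's modulus and inverted normalized Poisson's ratio.

    High Young's modulus and low Poisson's ratio both push the index toward 1.
    If no (min, max) ranges are given, they are taken from the data themselves.
    """
    young = np.asarray(young, dtype=float)
    poisson = np.asarray(poisson, dtype=float)
    e_min, e_max = young_range if young_range else (young.min(), young.max())
    v_min, v_max = poisson_range if poisson_range else (poisson.min(), poisson.max())
    if e_max == e_min or v_max == v_min:
        raise ValueError("normalization range must have nonzero width")
    e_norm = (young - e_min) / (e_max - e_min)      # 0 = softest, 1 = stiffest
    v_norm = (v_max - poisson) / (v_max - v_min)    # 0 = most ductile, 1 = most brittle
    return 0.5 * (e_norm + v_norm)

# Hypothetical samples: Young's modulus in GPa, Poisson's ratio dimensionless
bi = brittleness_index([20.0, 40.0, 60.0], [0.35, 0.25, 0.15])
print(np.round(bi, 2))
```

With these three samples the index rises from 0 for the softest, most ductile sample to 1 for the stiffest, most brittle one.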

One of the primary, as well as most affordable, methods for characterizing complex reservoirs is coring and collecting petrophysical data, as well as well logs. The latter can be integrated with the former in order to develop a more reliable model. In principle, well-log data can be provided continuously and, thus, they represent a real-time resource. Because of the huge number of wells in a typical shale reservoir, we refer to such datasets as big data. Clearly, such information is very useful when it is coupled with techniques that help better identify the sweet spots and fracable zones. Ultimately, the questions that must be addressed are: where should one drill, or not drill, new wells? Where are the zones with high or low fracability index?

Aside from such critical questions, another issue regarding the available big data is that new data are continuously obtained as production proceeds. Thus, any algorithm for the analysis of big data should be flexible enough to adapt rapidly to new data. Another important feature of the algorithm should be its ability to use the available information to create a “training platform” for forecasting the important parameters.
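One simple way to see what "rapid adaptation to new data" can mean is an ordinary least-squares model whose sufficient statistics are accumulated batch by batch, so that older data never need to be reprocessed. This is a stand-in chosen for illustration, not the updating scheme of the paper; the class name, the synthetic batches, and the coefficients are hypothetical.

```python
import numpy as np

class IncrementalLinearModel:
    """Least-squares fit updated batch by batch via accumulated X^T X and X^T y."""

    def __init__(self, n_features):
        d = n_features + 1                 # +1 for the intercept column
        self.xtx = np.zeros((d, d))
        self.xty = np.zeros(d)

    def update(self, X, y):
        """Fold a new batch of (well-log, target) samples into the statistics."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        A = np.column_stack([np.ones(len(X)), X])
        self.xtx += A.T @ A
        self.xty += A.T @ y

    def coefficients(self):
        """Solve the normal equations; lstsq tolerates a rank-deficient system."""
        return np.linalg.lstsq(self.xtx, self.xty, rcond=None)[0]

# Synthetic demonstration (hypothetical data, not from the paper)
rng = np.random.default_rng(42)
true_coef = np.array([1.0, 0.8, -0.5])     # intercept and two log coefficients

def make_batch(n):
    X = rng.normal(size=(n, 2))
    y = true_coef[0] + X @ true_coef[1:] + 0.05 * rng.normal(size=n)
    return X, y

model = IncrementalLinearModel(n_features=2)
X1, y1 = make_batch(100)
model.update(X1, y1)                       # initial database
X2, y2 = make_batch(50)
model.update(X2, y2)                       # newly acquired data
print(model.coefficients())
```

After both updates, the coefficients match those of a single fit on all 150 samples, which is the property that makes this kind of update cheap.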

In this paper, a very large database consisting of well logs, X-ray diffraction (XRD) data, and experimental core analysis is used to develop a model that reduces the cost and increases the probability of identifying the sweet spots. First, we describe a method of data mining called stepwise regression (Efromyson, 1960, Montgomery et al., 2012) for identifying the correlations between the target (i.e., dependent) parameters, the TOC and FI, and the available well-log data (the independent variables). Then, a hybrid method borrowed from machine learning and artificial intelligence is proposed for accurate predictions of the parameters. Both methods can be tuned rapidly, and can use the older database to accurately characterize shale reservoirs.

Section snippets

Methodology

As mentioned earlier, two very different methods are used in this paper. The first is borrowed from the field of data mining; with it, the correlation between a dependent (target) variable and a series of independent variables is constructed and used for future forecasting. Next, a machine-learning method is introduced for developing a more robust model that recognizes the complex relations between the variables.
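The idea behind the first method can be sketched with a simplified, forward-only variant of stepwise regression: at each step, the well log that most reduces the residual sum of squares is added, and the search stops when the gain becomes negligible. This is not the full stepwise procedure cited in the paper (which also allows variables to be removed and uses statistical tests); the stopping threshold `min_gain`, the log names, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedily add the predictor that most reduces RSS.

    Stops when the best candidate explains less than min_gain of the
    total variance (a hypothetical stopping rule, not an F-test).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    tss = float(((y - y.mean()) ** 2).sum())
    selected, best_rss = [], tss
    while len(selected) < X.shape[1]:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        rss_by_j = {j: fit_rss(X[:, selected + [j]], y) for j in candidates}
        j_best = min(rss_by_j, key=rss_by_j.get)
        if (best_rss - rss_by_j[j_best]) / tss < min_gain:
            break
        selected.append(j_best)
        best_rss = rss_by_j[j_best]
    return [names[j] for j in selected]

# Synthetic example: the target depends on only two of four hypothetical logs
rng = np.random.default_rng(0)
logs = rng.normal(size=(200, 4))
names = ["GR", "RHOB", "NPHI", "DT"]       # illustrative log mnemonics
toc = 3.0 + 1.5 * logs[:, 0] - 1.0 * logs[:, 1] + 0.1 * rng.normal(size=200)
selected = forward_stepwise(logs, toc, names)
print(selected)
```

On this synthetic dataset the procedure should retain only the two logs that actually drive the target, which mirrors the paper's point that the method can identify which logs need to be run.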

Results and discussion

As discussed, due to the extensive variability of shale reservoirs, an extensive amount of information is required for their characterization and, hence, it helps to reduce the uncertainty and improve real-time recovery operations. Thus, since well logs provide useful information, and at the same time are widely available, the objective of this study is to use such data to predict two important properties of shales, namely, the TOC and FI. Clearly, none of the well logs can by itself predict

Summary and conclusions

Due to their highly complex structures, shale-gas reservoirs require very accurate modeling. Vertical and lateral variability necessitate more drilling, which consequently leads to significant increase in the cost of the operations. Wireline well logging is one of the most accessible and affordable approaches to continuously monitor such complexities. Shale-gas repositories are associated with extensive/big data. Obviously, analysis of the uncertainty and risk assessment for future development

Acknowledgements

PT acknowledges the financial support from the University of Wyoming for this research. FJ would like to thank the support from Nano Geosciences lab and the Mudrock Systems Research Laboratory (MSRL) consortium at the Bureau of Economic Geology, The University of Texas at Austin. MSRL member companies are Anadarko, BP, Cenovus, Centrica, Chesapeake, Cima, Cimarex, Chevron, Concho, ConocoPhillips, Cypress, Devon, Encana, Eni, EOG, EXCO, ExxonMobil, Hess, Husky, Kerogen, Marathon, Murphy, Newfield, Penn

References (43)

C. Cernuda et al. NIR-based quantification of process parameters in polyetheracrylat (PEA) production using flexible non-linear fuzzy systems. Chemometrics and Intelligent Laboratory Systems (2011)
O. Cordón et al. Ten years of genetic fuzzy systems: Current framework and new trends. Fuzzy Sets and Systems (2004)
H. Dashtian et al. Analysis of cross correlations between well logs of hydrocarbon reservoirs. Transport in Porous Media (2011)
H. Li et al. Computer simulation of gas generation and transport in landfills: VI – Dynamic updating of the model using the ensemble Kalman filter. Chemical Engineering Science (2012)
H. Li et al. Ensembles-based and GA-based optimization for landfill gas production. AIChE Journal (2014)
E. Lughofer (2011)
P. Tahmasebi et al. Reconstruction of nonstationary disordered materials and media: Watershed transform and cross-correlation function. Physical Review E (2015)
P. Tahmasebi et al. Image-based modeling of granular porous media. Geophysical Research Letters (2017)
P. Tahmasebi et al. Pore-scale simulation of flow of CO2 and brine in reconstructed and actual 3D rock cores. Journal of Petroleum Science and Engineering (2016)
J. Abonyi. Fuzzy model identification. Fuzzy model identification for control (2003)
C.M. Bishop. Neural networks for pattern recognition (1995)
C. Cernuda et al. Hybrid adaptive calibration methods and ensemble strategy for prediction of cloud point in melamine resin production. Chemometrics and Intelligent Laboratory Systems (2013)
K. Deb et al. Understanding interactions among genetic algorithm parameters. In Foundations of Genetic Algorithms (1999)
A.W.F. Edwards. Statistical methods in scientific inference. Nature (1969)
M.A. Efromyson. Multiple regression analysis. Mathematical methods for digital computers (1960)
D.E. Goldberg et al. Genetic algorithms and machine learning. Machine Learning (1988)
F. Gomide. Fundamentals of fuzzy set theory. Handbook on computational intelligence (2016)
Y. Haitovsky. Missing data in regression analysis. Journal of the Royal Statistical Society. Series B (Methodological) (1968)
J.-S.R. Jang. Self-learning fuzzy controllers based on temporal backpropagation. IEEE Transactions on Neural Networks (1992)
J.-S.R. Jang. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics (1993)
D.M. Jarvie et al. Unconventional shale-gas systems: The Mississippian Barnett Shale of north-central Texas as one model for thermogenic shale-gas assessment. AAPG Bulletin (2007)
