The classification in metamorphic rocks using modified fuzzy cluster analysis from geophysical log data: evidence from Chinese Continental Scientific Drilling Main Hole
Huaijie Yang 0
Heping Pan 0
Miao Luo 0
Gang Li 0
Jing Yao 0
0 Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, Hubei Province, China
Lithology is one of the most important inputs to reservoir evaluation. It is mainly determined from recovered cores in the laboratory, which is very expensive, and core interpretation is time consuming. Accurate identification of lithology from geophysical log data is therefore crucial to reservoir evaluation. Pattern recognition and statistical analysis have proved to be the most powerful methods for constructing optimal models in lithology recognition. In this study, a fast and practical K-means clustering algorithm is proposed to better deal with lithology recognition from geophysical log data. Starting from the traditional K-means clustering algorithm, the Euclidean distance is replaced by the Mahalanobis distance, the initial cluster centers are acquired from the average of the characteristic values rather than selected randomly, and a weight value is added to each characteristic value in the objective function. The resulting lithology recognition model is named modified K-means clustering. The method is applied to identify the metamorphic rocks of the Chinese Continental Scientific Drilling Main Hole (CCSD-MH). Compared with traditional K-means clustering, the accuracy of the modified K-means clustering in lithologic identification improved by 11.11 % for the same 45 samples. With the modified K-means clustering algorithm, cluster centers for nine kinds of lithology are acquired from the 45 samples. The classes of undetermined samples can be determined by analyzing the Hamming approach degree curves, which are calculated from the undetermined samples and the 9 cluster centers. By comparison, the predicted results and the core recovery are exactly the same. The Hamming approach degree can identify the lithology of the whole CCSD-MH well effectively and accurately. This model may be applicable to other areas.
Lithology recognition; CCSD-MH; K-means clustering; Hamming approach degree; Cluster center; Geophysical log data; Weight value
Fuzzy theory was proposed in 1965 by L. A. Zadeh, a cybernetics
professor at the University of California (Gao 2004), and has been
widely used in the natural and social sciences over the following
50 years. Fuzzy clustering analysis is a branch of fuzzy mathematics,
and its applications include time series prediction (Ryoke et al.
1995), neural network training (Karayiannies and Mi 1997), nonlinear
system identification (Runkler et al. 1996), parameter estimation
(Gath and Geva 1989), medical diagnosis (Bezdek and Fordon 1979),
weather forecasting (Newton 1992), food classification (Windham 1985),
and water quality analysis (Mukherjee 1995).
Traditional fuzzy clustering analysis is limited by several
controlling factors, such as the choice of initial cluster centers,
the correlation between samples, and the trade-off between the number
of iterations and the accuracy of the solution. To address these
problems, researchers have proposed many modified algorithms, such as
K-means clustering, C-means clustering, fuzzy clustering neural
networks, and fuzzy clustering genetic algorithms.
Many researchers worldwide have studied the lithology identification
of the CCSD-MH; however, the database of CCSD-MH core data was still
incomplete and inaccurate. Xu et al. (2006) analyzed the magnetic
susceptibility and density of different rocks from the CCSD-MH in the
depth section 0–2000 m and identified the lithology with SPSS
statistical software. Jing et al. (2007) grouped 11 kinds of eclogites
into 6 kinds based on multivariate statistical methods. Gu et al.
(2009) constructed a lithology recognition model that combined the
logging responses of different rocks from several well logs, using
cluster analysis and stepwise discriminant analysis. Luo and Pan
(2010) used core-log correlation and cross-plotting methods and
concluded that the lithology mainly comprises orthogneiss, paragneiss,
eclogite, amphibolite, and ultramafic rocks. Bosch et al. (2013) used
fuzzy logic for lithology prediction from well log data of the German
Continental Deep Drilling Program (KTB). Their results showed that
this fuzzy logic-based method was suited to rapidly and reasonably
suggesting a lithology column from KTB well log data. The above
authors focused heavily on approaches such as visual inspection,
cross-plotting, and discriminant function analysis, and did not
establish a method that can neatly identify the main units and refine
the classification of the whole CCSD-MH well.
Reservoir evaluation requires data on many kinds of rocks, which
differ considerably in porosity and permeability. Well logs respond
differently depending on the characteristics of each kind of rock.
Lithology data are mainly obtained from recovered cores in the
laboratory, which is very expensive, and core interpretation is time
consuming, so accurate identification of lithology from geophysical
well log data plays a significant role in reservoir evaluation.
In this study, a fast and practical K-means clustering algorithm is
proposed to better deal with lithology recognition of the CCSD-MH
from geophysical log data. Starting from the traditional K-means
clustering algorithm, the Euclidean distance was replaced by the
Mahalanobis distance, the initial cluster centers were acquired from
the average of the characteristic values, and a weight value was
added to each characteristic value in the objective function. The
model was applied to classify the CCSD-MH metamorphic rocks and to
obtain the cluster center of each class. The cluster centers and the
weight values were then used to calculate the Hamming approach
degree, which can neatly identify the main units and refine the
classification of the whole CCSD-MH well.
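The two distance-like quantities named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighted Hamming approach degree is assumed here to take the common fuzzy-mathematics form 1 - sum_j w_j |s_j - c_j| on values scaled to [0, 1], and the function names (`mahalanobis_distance`, `hamming_approach_degree`, `classify`) and all numbers are hypothetical.

```python
import math

def mahalanobis_distance(x, center, cov_inv):
    """Distance between sample x and a cluster center.

    cov_inv is the inverse covariance matrix of the characteristic
    values (a list of rows); the identity matrix gives the Euclidean case.
    """
    diff = [a - b for a, b in zip(x, center)]
    n = len(diff)
    # Quadratic form diff^T * cov_inv * diff.
    quad = sum(diff[i] * cov_inv[i][j] * diff[j]
               for i in range(n) for j in range(n))
    return math.sqrt(quad)

def hamming_approach_degree(sample, center, weights):
    """Assumed form: 1 - sum_j w_j * |s_j - c_j|, weights normalized to sum to 1.

    Inputs are expected to be scaled to [0, 1], so the result is also in [0, 1].
    """
    total = sum(weights)
    return 1.0 - sum(w / total * abs(s - c)
                     for s, c, w in zip(sample, center, weights))

def classify(sample, centers, weights):
    """Return the index of the center with the highest approach degree,
    together with the approach degrees to all centers."""
    degrees = [hamming_approach_degree(sample, c, weights) for c in centers]
    best = max(range(len(centers)), key=lambda k: degrees[k])
    return best, degrees

# Made-up, normalized values for two log responses and two cluster centers.
centers = [[0.2, 0.8], [0.7, 0.3]]
weights = [0.6, 0.4]
best, degrees = classify([0.25, 0.7], centers, weights)
```

Under these assumptions an undetermined sample is assigned to the class whose center gives the largest approach degree, which is one way to read a set of approach degree curves along depth.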
Modified K-means clustering
Cluster center
Let us choose $m$ objects, each with $n$ characteristic values, to be
classified into $z$ classes. According to fuzzy theory, the fuzzy
matrix of these objects can be constructed as $X = (x_{ij})$
$(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n)$. The cluster center
matrix is defined as $C = (c_{kj})$
$(k = 1, 2, \ldots, z;\ j = 1, 2, \ldots, n)$, with $m \ge z$. The two
matrices and their compositions are as follows:
$$X = \{X_1, X_2, \ldots, X_n\}, \qquad C = \{C_1, C_2, \ldots, C_z\}, \qquad C_k = \{c_{k1}, c_{k2}, \ldots, c_{kn}\}.$$
The traditional K-means algorithm needs more iterations if the
initial cluster centers are selected inappropriately, and it may
easily fall into a local optimum. To alleviate this problem, we
acquired the initial cluster centers from the average of the
characteristic values in matrix X, based on the theory of cluster
centers (Liao 2013).
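As an illustration of deterministic, average-based initialization, the sketch below shows one possible scheme in Python. It is an assumption rather than the procedure of Liao (2013): objects are ranked by the mean of their characteristic values, split into z contiguous groups, and each group is averaged column by column. The function name `initial_centers` and the data are hypothetical.

```python
def initial_centers(X, z):
    """Build z initial cluster centers from an m-by-n matrix X (list of rows).

    Assumed scheme: rank objects by their mean characteristic value,
    cut the ranking into z contiguous groups, and average each group.
    The result is deterministic, unlike random initialization.
    """
    m = len(X)
    if not 1 <= z <= m:
        raise ValueError("need 1 <= z <= m")
    n = len(X[0])
    # Indices of the objects, ordered by the mean of their characteristic values.
    order = sorted(range(m), key=lambda i: sum(X[i]) / n)
    centers = []
    for k in range(z):
        start, stop = k * m // z, (k + 1) * m // z  # group is non-empty since m >= z
        group = [X[i] for i in order[start:stop]]
        centers.append([sum(row[j] for row in group) / len(group)
                        for j in range(n)])
    return centers

# Made-up, normalized characteristic values for four objects.
X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
C0 = initial_centers(X, 2)
```

Because the same X always yields the same starting centers, repeated runs of the clustering are reproducible, which is the practical benefit of replacing random selection.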
Therefore, an (...truncated)