Spatial statistics

The main difference between classical statistics and geostatistics is the assumption of spatial dependency. That is, the location of data elements with respect to one another plays an important role in the analysis, modeling, and estimation procedures.

Impact of spatial correlation

Consider the two images in Fig. 1, which for discussion purposes we can consider to be different porosity maps. Map A has a nearly random appearance, with only a hint of northwest/southeast preferential alignment of porosity. Map B shows a higher degree of continuity, with the axis of maximum correlation oriented northwest/southeast. Visually, these maps look quite different, but the descriptive-statistical measures, such as the mean and variance, are identical for each. This simple example illustrates the fact that classical statistical analysis cannot fully describe the nature of the data, especially when the data have a distinct, organized pattern.

Fig. 1—These images, maps A and B, appear quite different; however, the histograms of their values are identical. Classical statistical measures (e.g., mean and standard deviation) and histograms cannot depict the spatial arrangement of information, but geostatistical methods can make such a distinction.
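The point made by Fig. 1 can be reproduced with a small synthetic experiment. The sketch below is only an illustration under stated assumptions: the two grids are artificial (not the maps in the figure), and the helper `mean_sq_diff_adjacent` is a hypothetical name introduced here. One grid is smooth, and the other contains exactly the same values in shuffled positions, so the histograms match while the spatial arrangement differs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic smooth grid standing in for "map B" (an assumption, not real data).
n = 64
y, x = np.mgrid[0:n, 0:n]
map_b = np.sin(2 * np.pi * (x + y) / n) + 0.5 * np.cos(2 * np.pi * (x - 2 * y) / n)

# "Map A": the same values, shuffled to destroy spatial continuity.
map_a = rng.permutation(map_b.ravel()).reshape(n, n)

def mean_sq_diff_adjacent(grid):
    """Half the mean squared difference between horizontally adjacent cells."""
    d = grid[:, 1:] - grid[:, :-1]
    return 0.5 * np.mean(d ** 2)

# Univariate statistics agree; the neighbor-to-neighbor statistic does not.
print("mean:", map_a.mean(), map_b.mean())
print("variance:", map_a.var(), map_b.var())
print("adjacent-cell dissimilarity:",
      mean_sq_diff_adjacent(map_a), mean_sq_diff_adjacent(map_b))
```

The last printed pair is much larger for the shuffled grid than for the smooth one, which is the kind of distinction that only a statistic based on location can make.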

Almost all the variables of interest in the petroleum industry (e.g., porosity, permeability, facies, saturation, net-to-gross, volumes) are the product of a number of complex physical and chemical processes that impose spatial dependency on the reservoir rocks. That is, they display distinct geographic patterns of continuity when mapped. Understanding and modeling the scales of continuity and directional information contained in the data is important for efficient hydrocarbon production. ^[1] ^[2] Attributes that exhibit spatial continuity are called regionalized variables (RVs), and their spatial continuity can be described by a statistic called the semivariogram. Introducing the semivariogram into an estimation algorithm led to the method now called kriging. ^[3] ^[4] ^[5] ^[6] ^[7] ^[8]

Table 1 shows a summary of spatial models.

Table 1—Spatial Models Summary

Properties of the regionalized variable (RV) and random functions

We have seen that two data sets can have the same univariate statistics, yet have very different spatial properties (Fig. 1). The complex attributes we deal with in the petroleum industry can be described by random functions that are combinations of regionalized and random variables. Regionalized variable theory is based on the statistics of the RV, ^[9] ^[10] ^[11] which differs from an ordinary scalar random variable in its spatial continuity, yet still possesses the usual distribution statistics, such as mean and variance. The RV also differs in that it has a defined location. Two realizations (measurements) of an RV that differ in spatial location generally display a nonzero correlation, whereas successive realizations of an ordinary scalar random variable are uncorrelated.^[8] Therefore, RVs and spatial correlation analysis are used to quantify the distance- and direction-related spatial properties in a sample data set.

Semivariograms and covariance

The semivariogram (informally and commonly known as the variogram or the experimental variogram) is a statistical measure of the rate of change with distance, for attributes that vary in space.^[12] The formula for calculating the experimental variogram (Eq. 1) involves terms that depend on measurements at specific locations, namely z(u_i) and z(u_i + h). Unlike the mean value of a data set, which is a single value, the variogram is a continuous function of distance, calculated from discrete measurements between pairs of points whose separation distance h falls within a given distance interval called a lag. The lag is a vector, involving not only the magnitude of the separation, but also the azimuth of the line through each data pair. For a given azimuth, the squared difference of the RV is calculated for each pair in a given lag. The average value for each lag then is calculated and plotted on a graph of the mean-squared difference against the lag intervals. As we shall see later, the variogram is required in many of the geostatistical methods for prediction or simulation away from control points.

Given a sample of observations, and provided that the mean is constant as a function of h, an unbiased estimator of the variogram is

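$$\gamma(\mathbf{h}) = \frac{1}{2n}\sum_{i=1}^{n}\left[z(\mathbf{u}_i) - z(\mathbf{u}_i + \mathbf{h})\right]^{2} \qquad (1)$$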
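The calculation described above (pair the points, bin the pairs by separation distance, and average half the squared differences in each lag) can be sketched as follows. This is a minimal omnidirectional version under stated assumptions: azimuth is ignored, lag k is taken to cover distances from k times `lag_width` up to (k + 1) times `lag_width`, and the function name `experimental_variogram` and its parameters are illustrative rather than taken from any particular software package.

```python
import numpy as np

def experimental_variogram(coords, values, lag_width, n_lags):
    """Omnidirectional experimental semivariogram (half the mean squared
    difference of the paired values in each lag).

    coords : (n, d) array of sample locations
    values : (n,) array of the regionalized variable
    Returns lag centers, semivariance per lag, and pair counts per lag.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)       # every pair exactly once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    bins = np.floor(dist / lag_width).astype(int)  # lag index of each pair
    gamma = np.full(n_lags, np.nan)                # NaN marks an empty lag
    counts = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        in_lag = bins == k
        counts[k] = in_lag.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diff[in_lag].mean()
    centers = (np.arange(n_lags) + 0.5) * lag_width
    return centers, gamma, counts

# Four samples along a line, one unit apart (made-up values).
centers, gamma, counts = experimental_variogram(
    [[0.0], [1.0], [2.0], [3.0]], [1.0, 2.0, 4.0, 7.0], lag_width=1.0, n_lags=4
)
print(counts)
print(gamma)
```

Reporting the pair count for each lag is a deliberate choice: lags with few pairs give unstable estimates, and the counts make that visible when choosing a lag interval.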

Now compare Eq. 1 to Eq. 2, which computes the traditional covariance statistic.

$$\mathrm{Cov}_{x,y} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - m_x\right)\left(Y_i - m_y\right) \qquad (2)$$

Fig. 2 shows the anatomy of an experimental variogram. The variogram measures dissimilarity, which typically increases with lag distance until it levels off at a constant value (the sill). The distance h at which the unbiased estimate γ(h) reaches the sill is called the range or the scale. If the variogram does not appear to go through the origin, but instead shows a discontinuity and intersects the ordinate, the value of γ(h) at the intersection is called the nugget.

Fig. 2—The anatomy of a variogram. The dashed line at the top identifies the sill and usually is consistent with the variance of the data. The correlation range is read off the horizontal axis and occurs at the distance value where the sill is reached. The nugget occurs where the slope of the experimental variogram appears to intersect the y-axis.

In practice, the experimental variogram can be calculated and modeled, but it is implemented in the kriging algorithm using the covariance function in most software programs. If a covariance exists, the variogram and the covariance are related by

$$\gamma(\mathbf{h}) = \mathrm{Cov}(0) - \mathrm{Cov}(\mathbf{h}) \qquad (3)$$

The covariance can be viewed as an inverse variogram. The covariance function measures similarity (autocovariance), which decreases with distance, whereas the variogram measures dissimilarity, which increases with distance.
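The mirror-image behavior of the two functions can be illustrated numerically. The sketch below assumes the standard relation for a stationary variable, in which the variogram equals the covariance at zero separation minus the covariance at separation h, and it uses an exponential covariance model chosen purely for illustration. The function names and parameter values are hypothetical.

```python
import numpy as np

def exponential_covariance(h, sill, practical_range):
    """Exponential covariance model: largest at h = 0, decaying with distance."""
    return sill * np.exp(-3.0 * np.abs(h) / practical_range)

def variogram_from_covariance(h, sill, practical_range):
    """gamma(h) = Cov(0) - Cov(h), assuming the covariance exists."""
    c0 = exponential_covariance(0.0, sill, practical_range)
    return c0 - exponential_covariance(h, sill, practical_range)

h = np.linspace(0.0, 2000.0, 5)
cov = exponential_covariance(h, sill=1.0, practical_range=1000.0)
gam = variogram_from_covariance(h, sill=1.0, practical_range=1000.0)

for hi, ci, gi in zip(h, cov, gam):
    print(f"h = {hi:6.0f}   Cov(h) = {ci:.4f}   gamma(h) = {gi:.4f}")
```

At every distance the two columns sum to the sill, so plotting one is equivalent to plotting the other upside down.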

Stationarity

In general, statistics relies on some notion of replication, whereby estimates can be derived and the variation and uncertainty of the estimate can be understood from repeated observations. In spatial analysis and estimation, the idea of stationarity is used to obtain the necessary replication. Stationarity is a property of the random function model, not of the underlying spatial distribution. In its strict sense, it requires the mean value of an RV to be constant between samples and independent of location. The four degrees of stationarity considered important in geostatistics are strict stationarity, second-order stationarity, the intrinsic hypothesis, and quasi-stationarity.^[13] ^[14]

Second-order stationarity assumes that the mean is constant and that the covariance is the same for all pairs of points that fall within the same separation interval, no matter which two points are chosen. Thus, in second-order stationarity, the covariance depends only on the separation between two points, and not on their location. Intrinsic stationarity (the intrinsic hypothesis) makes the same assumption about increments: the expected value and the variance of the difference between values at two points (the variogram) are invariant with respect to location. The intrinsic hypothesis is sufficient for most geostatistical studies. Quasi-stationarity applies when a trend can be seen at long separation intervals but the range of the covariance is smaller than the scale of the trend, so that there is local stationarity. Second-order and intrinsic stationarity are necessary assumptions for achieving replication to estimate the dependence rules, which then allows us to make predictions and assess uncertainty. It is the spatial information in these two degrees of stationarity (the similar distance between any two points in a given lag) that provides the replication.^[13] ^[14]
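In symbols, the two most commonly used hypotheses are usually stated as follows. The notation is an assumption made here for illustration: Z(u) denotes the random function at location u, m is a constant mean, and h is the separation vector.

```latex
% Second-order stationarity: constant mean, covariance depends only on h
E\left[Z(\mathbf{u})\right] = m, \qquad
\mathrm{Cov}\big(Z(\mathbf{u}),\, Z(\mathbf{u}+\mathbf{h})\big) = \mathrm{Cov}(\mathbf{h})

% Intrinsic hypothesis: the same conditions imposed on the increments only
E\left[Z(\mathbf{u}+\mathbf{h}) - Z(\mathbf{u})\right] = 0, \qquad
\mathrm{Var}\left[Z(\mathbf{u}+\mathbf{h}) - Z(\mathbf{u})\right] = 2\gamma(\mathbf{h})
```

Because the intrinsic hypothesis constrains only differences between values, it does not require a finite variance for Z itself, which is why it is the weaker of the two assumptions.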

The equation for computing the experimental variogram (Eq. 1) involves terms that depend on locations [z(u_i) and z(u_i + h)] that occur inside the field of the regionalized variable z. The averaging generally cancels out the dependency on location, such that the dependency is based solely on the distance h. This is an assumption, though, rather than a fact; geostatistics does not have a test to verify it. Strict application of the variogram requires a constant mean. A gentle, systematic variation in the mean value, such as the increase in temperature with depth, is called a drift or a trend. A regionalized variable that exhibits a drift is termed “nonstationary”; conversely, a stationary regionalized variable is drift-free. Proper variogram computation requires the removal of the drift. There are several ways to do this.^[12] ^[15] ^[16]
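One simple way to remove a drift, shown here only as a sketch and not as the method recommended by the cited references, is to fit a low-order trend by least squares and compute the variogram on the residuals. The example assumes a linear trend, and the function name `remove_linear_drift` and the synthetic temperature data are hypothetical.

```python
import numpy as np

def remove_linear_drift(coords, values):
    """Fit values ~ b0 + b . coords by least squares; return residuals and
    the fitted coefficients (intercept first)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    design = np.column_stack([np.ones(len(values)), coords])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    trend = design @ coef
    return values - trend, coef

# Synthetic temperature that increases with depth, plus a local fluctuation.
depth = np.linspace(0.0, 3000.0, 31)
temperature = 20.0 + 0.03 * depth + np.sin(depth / 200.0)

residuals, coef = remove_linear_drift(depth.reshape(-1, 1), temperature)
print("fitted gradient:", coef[1])
print("variance before and after:", temperature.var(), residuals.var())
```

The residuals have a mean of zero and a much smaller variance than the raw values, so a variogram computed on them reflects the local fluctuation rather than the depth trend.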

Structural analysis (spatial-continuity analysis)

Structural analysis (also called spatial-continuity analysis) is the computation and modeling of the patterns of spatial dependence that characterize a regionalized variable. This amounts to the study of the experimental variogram. Rarely is structural analysis the goal of a study; rather, it is the necessary first step before modeling the regionalized variable with kriging or conditional simulation techniques. Ultimately, both of these techniques will require covariance information that is supplied by the structural analysis.

There are two main steps to performing structural analysis. First, compute the experimental measures of continuity (variogram), accounting for anisotropy and azimuth, and then model the experimental variograms with a continuous function.

Computing the experimental variogram

If data are sampled on a regular grid, then the calculation search strategy for data pairs is simple. Unfortunately, though, wells rarely are drilled on a regular grid, and so to extract as much information as possible, we search for pairs of wells in lag intervals (discussed above), rather than along a simple vector. Identifying the best lag interval sometimes is frustrating but generally is an iterative process through which much is learned about the data. Several excellent texts are available on this subject.^[13] ^[17] ^[18] ^[19]
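For irregularly spaced wells, the pair search amounts to keeping the pairs whose separation falls in a distance bin and whose direction falls within an angular tolerance of the chosen azimuth. The sketch below is one possible implementation under stated assumptions: coordinates are (easting, northing), azimuth is measured clockwise from north, a pair and its reverse count as the same direction, and the function name and tolerance parameters are illustrative.

```python
import numpy as np

def directional_pairs(coords, azimuth_deg, tol_deg, lag_min, lag_max):
    """Return index arrays (i, j) of pairs whose separation distance lies in
    [lag_min, lag_max) and whose direction is within tol_deg of azimuth_deg."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)
    dx = coords[j, 0] - coords[i, 0]
    dy = coords[j, 1] - coords[i, 1]
    dist = np.hypot(dx, dy)
    pair_az = np.degrees(np.arctan2(dx, dy)) % 180.0   # fold onto [0, 180)
    diff = np.abs(pair_az - (azimuth_deg % 180.0))
    diff = np.minimum(diff, 180.0 - diff)              # angular distance
    keep = (dist >= lag_min) & (dist < lag_max) & (diff <= tol_deg)
    return i[keep], j[keep]

# Four hypothetical wells on the corners of a 100 x 100 square.
wells = [(0.0, 0.0), (0.0, 100.0), (100.0, 0.0), (100.0, 100.0)]

i_n, j_n = directional_pairs(wells, azimuth_deg=0.0, tol_deg=10.0,
                             lag_min=50.0, lag_max=150.0)
i_d, j_d = directional_pairs(wells, azimuth_deg=135.0, tol_deg=10.0,
                             lag_min=100.0, lag_max=200.0)
print(list(zip(i_n.tolist(), j_n.tolist())))   # north-south pairs
print(list(zip(i_d.tolist(), j_d.tolist())))   # northwest-southeast pairs
```

Widening the angular tolerance or the lag interval admits more pairs per bin at the cost of directional and distance resolution, which is the trade-off explored when iterating on the lag definition.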

Modeling the experimental variogram

The experimental variogram is calculated only along specific interdistance vectors that correspond to angular and distance bins. To use the experimental variogram, kriging and conditional simulation applications require a model of spatial dependency. This is because the kriging system of equations requires knowledge of the covariance function for all possible distances and azimuths, and because the model smoothes the experimental statistics and introduces spatial information.

Variogram modeling is not a curve-fitting exercise in the least-squares sense. Least-squares fitting of the experimental variogram points cannot ensure a function that yields a kriging variance ≥ 0, a condition known as positive definiteness. ^[13] ^[19] Only a limited number of positive definite functions are known to fit the shapes of experimental variograms. Those most often used in commercial software are the spherical, exponential, Gaussian, and linear. A combination or nesting of functions is used to model complex experimental variograms.
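The bounded models named above have simple closed forms. The sketch below writes them out under common conventions that are assumptions here: `a` is the range for the spherical model and the practical range (the distance at which about 95% of the sill is reached) for the exponential and Gaussian models, and the `nested` helper is a hypothetical way to combine a nugget with several structures.

```python
import numpy as np

def spherical(h, sill, a):
    """Spherical model: reaches the sill exactly at the range a."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, sill, a):
    """Exponential model: approaches the sill asymptotically."""
    h = np.asarray(h, dtype=float)
    return sill * (1.0 - np.exp(-3.0 * h / a))

def gaussian(h, sill, a):
    """Gaussian model: parabolic behavior near the origin."""
    h = np.asarray(h, dtype=float)
    return sill * (1.0 - np.exp(-3.0 * (h / a) ** 2))

def nested(h, nugget, structures):
    """Nugget plus a sum of (model, sill, range) structures; gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    total = np.where(h > 0, nugget, 0.0)
    for model, sill, a in structures:
        total = total + model(h, sill, a)
    return total

h = np.array([0.0, 250.0, 500.0, 1000.0, 2000.0])
model = nested(h, 0.1, [(spherical, 0.5, 500.0), (exponential, 0.4, 2000.0)])
print(model)
```

Each function is a valid model on its own, and a sum of valid models with nonnegative sills is also valid, which is why nesting is the usual way to fit complex experimental shapes without resorting to unconstrained curve fitting.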

Nugget effect

As mentioned previously, the experimental variogram often shows a discontinuity at the origin, which is termed the “nugget effect” (see Fig. 2). The discontinuity is a manifestation of a relatively high variance at the first lag. It is caused by irreducible measurement error inherent in the data and by small-scale geologic variability that is due to incomplete sampling of the reservoir topology. ^[13] ^[19] It is our observation in reservoir geostatistics that the nugget effect is almost entirely due to small-scale geologic variability. The occurrence of a nugget effect is important and can be indicative of continuities that are smaller than the average well spacing (tantamount to the shortest lag interval). It is important to model the nugget, if present, because it will influence both kriging and conditional simulation: kriging will show more smoothing near wells, and conditional simulation will add more variance near wells.
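The smoothing effect on kriging can be seen in a very small example. The sketch below is an illustration under stated assumptions, not a production implementation: it solves a one-dimensional ordinary kriging system written directly in terms of the variogram, uses two hypothetical wells, and compares a spherical model without a nugget against one in which half of the sill is assigned to the nugget. All function names and parameter values are made up for the example.

```python
import numpy as np

def spherical_with_nugget(h, nugget, sill, a):
    """Nugget plus one spherical structure; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0, nugget + sill * (1.5 * r - 0.5 * r ** 3), 0.0)

def ordinary_kriging_1d(x_data, z_data, x0, gamma):
    """Estimate z at x0 by ordinary kriging, with the system expressed in
    variogram terms and a Lagrange multiplier forcing weights to sum to 1."""
    x_data = np.asarray(x_data, dtype=float)
    z_data = np.asarray(z_data, dtype=float)
    n = len(x_data)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = gamma(np.abs(x_data[:, None] - x_data[None, :]))
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = gamma(np.abs(x_data - x0))
    weights = np.linalg.solve(lhs, rhs)[:n]
    return float(weights @ z_data)

# Two hypothetical wells; estimate one distance unit away from the first.
wells_x, wells_z = [0.0, 100.0], [0.0, 10.0]
no_nugget = ordinary_kriging_1d(
    wells_x, wells_z, 1.0, lambda h: spherical_with_nugget(h, 0.0, 1.0, 200.0))
with_nugget = ordinary_kriging_1d(
    wells_x, wells_z, 1.0, lambda h: spherical_with_nugget(h, 0.5, 0.5, 200.0))
print("estimate near well, no nugget:  ", no_nugget)
print("estimate near well, with nugget:", with_nugget)
```

Without a nugget, the estimate next to the first well stays close to that well's value. With a nugget, the nearby well receives less weight and the estimate is pulled toward the average of the two wells, which is the additional smoothing near wells described above.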

Spatial cross covariance analysis

Until now, only a single variable has been considered for spatial analysis (e.g., comparing porosity values to other nearby porosity values). The study of spatial relationships between two or more different variables requires the use of a cross-correlation statistic that defines the degree to which one variable is capable of explaining the behavior of another. The cross-variogram model is useful when performing cokriging or conditional cosimulation (e.g., integrating well and seismic data). The cross-variogram equation (Eq. 4) compares paired points that represent different variables, as in the case of the traditional covariance statistic (Eq. 2). Like the variogram, the cross-variogram is a continuous function of h.^[13] ^[19]

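$$\gamma_{zy}(\mathbf{h}) = \frac{1}{2n}\sum_{i=1}^{n}\left[z(\mathbf{u}_i) - z(\mathbf{u}_i + \mathbf{h})\right]\left[y(\mathbf{u}_i) - y(\mathbf{u}_i + \mathbf{h})\right] \qquad (4)$$

where z and y denote the two regionalized variables being compared.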

Unlike the variogram, the cross-variogram can take on negative values. This is observed when two variables are inversely correlated and have a negative correlation coefficient, such as porosity and acoustic impedance.
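A short numerical example shows how the sign arises. The sketch below assumes two variables sampled at the same regularly spaced locations and uses made-up porosity values with an impedance that is an exact, hypothetical linear function of porosity. The function name `cross_variogram_1d` is illustrative.

```python
import numpy as np

def cross_variogram_1d(z, y, lag):
    """Experimental cross-semivariogram of two collocated series at an
    integer lag of at least 1 (in sample spacings): half the mean product
    of the paired increments."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    dz = z[:-lag] - z[lag:]
    dy = y[:-lag] - y[lag:]
    return 0.5 * np.mean(dz * dy)

porosity = np.array([0.10, 0.12, 0.15, 0.19, 0.22, 0.20, 0.17])
impedance = 10.0 - 20.0 * porosity   # hypothetical inverse relation

print("variogram of porosity, lag 1:", cross_variogram_1d(porosity, porosity, 1))
print("cross-variogram, lag 1:      ", cross_variogram_1d(porosity, impedance, 1))
```

When the same variable is passed twice, the products are squared increments and the result cannot be negative. When the variables move in opposite directions, the increments have opposite signs and the cross-variogram is negative.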

Support effect

Interestingly, geostatistics was not developed originally to solve interpolation problems, but to address what is called the “support effect.” Support is the volume on which a sample is measured. Some attributes we measure can be considered point measurements, in that there is a location for each sample, such as well data. Others, such as well-test permeability, are measured over a volume and with the well location taken as the center of volume. A change in any of the characteristics of a support defines a new regionalized variable (RV). Thus, an additional precaution in structural analysis is to make certain that the data for estimating the variogram relate to the same support. In general, larger supports tend to reduce variability, producing variograms with smaller sills and larger ranges.^[18] ^[20]
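The reduction in variability with increasing support can be demonstrated by averaging a series over progressively larger blocks. The sketch below is a synthetic illustration only: the "point" values are a smoothed random signal generated for the example, and `block_average` is a hypothetical helper. It shows the drop in variance (the sill) and does not attempt to show the accompanying change in range.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic point-support series: white noise smoothed with a short
# moving average to give it some spatial continuity (an assumption).
raw = rng.normal(size=1200)
kernel = np.ones(5) / 5.0
point_values = np.convolve(raw, kernel, mode="valid")[:1000]

def block_average(values, block_size):
    """Average consecutive, non-overlapping blocks of equal size."""
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size
    trimmed = values[: n_blocks * block_size]
    return trimmed.reshape(n_blocks, block_size).mean(axis=1)

variances = {size: block_average(point_values, size).var() for size in (1, 10, 50)}
for size, var in variances.items():
    print(f"support = {size:3d} samples   variance = {var:.4f}")
```

Because the block sizes divide the series evenly and nest inside one another, the variance of the block averages cannot exceed that of the smaller support, which mirrors the smaller sills seen on variograms of larger-support data.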

The support effect tends to be overlooked when combining information that comes from variables measured over different volumes (e.g., when combining well measurements and seismic attributes, or core and well-test permeabilities). Ignoring the support effect can impart a systematic bias to estimates. There are several procedures available to account for a change in support, and doing so is critical when integrating data measured by different methods (e.g., core data vs. wireline data vs. seismic attributes). Using the cross-covariance model in a cokriging or cosimulation system is one way to provide estimates or simulated values that help to account for the support effect.^[18] ^[21] ^[22] In general, geostatistical laws for managing support are well documented and need to be applied more rigorously in reservoir modeling, where they are often neglected. ^[22] ^[23]

Spatial models summary

Table 1 summarizes the benefits and limitations of spatial models.

Table 1—Spatial Models Summary

Nomenclature

Cov(h)	=	the mean covariance value between pairs of values whose separation interval is equal to a distance of vector h; units are those of the measured variable, squared
Cov(0)	=	the covariance at zero separation distance, which equals the variance of the data and is the maximum of the covariance function; units are those of the measured variable, squared
Cov_x,y	=	covariance (untransformed) of variables X and Y
h	=	the lag or distance vector between pairs of points; units are those of the coordinate system
m_x	=	sample mean of X; units are those of the X variable
m_y	=	sample mean of Y; units are those of the Y variable
n	=	the total number of samples (Eq. 2) or the number of data pairs in the lag (Eq. 1)
X_i	=	the measured value of variable X, with i varying between the first and last measurements; units are those of the X variable
Y_i	=	the measured value of variable Y, with i varying between the first and last measurements; units are those of the Y variable
z(u_i)	=	the measured value of a regionalized variable z at location u_i, where i varies between the first and last measurements
z(u_i + h)	=	the measured value of a regionalized variable z at a location a distance h away from u_i
γ(h)	=	half the mean-squared difference between pairs of measured values whose separation interval is equal to a distance vector h

References

1. Dubrule, O. 1998. Geostatistics in Petroleum Geology. AAPG Course Note Series, AAPG, Tulsa, 38, 52.
2. Chambers, R.L., Yarus, J.M., and Hird, K.B. 2000. Petroleum Geostatistics for the Nongeostatistician—Part 1. The Leading Edge (May): 474.
3. Krige, D.G. 1951. A Statistical Approach to Some Basic Mine Evaluation Problems on the Witwatersrand. J. Chem. Metall. Min. Soc. South Africa 52: 119.
4. Sichel, H.S. 1952. New Methods in the Statistical Evaluation of Mine Sampling Data. Trans. Inst. Min. Metall. 61 (6): 261.
5. Watermeyer, G.A. 1919. Applications of the Theory of Probability in the Determination of Ore Reserves. J. Chem. Metall. Min. Soc. South Africa 19: 97.
6. Truscott, S.J. 1929. The Computation of the Probable Value of Ore Reserves from Assay Results. Trans. Inst. Min. Metall. 38: 482.
7. de Wijs, H.J. 1951. Statistics of Ore Distribution. Part 1: Frequency Distribution of Assay Values. Geologie en Mijnbouw 13: 365.
8. Henley, S. 1981. Nonparametric Geostatistics. Essex, UK: Elsevier Applied Science Publishers.
9. Matheron, G. 1962. Traité de Géostatistique Appliquée, tome 1, 111. Paris, France: Editions Technip.
10. Matheron, G. 1963. Principles of Geostatistics. Economic Geology 58: 1246.
11. Matheron, G. 1970. Random Functions and Their Application in Geology. Geostatistics, A Colloquium, 79-88, ed. D.F. Merriam. New York City: Plenum.
12. Olea, R.A. 1994. Fundamentals of Semivariogram Estimation, Modeling, and Usage. Stochastic Modeling and Geostatistics, 3, 27-35, ed. J.M. Yarus and R.L. Chambers. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
13. Isaaks, E.H. and Srivastava, R.M. 1989. An Introduction to Applied Geostatistics. Oxford, UK: Oxford University Press.
14. Hohn, M.E. 1999. Geostatistics and Petroleum Geology, second edition. Amsterdam, The Netherlands: Kluwer Academic Publishers.
15. Olea, R.A. 1975. Optimum Mapping Techniques Using Regionalized Variable Theory. Series on Spatial Analysis, Kansas State Geological Survey, Lawrence, Kansas, 3, 137.
16. Christakos, G. 1992. Random Fields Models in the Earth Sciences. San Diego, California: Academic Press.
17. Chambers, R.L., Yarus, J.M., and Hird, K.B. 2000. Petroleum Geostatistics for the Nongeostatistician—Part 1. The Leading Edge (May): 474.
18. Olea, R.A. 1994. Fundamentals of Semivariogram Estimation, Modeling, and Usage. Stochastic Modeling and Geostatistics, 3, 27-35, ed. J.M. Yarus and R.L. Chambers. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
19. Hohn, M.E. 1999. Geostatistics and Petroleum Geology, second edition. Amsterdam, The Netherlands: Kluwer Academic Publishers.
20. Cosentino, L. 2001. Integrated Reservoir Studies. Paris, France: Institut Français du Pétrole Publications, Editions Technip.
21. Clark, I. 1979. Practical Geostatistics. London, England: Applied Science Publishers.
22. Journel, A.G. and Huijbregts, C.J. 1978. Mining Geostatistics. London, England: Academic Press.
23. Tran, T.T.B. 1996. The Missing Scales and Direct Simulation of Block Effective Properties. Journal of Hydrology 182: 37.
