

The Gaussian Distribution in Geostatistics

Clayton Deutsch

Gaussian algorithms have infected virtually all corners of geostatistics: radial basis functions (RBF) for interpolation and implicit modeling, hierarchical truncated pluriGaussian (HTPG) for categorical variable modeling, detrending non-stationary variables with a Gaussian Mixture Model (GMM), projection pursuit multivariate transformation (PPMT) for multivariate modeling, and simulating regionalized variables with turning bands, sequential, or spectral algorithms. The Gaussian distribution is the correct limit distribution for combinatorial or additive processes (like the sum of the numbers showing on three dice); the multivariate Gaussian distribution also infects non-Gaussian variables through appropriate transformation. The properties of the multivariate Gaussian distribution are recalled and the clever adaptations to geostatistics are discussed.

Gaussian distribution overlaid on the sum of the numbers showing on three dice.
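The comparison in the figure can be reproduced with a few lines of Python. This is a minimal sketch, assuming fair six-sided dice and a Gaussian density with the same mean and variance as the sum; the names `pmf`, `gaussian_pdf` and `max_abs_diff` are illustrative and not taken from any library.

```python
import itertools
import math
from collections import Counter

# Exact distribution of the sum of three fair dice (6**3 = 216 outcomes).
counts = Counter(sum(roll) for roll in itertools.product(range(1, 7), repeat=3))
pmf = {s: c / 6**3 for s, c in sorted(counts.items())}

# One fair die has mean 3.5 and variance 35/12; independent dice add.
mean = 3 * 3.5
var = 3 * 35 / 12


def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


# Side-by-side comparison of exact probabilities and the Gaussian density.
for s, p in pmf.items():
    print(f"sum={s:2d}  exact={p:.4f}  gaussian={gaussian_pdf(s, mean, var):.4f}")

# Largest absolute gap between the two over all possible sums.
max_abs_diff = max(abs(p - gaussian_pdf(s, mean, var)) for s, p in pmf.items())
print(f"largest absolute difference: {max_abs_diff:.4f}")
```

Even with only three dice the density tracks the exact probabilities closely, which is the additive-process argument in miniature.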

A central problem in geostatistics is to model the high-dimensional distribution of many coregionalized variables at many locations. There are often tens of variables at millions of locations. The multivariate Gaussian distribution is remarkable in its mathematical tractability: it is fully parameterized by a mean vector and a variance-covariance matrix. In geostatistics, (1) the mean vector comes from an assumption of stationarity or a trend model, and (2) the variance-covariance matrix comes from a set of direct and cross variogram models. The parametric form of the multivariate Gaussian distribution is simply expressed; however, it is the mathematical properties of the distribution that are especially virulent. Averages of random samples tend toward a Gaussian distribution; turning bands and spectral techniques exploit this with appropriate weights to simulate with a specified covariance structure. Conditioning, sequential simulation, and the direct calculation of local uncertainty take advantage of the property that all conditional distributions are Gaussian in shape, with parameters that come from the normal equations, known in geostatistics as simple cokriging.
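The conditioning property can be illustrated for a single variable, where the normal equations reduce to simple kriging. The sketch below assumes a stationary standard normal variable (zero mean, unit variance) in one dimension and a hypothetical exponential covariance with a practical range of 5 distance units; the function names and the data values are made up for illustration.

```python
import math


def covariance(h, practical_range=5.0):
    """Exponential covariance with unit sill (a hypothetical model)."""
    return math.exp(-3.0 * abs(h) / practical_range)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def simple_kriging(data_locs, data_vals, target):
    """Mean and variance of the Gaussian conditional distribution at target."""
    # Left-hand side: data-to-data covariances.
    left = [[covariance(xi - xj) for xj in data_locs] for xi in data_locs]
    # Right-hand side: data-to-target covariances.
    right = [covariance(xi - target) for xi in data_locs]
    weights = solve(left, right)
    mean = sum(w * y for w, y in zip(weights, data_vals))
    variance = 1.0 - sum(w * c for w, c in zip(weights, right))
    return mean, variance, weights


# Two normal-score data and one unsampled location between them.
mean, variance, weights = simple_kriging([1.0, 4.0], [0.8, -0.4], 2.0)
print(f"conditional mean={mean:.3f}  conditional variance={variance:.3f}")
```

In sequential Gaussian simulation, a value would then be drawn from a Gaussian distribution with this mean and variance and added to the conditioning data before visiting the next location.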

The use of the multivariate Gaussian distribution pervades geostatistics. Although there are heuristics and iterative algorithms, no other probability distribution model has seen practical use in the high dimensions encountered in geostatistics. There are advantages to a fully defined parametric distribution. A known distribution facilitates: (1) testing and model validation, (2) accurate and precise uncertainty calculations, (3) artifact-free modeling, (4) understandable results with appropriate parameter variations, and (5) a framework for capturing parameter uncertainty. The widespread use of the multivariate Gaussian distribution has been informally referred to as the Gaussian disease. The negative connotation of this view is partially justified: the probability distributions of real data are never truly Gaussian, there are high-order connected geological features to be considered, and there are complex non-Gaussian multivariate relationships. The use of ad hoc techniques with no consistent probability distribution is not a cure. The cure, if one is needed, consists of (1) a hybrid workflow involving surface modeling, multiple point statistics, and hierarchical modeling of geological features, and (2) transformation of non-Gaussian data to a Gaussian distribution.

The normal score transformation has long existed (see Normal Score Lesson). The projection pursuit multivariate transformation (PPMT) has evolved recently to support the transformation of collocated non-Gaussian multivariate data (see PPMT Lesson). In both cases, the back transformation restores the non-Gaussian features in the final predictions. Two other transformations are suitable for categorical variables and variables with a trend. The hierarchical truncated pluriGaussian (HTPG) workflow permits complex and interesting geological features of categorical variables to be simulated. Detrending non-stationary variables by fitting a Gaussian Mixture Model (GMM) and applying a stepwise conditional transformation permits artifact-free reproduction of a trend model. The results of these transformations are used in a conventional Gaussian simulation workflow. Back transformation makes the results consistent with the initial non-Gaussian features.
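The univariate normal score transformation and its back transformation can be sketched with the Python standard library. This is a simplified illustration, not the implementation used in any particular software: it assumes equally weighted data with no ties, uses mid-point cumulative probabilities, and clamps the tails to the data minimum and maximum rather than extrapolating. The function names and sample values are hypothetical.

```python
import bisect
from statistics import NormalDist


def normal_score_transform(values):
    """Return normal scores (in data order) and the sorted transform table."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n  # mid-point cumulative probability
        scores[i] = NormalDist().inv_cdf(p)
    table = sorted(zip(scores, values))  # (score, original value) pairs
    return scores, table


def back_transform(score, table):
    """Map a Gaussian value to original units by linear interpolation."""
    ys = [pair[0] for pair in table]
    zs = [pair[1] for pair in table]
    if score <= ys[0]:
        return zs[0]  # lower tail clamped to the data minimum
    if score >= ys[-1]:
        return zs[-1]  # upper tail clamped to the data maximum
    j = bisect.bisect_right(ys, score)
    frac = (score - ys[j - 1]) / (ys[j] - ys[j - 1])
    return zs[j - 1] + frac * (zs[j] - zs[j - 1])


# Positively skewed sample values (made up for illustration).
grades = [0.1, 0.3, 0.2, 5.0, 1.2, 0.7, 2.5, 0.4]
scores, table = normal_score_transform(grades)
restored = [back_transform(y, table) for y in scores]
print([round(y, 3) for y in scores])
print(restored)
```

Simulated Gaussian values pass through `back_transform` in the same way, which is how the skewness of the original histogram reappears in the final realizations.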

Despite the sensationalism of the Gaussian disease, the multivariate Gaussian distribution should be considered a constant of the universe like \(\pi\) or \(e\). Such constants are used in mathematical calculations all the time and we do not rage against them as an infection. They are some of the rare immutable constants in our lives – beautiful and reassuring.

The Resource Modeling Solutions Platform (RMSP) embraces the Gaussian distribution and implements algorithms that are practical for multicategory, multivariate and multilocation modeling. This does not exclude nonparametric distributions and heuristic algorithms; rather, it takes full advantage of the only consistent multivariate distribution suitable for resource modeling.