# Challenges with probabilistic models

Just as there are shortcomings of deterministic models that can be avoided with probabilistic models, the latter have their associated pitfalls as well. Adding uncertainty, by replacing single estimate inputs with probability distributions, requires the user to exercise caution on several fronts. Without going into exhaustive detail we offer a couple of illustrations.

## Key differences with probabilistic models

First, the probabilistic model is more complicated. It demands more documentation and more attention to logical structure. In particular, each iteration of a Monte Carlo model should be a plausible realization. The purpose of using a range of values for each input is to acknowledge the realm of possibilities. Thus, once each of the input distributions is sampled, the resulting case should be sensible, something an expert would agree is possible.

Second, our criticism of the classical sensitivity analysis procedures (tornado charts and spider diagrams) included the notion that some of the inputs would not be independent. Thus, our probabilistic model should address any relationships between variables, which typically are handled by imposing correlation between pairs of input distributions. Each correlation coefficient must take a value between –1 and +1; it is the model builder’s responsibility to assign and justify these values, which may be based on historical data or experience.
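As a rough illustration of imposing correlation between a pair of inputs, the sketch below draws two lognormal inputs whose underlying normal variables share a chosen coefficient. The function name, parameter values, and the choice of lognormal marginals are assumptions made for this example, not part of the original discussion. Note that the coefficient is applied to the logarithms of the inputs; many simulation tools impose rank correlation instead, which behaves similarly but is not identical.

```python
import numpy as np

def sample_correlated_lognormals(n, rho, mu=(0.0, 0.0), sigma=(0.5, 0.5), seed=0):
    """Draw n pairs of lognormal inputs whose log-values have correlation rho.

    Hypothetical helper for illustration only.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    rng = np.random.default_rng(seed)
    # Correlated standard normals with the requested coefficient
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    # Transform each column to its lognormal marginal
    x = np.exp(mu[0] + sigma[0] * z[:, 0])
    y = np.exp(mu[1] + sigma[1] * z[:, 1])
    return x, y

x, y = sample_correlated_lognormals(50_000, rho=0.6)
log_corr = np.corrcoef(np.log(x), np.log(y))[0, 1]
print(f"sample correlation of log-inputs: {log_corr:.2f}")
```

Checking the sample correlation after the fact, as in the last two lines, is one simple way to document that the model reproduces the coefficient the builder intended.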

## Data availability and usefulness

Probabilistic models rely on sensible choices of input distributions. “Garbage in/garbage out” is an often-heard complaint of skeptics and bears acknowledging. While it is true of any model that the results are only as good as the inputs, Monte Carlo models seem to draw more criticism about this aspect. Harbaugh et al.[1] take an extreme position, arguing that one cannot do uncertainty analysis without adequate analogous data. One consultant specializing in Monte Carlo simulation takes another view when he tells his clients, “I don’t want to see any data. Instead, I want to build the model first, then do sensitivity analysis and find out what kind of data we really need to start collecting.” Somewhere between these extremes lies a sensible position of relying on:

• Experience (“I have had the opportunity to study data for this parameter in the past, and while I have no legitimate offset data, I know that under these circumstances the average net pay is slightly skewed right and has a coefficient of variation of about 15%.”)
• Fundamental principles (“This input can be viewed as an aggregation, so its distribution must be approximately normal.”)
• Appropriate data to form estimates of inputs to a model.

A related problem arises when the available data are simply not appropriate. It is common to pool data from different populations (lumping porosities from different facies, drilling penetration rates at different depths, or prices in different seasons). Sometimes, simply plotting a histogram of the empirical data reveals bimodal behavior, almost always a sign of mixing samples from different populations. Naturally, data used as a basis for building a distribution should be vetted for measurement and clerical errors. However, one should be wary of tossing out extreme values to make the data look more like a familiar distribution. Rather, one should try to determine how the extreme values came about; they may be your best samples.

Novices always want to know how many data are necessary before one can reliably build a distribution based on them. This is not a simple matter. You may find a quick answer in a statistics text about significance, but in our world, we often do not have an adequate number of samples for statistical significance, and yet we must work the problem. The question comes down to how many points you need to build a “sensible” histogram. Curve-fitting software does not work very well with fewer than 15 points. Rather than relying on some automatic process, one should use common sense and experience. Among other things, one can often guess the distribution type (at least whether it is symmetric or the direction of skewness) and then, as a starting point, treat the minimum and maximum observed values as the P10 and P90 (or the P5 and P95) of that distribution.
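The starting point described above can be sketched as follows, assuming the analyst has guessed a right-skewed (lognormal) shape and treats the smallest and largest observations as P10 and P90. Here P10 denotes the low value (10% of outcomes fall below it). The function name and the net-pay numbers are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_from_p10_p90(p10, p90):
    """Return (mu, sigma) of ln(X) so that P(X <= p10) = 0.10 and P(X <= p90) = 0.90."""
    if not 0 < p10 < p90:
        raise ValueError("need 0 < p10 < p90")
    z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile, about 1.2816
    mu = 0.5 * (math.log(p10) + math.log(p90))
    sigma = (math.log(p90) - math.log(p10)) / (2.0 * z90)
    return mu, sigma

# Hypothetical net pay (ft): smallest and largest values in a small data set
mu, sigma = lognormal_from_p10_p90(20.0, 80.0)
fitted = NormalDist(mu, sigma)            # distribution of ln(net pay)
p50 = math.exp(mu)                        # geometric mean of P10 and P90
mean = math.exp(mu + 0.5 * sigma**2)      # lognormal mean, above the P50
print(f"P50 = {p50:.1f} ft, mean = {mean:.1f} ft")
```

Using P5 and P95 instead only changes the percentile passed to `inv_cdf` (0.95 rather than 0.90); the result is a first guess to be reviewed by an expert, not a fitted distribution.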

## Level of detail

Often, a problem can be analyzed at various levels of detail. Cost models are a good case in point. In one large, billion-dollar deepwater development in the Gulf of Mexico, the Monte Carlo model had 1,300 line items. Another client built a high-level cost estimate for construction of a floating production, storage, and offloading (FPSO) vessel with only 12 items. Production forecasts for fields can be done at a single-well level and then aggregated, or simply done as a single forecast with a pattern of ramp-up, then plateau, followed by decline. Cash-flow models tend to be large when they have small time steps of weeks or months, as opposed to years.

In every case, the model builder must choose a sensible level of detail, much like a person doing numerical reservoir simulation must decide how many gridblocks to include. Among the guidelines are these:

• Consider building two or more models, one coarser than the other(s).
• Consider doing some modeling in stages, using the outputs of some components as inputs to the next stage. This process can lead to problems when there are significant correlations involved.
• Work at a level of detail where the experts really understand the input variables and where data may be readily accessible.

In the end, common sense and the 80/20 rule apply. You generally cannot have the luxury of making a career out of building one model; you must move on to other work and get the results to the decision makers in a timely fashion.

## Handling rare events

Rare events generally can be modeled with a combination of a discrete variable (Does this event occur or not?) and a continuous variable (When the event occurs, what is the range of possible implications?). Thus, “stuck pipe while drilling” (discussed in detail elsewhere in this chapter) can be described with a binomial variable with n = 1 and p = P(stuck) and “Stuck Time” and perhaps “Stuck Cost” as continuous variables. This method applies as well to downtime, delays, and inefficiencies.
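A minimal sketch of this construction is shown below: a binomial variable with n = 1 decides whether the event occurs in each iteration, and a continuous variable supplies the time lost when it does. The probability, the distribution types, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 100_000

# Assumed inputs (hypothetical values, not from field data)
p_stuck = 0.15                                           # P(stuck) for one well
base_days = rng.triangular(20.0, 25.0, 35.0, size=n_trials)  # trouble-free drilling time

# Discrete part: does the event occur in this iteration? (binomial with n = 1)
stuck = rng.binomial(n=1, p=p_stuck, size=n_trials)

# Continuous part: "Stuck Time" in days, if the event occurs
stuck_days = rng.lognormal(mean=np.log(4.0), sigma=0.6, size=n_trials)

# The continuous variable contributes only in iterations where the event occurs
total_days = base_days + stuck * stuck_days

print(f"fraction of iterations with stuck pipe: {stuck.mean():.3f}")
print(f"mean total drilling time: {total_days.mean():.1f} days")
```

A "Stuck Cost" variable could be handled the same way, either sampled separately or derived from the stuck time and a day rate.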

## Impact of correlation

Correlation can make a difference in Monte Carlo models. As discussed in Murtha[2]:

“What does correlation do to the bottom line? Does it alter the distribution of reserves or cost or NPV, which is, after all, the objective of the model? If so, how? We can make some generalizations, but remember Oliver Wendell Holmes’s admonition, ‘No generalization is worth a damn...including this one.’
First, a positive correlation between two inputs results in more pairs of two large values and more pairs of two small values. If those variables are multiplied together in the model (e.g., a reserves model), it results in more extreme values of the output.
Even in a summation or aggregation model (aggregating production from different wells or fields, aggregating reserves, estimating total cost by summing line items, estimating total time), positive correlation between two summands causes the output to be more dispersed.
In short, in either a product model or an aggregation model, a positive correlation between two pairs of variables increases the standard deviation of the output. The surprising thing is what happens to the mean value of the output when correlation is included in the model.
For product models, positive correlation between factors increases the mean value of the output. For aggregation models, the mean value of the output is not affected by correlation among the summands. Let us hasten to add that many models are neither pure products nor pure sums, but rather complex algebraic combinations of the various inputs.”
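For the simplest case of two inputs, the generalizations in the quoted passage can be checked against standard identities. The symbols below are introduced here for illustration: $X$ and $Y$ are the two inputs, $\sigma_X$ and $\sigma_Y$ their standard deviations, and $\rho$ their correlation coefficient.

$$
\begin{aligned}
E[X+Y] &= E[X] + E[Y] \\
\operatorname{Var}(X+Y) &= \sigma_X^2 + \sigma_Y^2 + 2\rho\,\sigma_X\sigma_Y \\
E[XY] &= E[X]\,E[Y] + \rho\,\sigma_X\sigma_Y
\end{aligned}
$$

The first line shows that the mean of a sum does not depend on $\rho$; the second shows that a positive $\rho$ widens the spread of a sum; the third shows that a positive $\rho$ raises the mean of a product. The variance of a product has no comparably simple general form, which is one reason simulation is used to examine it.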

## Impact of distribution type

A standard exercise in Monte Carlo classes is to replace one distribution type with another for several inputs to a model and compare the results. Often, the students are surprised to find that the difference can be negligible. Rather than generalizing, however, it is a good idea to do this exercise when building a model, prior to the presentation. That is, when there are competing distributions for an input parameter, one should test the effect on the bottom line of running the model with each type of distribution. Simple comparisons of the means and standard deviations of key outputs would suffice, but a convincing argument can be generated by overlaying the two cumulative curves of a key output obtained from alternative distributions.
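The comparison can be sketched as follows for a hypothetical three-factor product model in which net pay is described either by a triangular distribution or by a lognormal distribution with the same mean and variance. The model, the input names, and every parameter value are assumptions for this example. Instead of plotting, the sketch reports the largest vertical gap between the two cumulative curves.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical inputs shared by both runs
area = rng.lognormal(mean=np.log(500.0), sigma=0.4, size=n)   # acres
recovery = rng.uniform(150.0, 350.0, size=n)                  # bbl/acre-ft

# Competing descriptions of net pay (ft)
lo, mode, hi = 10.0, 20.0, 40.0
pay_tri = rng.triangular(lo, mode, hi, size=n)

# Lognormal with the same mean and variance as the triangular distribution
m = (lo + mode + hi) / 3.0
v = (lo**2 + mode**2 + hi**2 - lo * mode - lo * hi - mode * hi) / 18.0
s2 = np.log(1.0 + v / m**2)
pay_logn = rng.lognormal(mean=np.log(m) - 0.5 * s2, sigma=np.sqrt(s2), size=n)

out_tri = area * pay_tri * recovery
out_logn = area * pay_logn * recovery

for name, out in (("triangular", out_tri), ("lognormal", out_logn)):
    print(f"{name:>10}: mean = {out.mean():.3e}, std = {out.std():.3e}")

# Largest vertical gap between the two empirical cumulative curves
grid = np.quantile(np.concatenate([out_tri, out_logn]), np.linspace(0.001, 0.999, 500))
cdf_tri = np.searchsorted(np.sort(out_tri), grid, side="right") / n
cdf_logn = np.searchsorted(np.sort(out_logn), grid, side="right") / n
max_gap = np.max(np.abs(cdf_tri - cdf_logn))
print(f"max gap between cumulative curves: {max_gap:.3f}")
```

Because the two net-pay distributions were matched on mean and variance, close agreement is expected in this made-up case; with competing distributions that differ in their tails or moments, the same test may show a material difference.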

## Corporate policies

Unlike many other technical advances, uncertainty analysis seems to have met with considerable opposition. It is common in companies to have isolated pockets of expertise in Monte Carlo simulation in which the analysis results have to be reduced to single-value estimates. That is, rather than presenting a distribution of reserves, NPV, or drilling cost, only a mean value or a P50 from the respective distribution is reported. It is rare for an entire company to agree to do all its business using the language of statistics and probability.

## References

1. Harbaugh, J., Davis, J., and Wendebourg, J. 1995. Computing Risk for Oil Prospects. New York City: Pergamon Press.
2. Murtha, J.A. 2001. Risk Analysis for the Oil Industry. Supplement to Hart’s E&P (August 2001) 1–25. http://www.jmurtha.com/downloads/riskanalysisClean.pdf