2.1.7. pelicun.uq module

This module defines constants, classes and methods for uncertainty quantification in pelicun.

Contents

mvn_orthotope_density(mu, COV[, lower, upper])

Estimate the probability density within a hyperrectangle for a multivariate normal (MVN) distribution.

fit_distribution(raw_samples, distribution)

Fit a distribution to samples using maximum likelihood estimation.

RandomVariable(name, distribution[, theta, …])

Define a random variable with a given name, distribution type, and distribution parameters.

RandomVariableSet(name, RV_list, Rho)

Define a set of correlated random variables described by a correlation matrix (Gaussian copula).

RandomVariableRegistry()

Manage a collection of random variables and random variable sets and generate samples for them.

pelicun.uq.mvn_orthotope_density(mu, COV, lower=None, upper=None)[source]

Estimate the probability density within a hyperrectangle for a multivariate normal (MVN) distribution.

Use the method of Alan Genz (1992) to estimate the probability density of a multivariate normal distribution within an n-orthotope (i.e., hyperrectangle) defined by its lower and upper bounds. Limits can be relaxed in any direction by assigning infinite bounds (i.e. numpy.inf).

Parameters
mu: float scalar or ndarray

Mean(s) of the non-truncated distribution.

COV: float ndarray

Covariance matrix of the non-truncated distribution.

lower: float vector, optional, default: None

Lower bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from below in a subset of the dimensions, use either None or assign an infinite value (i.e. -numpy.inf) to those dimensions.

upper: float vector, optional, default: None

Upper bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from above in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.

Returns
alpha: float

Estimate of the probability density within the hyperrectangle

eps_alpha: float

Estimate of the error in alpha.
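As an illustration of what this function computes, the probability within a rectangle for a bivariate normal can be obtained by inclusion-exclusion over the CDF at the four corners. The sketch below uses scipy rather than pelicun, and the mean, covariance, and bounds are arbitrary values chosen for demonstration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Standard bivariate normal with no correlation (illustrative values)
mu = np.array([0.0, 0.0])
COV = np.array([[1.0, 0.0],
                [0.0, 1.0]])
lower = np.array([-1.0, -1.0])
upper = np.array([1.0, 1.0])

F = multivariate_normal(mean=mu, cov=COV).cdf

# Inclusion-exclusion over the four corners of the rectangle
alpha = (F(upper)
         - F([lower[0], upper[1]])
         - F([upper[0], lower[1]])
         + F(lower))

# With zero correlation this factorizes into (Phi(1) - Phi(-1))**2
print(alpha)
```

For higher dimensions, corner-based inclusion-exclusion becomes impractical, which is why the Genz (1992) integration scheme referenced above is used instead.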

pelicun.uq.fit_distribution(raw_samples, distribution, truncation_limits=[None, None], censored_count=0, detection_limits=[None, None], multi_fit=False, alpha_lim=0.0001)[source]

Fit a distribution to samples using maximum likelihood estimation.

The number of dimensions of the distribution are inferred from the shape of the sample data. Censoring is automatically considered if the number of censored samples and the corresponding detection limits are provided. Infinite or unspecified truncation limits lead to fitting a non-truncated distribution in that dimension.

Parameters
raw_samples: float ndarray

Raw data that serves as the basis of estimation. The number of samples equals the number of columns and each row introduces a new feature. In other words: a list of sample lists is expected where each sample list is a collection of samples of one variable.

distribution: {‘normal’, ‘lognormal’}

Defines the target probability distribution type. Different types of distributions can be mixed by providing a list rather than a single value. Each element of the list corresponds to one of the features in the raw_samples.

truncation_limits: float ndarray, optional, default: [None, None]

Lower and/or upper truncation limits for the specified distributions. A two-element vector can be used for a univariate case, while two lists of limits are expected in multivariate cases. If the distribution is non-truncated from one side in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.

censored_count: int, optional, default: 0

The number of censored samples that are beyond the detection limits. All samples outside the detection limits are aggregated into one set. This works the same way in one and in multiple dimensions. Prescription of specific censored sample counts for sub-regions of the input space outside the detection limits is not supported.

detection_limits: float ndarray, optional, default: [None, None]

Lower and/or upper detection limits for the provided samples. A two-element vector can be used for a univariate case, while two lists of limits are expected in multivariate cases. If the data is not censored from one side in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.

multi_fit: bool, optional, default: False

If True, we attempt to fit a multivariate distribution to the samples. Otherwise, we fit each marginal univariate distribution independently and estimate the correlation matrix in the end based on the fitted marginals. Using multi_fit can be advantageous with censored data and if the correlation in the data is not Gaussian. It leads to substantially longer calculation time and does not always produce better results, especially when the number of dimensions is large.

alpha_lim: float, optional, default: 0.0001

Introduces a lower limit to the probability density within the n-orthotope defined by the truncation limits. Assigning a reasonable minimum (such as 1e-4) can be useful when the mean of the distribution is several standard deviations from the truncation limits and the sample size is small. Such cases without a limit often converge to distant means with inflated variances. Besides being incorrect estimates, those solutions only offer negligible reduction in the negative log likelihood, while making subsequent sampling of the truncated normal distribution very challenging.

Returns
theta: float ndarray

Estimates of the parameters of the fitted probability distribution in each dimension. The following parameters are returned for the supported distributions: normal - mean, standard deviation; lognormal - median, log standard deviation.

Rho: float 2D ndarray, optional

In the multivariate case, returns the estimate of the correlation matrix.
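To illustrate the lognormal parameterization returned above (median and log standard deviation), the sketch below fits a non-truncated, non-censored univariate lognormal with plain numpy, where the MLE has a closed form. This is not the pelicun implementation, and the sample values are synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic lognormal data: median 2.0, log standard deviation 0.4
true_median, true_beta = 2.0, 0.4
samples = np.exp(np.log(true_median) + true_beta * rng.standard_normal(10_000))

# For a non-truncated, non-censored lognormal, the MLE has a closed form:
# the median is exp(mean of log data); beta is the std of the log data.
log_samples = np.log(samples)
theta_median = np.exp(log_samples.mean())
theta_beta = log_samples.std()

print(theta_median, theta_beta)  # close to 2.0 and 0.4
```

Truncation and censoring remove the closed form, which is why fit_distribution performs numerical maximum likelihood estimation instead.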

class pelicun.uq.RandomVariable(name, distribution, theta=None, truncation_limits=None, bounds=None, custom_expr=None, raw_samples=None, anchor=None)[source]

Bases: object

Describes a random variable by its distribution type and parameters; the variable can be truncated, sampled empirically, or anchored to another variable.

Parameters
name: string

A unique string that identifies the random variable.

distribution: {‘normal’, ‘lognormal’, ‘multinomial’, ‘custom’, ‘empirical’, ‘coupled_empirical’, ‘uniform’}, optional

Defines the type of probability distribution for the random variable.

theta: float scalar or ndarray, optional

Set of parameters that define the cumulative distribution function of the variable given its distribution type. The following parameters are expected currently for the supported distribution types: normal - mean, standard deviation; lognormal - median, log standard deviation; uniform - a, b, the lower and upper bounds of the distribution; multinomial - likelihood of each unique event (the last event’s likelihood is adjusted automatically to ensure the likelihoods sum up to one); custom - according to the custom expression provided; empirical and coupled_empirical - N/A.

truncation_limits: float ndarray, optional

Defines the [a,b] truncation limits for the distribution. Use None to assign no limit in one direction.

bounds: float ndarray, optional

Defines the [P_a, P_b] probability bounds for the distribution. Use None to assign no lower or upper bound.

custom_expr: string, optional

A Python expression that defines a custom CDF. The controlling variable shall be “x” and the parameters shall be “p1”, “p2”, etc.

anchor: RandomVariable, optional

Anchors this to another variable. If the anchor is not None, this variable will be perfectly correlated with its anchor. Note that the attributes of this variable and its anchor do not have to be identical.

Attributes
RV_set

Return the RV_set this RV is a member of

anchor

Return the anchor of the variable (if any).

bounds

Return the assigned probability bounds.

custom_expr

Return the assigned custom expression for CDF.

distribution

Return the assigned probability distribution type.

samples

Return the empirical or generated samples.

samples_DF

Return the empirical or generated samples in a pandas Series.

theta

Return the assigned probability distribution parameters.

truncation_limits

Return the assigned truncation limits.

uni_samples

Return the samples from the controlling uniform distribution.

Methods

cdf(values)

Returns the CDF at the given values.

inverse_transform(values)

Uses inverse probability integral transformation on the provided values.

inverse_transform_sampling()

Creates samples using inverse probability integral transformation.

property distribution

Return the assigned probability distribution type.

property theta

Return the assigned probability distribution parameters.

property truncation_limits

Return the assigned truncation limits.

property bounds

Return the assigned probability bounds.

property custom_expr

Return the assigned custom expression for CDF.

property RV_set

Return the RV_set this RV is a member of

property samples_DF

Return the empirical or generated samples in a pandas Series.

property samples

Return the empirical or generated samples.

property uni_samples

Return the samples from the controlling uniform distribution.

property anchor

Return the anchor of the variable (if any).

cdf(values)[source]

Returns the CDF at the given values.

inverse_transform(values)[source]

Uses inverse probability integral transformation on the provided values.

inverse_transform_sampling()[source]

Creates samples using inverse probability integral transformation.
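The inverse probability integral transformation behind inverse_transform_sampling can be sketched in a few lines with scipy: uniform samples on (0, 1) are mapped through the inverse CDF of the target distribution. The normal parameters below are arbitrary values chosen for illustration, and this is not the pelicun implementation:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Controlling uniform samples on (0, 1)
uni = rng.uniform(size=100_000)

# Inverse probability integral transform for a normal RV with
# theta = (mean, standard deviation) = (5.0, 2.0)
mean, std = 5.0, 2.0
samples = norm.ppf(uni, loc=mean, scale=std)

print(samples.mean(), samples.std())  # approximately 5.0 and 2.0
```

Keeping the uniform samples separate from the transformed samples is what allows correlation (applied in uniform/standard-normal space) and perfect correlation via anchoring to work independently of each variable's marginal distribution.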

class pelicun.uq.RandomVariableSet(name, RV_list, Rho)[source]

Bases: object

Describes a set of random variables coupled by a correlation matrix. Currently, only the Gaussian copula is supported.

Parameters
name: string

A unique string that identifies the set of random variables.

RV_list: list of RandomVariable

Defines the random variables in the set

Rho: float 2D ndarray

Defines the correlation matrix that describes the correlation between the random variables in the set. Currently, only the Gaussian copula is supported.

Attributes
RV

Return the random variable(s) assigned to the set

samples

Return the samples of the variables in the set

size

Return the size of the RV set (i.e., the number of variables in it)

Methods

Rho([var_subset])

Return the (subset of the) correlation matrix.

apply_correlation()

Apply correlation to n dimensional uniform samples.

orthotope_density([lower, upper, var_subset])

Estimate the probability density within an orthotope for the RV set.

property RV

Return the random variable(s) assigned to the set

property size

Return the size of the RV set (i.e., the number of variables in it)

property samples

Return the samples of the variables in the set

Rho(var_subset=None)[source]

Return the (subset of the) correlation matrix.

apply_correlation()[source]

Apply correlation to n dimensional uniform samples.

Currently, correlation is applied using a Gaussian copula. First, we try using Cholesky transformation. If the correlation matrix is not positive semidefinite and Cholesky fails, use SVD to apply the correlations while preserving as much as possible from the correlation matrix.
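The Cholesky branch of this procedure can be sketched with numpy and scipy: the uniform samples are mapped to standard normal space, multiplied by the Cholesky factor of the correlation matrix, and mapped back. This is an illustration of the Gaussian copula technique, not the pelicun code, and the correlation value is an arbitrary choice:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Target correlation matrix for the Gaussian copula (illustrative)
Rho = np.array([[1.0, 0.7],
                [0.7, 1.0]])

# Independent uniform samples, one row per variable
U = rng.uniform(size=(2, 50_000))

# Map to standard normal space, apply the Cholesky factor, map back
N = norm.ppf(U)
L = np.linalg.cholesky(Rho)
N_corr = L @ N
U_corr = norm.cdf(N_corr)

# The normal-space samples now carry the prescribed correlation
print(np.corrcoef(N_corr)[0, 1])  # close to 0.7
```

When Rho is not positive definite, np.linalg.cholesky raises an error; the SVD fallback mentioned above handles that case at the cost of only approximately reproducing the requested correlations.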

orthotope_density(lower=None, upper=None, var_subset=None)[source]

Estimate the probability density within an orthotope for the RV set.

Use the mvn_orthotope_density function in this module for the calculation. The distribution of individual RVs is not limited to the normal family. The provided limits are converted to the standard normal space that is the basis of all RVs in pelicun. Truncation limits and correlation (using Gaussian copula) are automatically taken into consideration.

Parameters
lower: float ndarray, optional, default: None

Lower bound(s) of the orthotope. A scalar value can be used for a univariate RV; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from below in a dimension, use None for that dimension.

upper: float ndarray, optional, default: None

Upper bound(s) of the orthotope. A scalar value can be used for a univariate RV; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from above in a dimension, use None for that dimension.

var_subset: list of strings, optional, default: None

If provided, allows for selecting only a subset of the variables in the RV_set for the density calculation.

Returns
alpha: float

Estimate of the probability density within the orthotope.

eps_alpha: float

Estimate of the error in alpha.

class pelicun.uq.RandomVariableRegistry[source]

Bases: object

Describes a registry that manages the random variables and random variable sets used in an analysis and generates samples for them.

Attributes
RV

Return all random variable(s) in the registry

RV_samples

Return the samples for every random variable in the registry

RV_set

Return the random variable set(s) in the registry.

Methods

RVs(keys)

Return a subset of the random variables in the registry

add_RV(RV)

Add a new random variable to the registry.

add_RV_set(RV_set)

Add a new set of random variables to the registry

generate_samples(sample_size[, method, seed])

Generates samples for all variables in the registry.

property RV

Return all random variable(s) in the registry

RVs(keys)[source]

Return a subset of the random variables in the registry

add_RV(RV)[source]

Add a new random variable to the registry.

property RV_set

Return the random variable set(s) in the registry.

add_RV_set(RV_set)[source]

Add a new set of random variables to the registry

property RV_samples

Return the samples for every random variable in the registry

generate_samples(sample_size, method='LHS_midpoint', seed=None)[source]

Generates samples for all variables in the registry.

Parameters
sample_size: int

The number of samples requested per variable.

method: {‘random’, ‘LHS’, ‘LHS_midpoint’}, optional

The sample generation method to use. ‘random’ stands for conventional random sampling; ‘LHS’ is Latin Hypercube Sampling with a random sample location within each bin of the hypercube; ‘LHS_midpoint’ is like LHS, but the samples are assigned to the midpoints of the hypercube bins.

seed: int, optional

Random seed used for sampling.
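The ‘LHS_midpoint’ scheme described above can be sketched in a few lines of numpy: the (0, 1) interval is split into sample_size equal-probability bins, each variable receives the bin midpoints, and the bin order is permuted independently per dimension. This is an illustration of the technique, not the pelicun implementation:

```python
import numpy as np

def lhs_midpoint(sample_size, n_dims, rng):
    """Latin Hypercube samples at bin midpoints on (0, 1)."""
    # Midpoints of sample_size equal-probability bins
    midpoints = (np.arange(sample_size) + 0.5) / sample_size
    # Independently permute the bin order in each dimension
    return np.stack([rng.permutation(midpoints) for _ in range(n_dims)])

rng = np.random.default_rng(2)
U = lhs_midpoint(10, 2, rng)
print(U.shape)  # (2, 10)
```

The resulting uniform samples would then be pushed through each variable's inverse CDF (and correlated via the Gaussian copula, for variables in a set) to obtain the final samples.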