2.1.7. pelicun.uq module

This module defines constants, classes and methods for uncertainty quantification in pelicun.

Contents

mvn_orthotope_density(mu, COV[, lower, upper])

Estimate the probability density within a hyperrectangle for a multivariate normal (MVN) distribution.

fit_distribution(raw_samples, distribution)

Fit a distribution to samples using maximum likelihood estimation.

RandomVariable(name, distribution[, theta, …])

Define a random variable with a given name, distribution type, and distribution parameters.

RandomVariableSet(name, RV_list, Rho)

Define a set of correlated random variables described by a correlation matrix (Gaussian copula).

RandomVariableRegistry()

Manage a collection of random variables and random variable sets and generate samples for them.

pelicun.uq.mvn_orthotope_density(mu, COV, lower=None, upper=None)[source]

Estimate the probability density within a hyperrectangle for a multivariate normal (MVN) distribution.

Use the method of Alan Genz (1992) to estimate the probability density of a multivariate normal distribution within an n-orthotope (i.e., hyperrectangle) defined by its lower and upper bounds. Limits can be relaxed in any direction by assigning infinite bounds (i.e. numpy.inf).

Parameters
mu: float scalar or ndarray

Mean(s) of the non-truncated distribution.

COV: float ndarray

Covariance matrix of the non-truncated distribution.

lower: float vector, optional, default: None

Lower bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from below in a subset of the dimensions, use either None or assign an infinite value (i.e. -numpy.inf) to those dimensions.

upper: float vector, optional, default: None

Upper bound(s) for the truncated distributions. A scalar value can be used for a univariate case, while a list of bounds is expected in multivariate cases. If the distribution is non-truncated from above in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.

Returns
alpha: float

Estimate of the probability density within the hyperrectangle

eps_alpha: float

Estimate of the error in alpha.
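As an illustration of what this function computes, the probability within a rectangle for a bivariate normal can be obtained by inclusion-exclusion over the CDF at the four corners. The sketch below uses scipy rather than pelicun, and the mean, covariance, and bounds are arbitrary values chosen for demonstration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Standard bivariate normal with no correlation (illustrative values)
mu = np.array([0.0, 0.0])
COV = np.array([[1.0, 0.0],
                [0.0, 1.0]])
lower = np.array([-1.0, -1.0])
upper = np.array([1.0, 1.0])

F = multivariate_normal(mean=mu, cov=COV).cdf

# Inclusion-exclusion over the four corners of the rectangle
alpha = (F(upper)
         - F([lower[0], upper[1]])
         - F([upper[0], lower[1]])
         + F(lower))

# With zero correlation this factorizes into (Phi(1) - Phi(-1))**2
print(alpha)
```

For higher dimensions, corner-based inclusion-exclusion becomes impractical, which is why the Genz (1992) integration scheme referenced above is used instead.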

pelicun.uq.fit_distribution(raw_samples, distribution, truncation_limits=[None, None], censored_count=0, detection_limits=[None, None], multi_fit=False, alpha_lim=0.0001)[source]

Fit a distribution to samples using maximum likelihood estimation.

The number of dimensions of the distribution are inferred from the shape of the sample data. Censoring is automatically considered if the number of censored samples and the corresponding detection limits are provided. Infinite or unspecified truncation limits lead to fitting a non-truncated distribution in that dimension.

Parameters
raw_samples: float ndarray

Raw data that serves as the basis of estimation. The number of samples equals the number of columns and each row introduces a new feature. In other words: a list of sample lists is expected where each sample list is a collection of samples of one variable.

distribution: {‘normal’, ‘lognormal’}

Defines the target probability distribution type. Different types of distributions can be mixed by providing a list rather than a single value. Each element of the list corresponds to one of the features in the raw_samples.

truncation_limits: float ndarray, optional, default: [None, None]

Lower and/or upper truncation limits for the specified distributions. A two-element vector can be used for a univariate case, while two lists of limits are expected in multivariate cases. If the distribution is non-truncated from one side in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.

censored_count: int, optional, default: 0

The number of censored samples that are beyond the detection limits. All samples outside the detection limits are aggregated into one set. This works the same way in one and in multiple dimensions. Prescription of specific censored sample counts for sub-regions of the input space outside the detection limits is not supported.

detection_limits: float ndarray, optional, default: [None, None]

Lower and/or upper detection limits for the provided samples. A two-element vector can be used for a univariate case, while two lists of limits are expected in multivariate cases. If the data is not censored from one side in a subset of the dimensions, use either None or assign an infinite value (i.e. numpy.inf) to those dimensions.

multi_fit: bool, optional, default: False

If True, we attempt to fit a multivariate distribution to the samples. Otherwise, we fit each marginal univariate distribution independently and estimate the correlation matrix in the end based on the fitted marginals. Using multi_fit can be advantageous with censored data and if the correlation in the data is not Gaussian. It leads to substantially longer calculation time and does not always produce better results, especially when the number of dimensions is large.

alpha_lim: float, optional, default: 0.0001

Introduces a lower limit to the probability density within the n-orthotope defined by the truncation limits. Assigning a reasonable minimum (such as 1e-4) can be useful when the mean of the distribution is several standard deviations from the truncation limits and the sample size is small. Such cases without a limit often converge to distant means with inflated variances. Besides being incorrect estimates, those solutions only offer negligible reduction in the negative log likelihood, while making subsequent sampling of the truncated normal distribution very challenging.

Returns
theta: float ndarray

Estimates of the parameters of the fitted probability distribution in each dimension. The following parameters are returned for the supported distributions: normal - mean, standard deviation; lognormal - median, log standard deviation.

Rho: float 2D ndarray, optional

In the multivariate case, returns the estimate of the correlation matrix.
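To illustrate the lognormal parameterization returned above (median and log standard deviation), the sketch below fits a non-truncated, non-censored univariate lognormal with plain numpy, where the MLE has a closed form. This is not the pelicun implementation, and the sample values are synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic lognormal data: median 2.0, log standard deviation 0.4
true_median, true_beta = 2.0, 0.4
samples = np.exp(np.log(true_median) + true_beta * rng.standard_normal(10_000))

# For a non-truncated, non-censored lognormal, the MLE has a closed form:
# the median is exp(mean of log data); beta is the std of the log data.
log_samples = np.log(samples)
theta_median = np.exp(log_samples.mean())
theta_beta = log_samples.std()

print(theta_median, theta_beta)  # close to 2.0 and 0.4
```

Truncation and censoring remove the closed form, which is why fit_distribution performs numerical maximum likelihood estimation instead.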

class pelicun.uq.RandomVariable(name, distribution, theta=None, truncation_limits=None, bounds=None, custom_expr=None, raw_samples=None, anchor=None)[source]

Bases: object

Describes a random variable by its distribution type and parameters; the variable can be truncated, sampled empirically, or anchored to another variable.

Parameters
name: string

A unique string that identifies the random variable.

distribution: {‘normal’, ‘lognormal’, ‘multinomial’, ‘custom’, ‘empirical’, ‘coupled_empirical’, ‘uniform’}, optional

Defines the type of probability distribution for the random variable.

theta: float scalar or ndarray, optional

Set of parameters that define the cumulative distribution function of the variable given its distribution type. The following parameters are expected currently for the supported distribution types: normal - mean, standard deviation; lognormal - median, log standard deviation; uniform - a, b, the lower and upper bounds of the distribution; multinomial - likelihood of each unique event (the last event’s likelihood is adjusted automatically to ensure the likelihoods sum up to one); custom - according to the custom expression provided; empirical and coupled_empirical - N/A.

truncation_limits: float ndarray, optional

Defines the [a,b] truncation limits for the distribution. Use None to assign no limit in one direction.

bounds: float ndarray, optional

Defines the [P_a, P_b] probability bounds for the distribution. Use None to assign no lower or upper bound.

custom_expr: string, optional

A Python expression that defines a custom CDF. The controlling variable shall be “x” and the parameters shall be “p1”, “p2”, etc.

anchor: RandomVariable, optional

Anchors this to another variable. If the anchor is not None, this variable will be perfectly correlated with its anchor. Note that the attributes of this variable and its anchor do not have to be identical.

Attributes
RV_set

Return the RV_set this RV is a member of

anchor

Return the anchor of the variable (if any).

bounds

Return the assigned probability bounds.

custom_expr

Return the assigned custom expression for CDF.

distribution

Return the assigned probability distribution type.

samples

Return the empirical or generated samples.

samples_DF

Return the empirical or generated samples in a pandas Series.

theta

Return the assigned probability distribution parameters.

truncation_limits

Return the assigned truncation limits.

uni_samples

Return the samples from the controlling uniform distribution.

Methods

cdf(values)

Returns the CDF at the given values.

inverse_transform(values)

Uses inverse probability integral transformation on the provided values.

inverse_transform_sampling()

Creates samples using inverse probability integral transformation.

property distribution

Return the assigned probability distribution type.

property theta

Return the assigned probability distribution parameters.

property truncation_limits

Return the assigned truncation limits.

property bounds

Return the assigned probability bounds.

property custom_expr

Return the assigned custom expression for CDF.

property RV_set

Return the RV_set this RV is a member of

property samples_DF

Return the empirical or generated samples in a pandas Series.

property samples

Return the empirical or generated samples.

property uni_samples

Return the samples from the controlling uniform distribution.

property anchor

Return the anchor of the variable (if any).

cdf(values)[source]

Returns the CDF at the given values.

inverse_transform(values)[source]

Uses inverse probability integral transformation on the provided values.

inverse_transform_sampling()[source]

Creates samples using inverse probability integral transformation.
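The inverse probability integral transformation behind inverse_transform_sampling can be sketched in a few lines with scipy: uniform samples on (0, 1) are mapped through the inverse CDF of the target distribution. The normal parameters below are arbitrary values chosen for illustration, and this is not the pelicun implementation:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Controlling uniform samples on (0, 1)
uni = rng.uniform(size=100_000)

# Inverse probability integral transform for a normal RV with
# theta = (mean, standard deviation) = (5.0, 2.0)
mean, std = 5.0, 2.0
samples = norm.ppf(uni, loc=mean, scale=std)

print(samples.mean(), samples.std())  # approximately 5.0 and 2.0
```

Keeping the uniform samples separate from the transformed samples is what allows correlation (applied in uniform/standard-normal space) and perfect correlation via anchoring to work independently of each variable's marginal distribution.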

class pelicun.uq.RandomVariableSet(name, RV_list, Rho)[source]

Bases: object

Describes a set of random variables coupled by a correlation matrix. Currently, only the Gaussian copula is supported.

Parameters
name: string

A unique string that identifies the set of random variables.

RV_list: list of RandomVariable

Defines the random variables in the set

Rho: float 2D ndarray

Defines the correlation matrix that describes the correlation between the random variables in the set. Currently, only the Gaussian copula is supported.

Attributes
RV

Return the random variable(s) assigned to the set

samples

Return the samples of the variables in the set

size

Return the size of the RV set (i.e., the number of variables in it)

Methods

Rho([var_subset])

Return the (subset of the) correlation matrix.

apply_correlation()

Apply correlation to n dimensional uniform samples.

orthotope_density([lower, upper, var_subset])

Estimate the probability density within an orthotope for the RV set.

property RV

Return the random variable(s) assigned to the set

property size

Return the size of the RV set (i.e., the number of variables in it)

property samples

Return the samples of the variables in the set

Rho(var_subset=None)[source]

Return the (subset of the) correlation matrix.

apply_correlation()[source]

Apply correlation to n dimensional uniform samples.

Currently, correlation is applied using a Gaussian copula. First, we try using Cholesky transformation. If the correlation matrix is not positive semidefinite and Cholesky fails, use SVD to apply the correlations while preserving as much as possible from the correlation matrix.
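The Cholesky branch of this procedure can be sketched with numpy and scipy: the uniform samples are mapped to standard normal space, multiplied by the Cholesky factor of the correlation matrix, and mapped back. This is an illustration of the Gaussian copula technique, not the pelicun code, and the correlation value is an arbitrary choice:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Target correlation matrix for the Gaussian copula (illustrative)
Rho = np.array([[1.0, 0.7],
                [0.7, 1.0]])

# Independent uniform samples, one row per variable
U = rng.uniform(size=(2, 50_000))

# Map to standard normal space, apply the Cholesky factor, map back
N = norm.ppf(U)
L = np.linalg.cholesky(Rho)
N_corr = L @ N
U_corr = norm.cdf(N_corr)

# The normal-space samples now carry the prescribed correlation
print(np.corrcoef(N_corr)[0, 1])  # close to 0.7
```

When Rho is not positive definite, np.linalg.cholesky raises an error; the SVD fallback mentioned above handles that case at the cost of only approximately reproducing the requested correlations.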

orthotope_density(lower=None, upper=None, var_subset=None)[source]

Estimate the probability density within an orthotope for the RV set.

Use the mvn_orthotope_density function in this module for the calculation. The distribution of individual RVs is not limited to the normal family. The provided limits are converted to the standard normal space that is the basis of all RVs in pelicun. Truncation limits and correlation (using Gaussian copula) are automatically taken into consideration.

Parameters
lower: float ndarray, optional, default: None

Lower bound(s) of the orthotope. A scalar value can be used for a univariate RV; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from below in a dimension, use None for that dimension.

upper: float ndarray, optional, default: None

Upper bound(s) of the orthotope. A scalar value can be used for a univariate RV; a list of bounds is expected in multivariate cases. If the orthotope is not bounded from above in a dimension, use None for that dimension.

var_subset: list of strings, optional, default: None

If provided, allows for selecting only a subset of the variables in the RV_set for the density calculation.

Returns
alpha: float

Estimate of the probability density within the orthotope.

eps_alpha: float

Estimate of the error in alpha.

class pelicun.uq.RandomVariableRegistry[source]

Bases: object

Describes a registry that manages the random variables and random variable sets used in an analysis and generates samples for them.

Attributes
RV

Return all random variable(s) in the registry

RV_samples

Return the samples for every random variable in the registry

RV_set

Return the random variable set(s) in the registry.

Methods

RVs(keys)

Return a subset of the random variables in the registry

add_RV(RV)

Add a new random variable to the registry.

add_RV_set(RV_set)

Add a new set of random variables to the registry

generate_samples(sample_size[, method, seed])

Generates samples for all variables in the registry.

property RV

Return all random variable(s) in the registry

RVs(keys)[source]

Return a subset of the random variables in the registry

add_RV(RV)[source]

Add a new random variable to the registry.

property RV_set

Return the random variable set(s) in the registry.

add_RV_set(RV_set)[source]

Add a new set of random variables to the registry

property RV_samples

Return the samples for every random variable in the registry

generate_samples(sample_size, method='LHS_midpoint', seed=None)[source]

Generates samples for all variables in the registry.

Parameters
sample_size: int

The number of samples requested per variable.

method: {‘random’, ‘LHS’, ‘LHS_midpoint’}, optional

The sample generation method to use. ‘random’ stands for conventional random sampling; ‘LHS’ is Latin Hypercube Sampling with a random sample location within each bin of the hypercube; ‘LHS_midpoint’ is like LHS, but the samples are assigned to the midpoints of the hypercube bins.

seed: int, optional

Random seed used for sampling.
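The ‘LHS_midpoint’ scheme described above can be sketched in a few lines of numpy: the (0, 1) interval is split into sample_size equal-probability bins, each variable receives the bin midpoints, and the bin order is permuted independently per dimension. This is an illustration of the technique, not the pelicun implementation:

```python
import numpy as np

def lhs_midpoint(sample_size, n_dims, rng):
    """Latin Hypercube samples at bin midpoints on (0, 1)."""
    # Midpoints of sample_size equal-probability bins
    midpoints = (np.arange(sample_size) + 0.5) / sample_size
    # Independently permute the bin order in each dimension
    return np.stack([rng.permutation(midpoints) for _ in range(n_dims)])

rng = np.random.default_rng(2)
U = lhs_midpoint(10, 2, rng)
print(U.shape)  # (2, 10)
```

The resulting uniform samples would then be pushed through each variable's inverse CDF (and correlated via the Gaussian copula, for variables in a set) to obtain the final samples.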