1.2. Methods in SimCenterUQ Engine¶
Nataf transformation¶
Nataf transformation is introduced to standardize the interface between UQ methods and Simulation models. Nataf converts the samples in the physical space (Xspace) into the standard normal samples (Uspace), \(T:\rm{X} \rightarrow \rm{U}\), and vice versa, during UQ computations [Liu1986]. Specifically, the latter transformation, called inverse Nataf \(T^{1}\), is performed each time when UQ engine generates sample points and calls external workflow applications, so that the main UQ algorithms would face only the standard normal random variables. Among various standardization transformations, Nataf is one of the most popular methods which exploits marginal distributions of each physical variables and their correlation coefficients.
(Ongoing implementation) All the custom UQ algorithm need to prepare for the standard Gaussian input while quoFEM intertwines with the simulation model to receive standard normal input u and gives physical y=G(u) back.
For the Nataf trasformation, the SimCenterUQ engine borrows a part of the distribution class structure developed by ERA group in the Technical University of Munich [ERA2019]
 Liu1986
Liu, P.L. and Der Kiureghian, A. (1986). Multivariate distribution models with prescribed marginals and covariances. Probabilistic Engineering Mechanics, 1(2), 105112.
 ERA2019
Engineering Risk Analysis Group, Technische Universität München: https://www.bgu.tum.de/era/software/eradist/ (Matlab, python programs and documentations)
Global sensitivity analysis¶
Variancebased global sensitivity indices¶
Global sensitivity analysis (GSA) is performed to quantify the contribution of each input variable to the uncertainty in QoI. Using the global sensitivity indices, users can set preferences between random variables considering both inherent randomness and its propagation through the model. GSA helps users to understand the overall impact of different sources of uncertainties, as well as to accelerate UQ computations by focusing on dominant dimensions or screening out trivial input variables.
Sobol indices are widely used variancebased global sensitivity measures. It has two types: main effect and total effect sensitivity indices. The main effect index finds the fraction of variance in QoI that can be attributed to specific input random variable(s) but without considering interactive effect with other input variables. The total effect index, on the other hand, additionally takes the interactions into account.
Given the output of model \(y=g(\boldsymbol{x})\) and input random variables \(\boldsymbol{x}=\{x_1,x_2, \cdots ,x_d\}\), the firstorder main and total effect indices of each input variable is defined as
respectively, where \(\boldsymbol{x}_{\sim i}\) indicates the set of all input variables except \(x_i\). It is noteworthy that in both equations, the variance operator \(\text{Var}_{x_i}[\cdot]\) captures only the part of uncertainty associated with \(x_i\), while the mean operator \(\text{E}_{\boldsymbol{x}_{\sim i}}[\cdot]\) averages out all remaining uncertainties. From the definitions, two indices theoretically have values between zero and one. Eq. (2.1.2.2.1) can also be understood intuitively. For example, if the QoI is insensitive to \(x_i\), the term inside \(\text{Var}_{x_i}[\cdot]\) is nearly constant and \(S_i\) approaches zero. On the other hand, when one single variable \(x_i\) dominates QoI, inside \(\text{Var}_{x_i}[\cdot]\) is approximately the same as \(y\), and thus \(S_i\) approaches one. Eq. (2.1.2.2.2) can be understood in similar ways. The secondorder main effect index that provides the pairwise interaction effect is defined as
where \(\boldsymbol{x}_{\sim ij}\) indicates the set of all input variables except \(x_i\) and \(x_j\). The higherorder indices are derived likewise. Theoretically, When all the input variables are uncorrelated to each other, the following equality holds.
Estimation of Sobol indices¶
GSA is typically computationally expensive. High computation cost attributes to the multiple integrations (\(d\)dimensional) associated with the variance and expectation operations shown in Eqs. (2.1.2.2.1) and (2.1.2.2.2). To reduce the computational cost, efficient Monte Carlo methods, stochastic expansion methods, or meta modelbased methods can be employed. Among different approaches, the SimCenterUQ engine supports the probability modelbased GSA (PMGSA) framework developed by [Hu2019].
The framework first conducts ordinary MCS to obtain inputoutput data pairs. Then by extracting only a subset dimension of the dataset, the probability distribution of a reduced dimension can be approximated and used for estimating the Sobol index. Among different probability distribution models introduced in [Hu2019] the Gaussian mixture model is implemented in this engine to approximate this lower dimension distribution. For example, to identify 1st order main Sobol index for a variable \(x_i\), a bivariate Gaussian mixture model is fitted for the joint probability distribution of \(x_i\) and \(y\), i.e.
using expectationmaximization (EM) algorithm. The mean operation Eq. (2.1.2.2.1) is then derived analytically from the Gaussian mixture model, while variance is approximated to be the sample variance. Therefore, the accuracy of the method depends on the quality of the base samples as well as the fitness of the mixture model. The below figure summarizes the procedure of Gaussian mixture modelbased PMGSA introduced in [Hu2019]. The number of mixture components is optimized along with the mixture parameters during expectationmaximization iterations.
Global surrogate modeling¶
Introduction to Gaussian process regression (Kriging)¶
Global surrogate modeling aims to build a regression model that reproduces the outcomes of computationally expensive high fidelity simulations.
where the basic assumption is that function evaluation speed of \(f^{\rm{sur}}(\boldsymbol{x})\) is incomparably faster than \(f^{\rm{ex}}(\boldsymbol{x})\). To perform surrogate modeling, we first need to acquire data samples, \((\boldsymbol{x},\boldsymbol{y})\), of exact model based on few rounds of model evaluations, and then the function is interpolated and extrapolated based on the data set. Among various surrogate techniques, Kriging approximates the response surface using a Gaussian process model. Specifically, Kriging surrogate model has the following form:
where the term \(\tilde{f}(\boldsymbol{x})^T\boldsymbol{\beta}\) captures the deterministic global trend via basis functions and linear combination coefficients \(\boldsymbol{\beta}\). The second term \(z(\boldsymbol{x})\) represents the residual and is modeled as a centered secondorder stationary Gaussian process. The assumption is that the true residual value is one of the realizations of the random process:
Therefore the main tasks of surrogate modeling is (1) to find optimal stochastic parameters \(\hat{\boldsymbol{\beta}}\) and \(\hat{K}(x_i,x_j)\) that best match the observations, and (2) to predict the response at an arbitrary sample point \(\boldsymbol{x^*}\) as a conditional distribution of \(f(\boldsymbol{y^*}\boldsymbol{y^{obs}})\), exploiting the fact that \(\boldsymbol{y^*}\) and \(\boldsymbol{y^{obs}}\) are joint Gaussian distribution with known mean and covariances.
Dealing with noisy measurements
In natural hazard applications, often the exact observations of \(\boldsymbol{y}\) are not available and only the noisy observations \(\boldsymbol{y^{obs}}\) are available:(1.2.9)¶\[ \boldsymbol{y^{obs}}=\boldsymbol{y} + \boldsymbol{\varepsilon} =f^{\rm{ex}} (\boldsymbol{x}) + \boldsymbol{\varepsilon}\]where a common assumption is that the measurement noise, \(\boldsymbol{\varepsilon}\), follows a white Gaussian distribution (i.e. \(\varepsilon\) is unbiased, follows a normal distribution with variance \(\tau\), and is independent of other observation noises). Additionally since the noise level is often unknown, \(\tau\) is also calibrated along with \(\beta\) and \(K(x_i,x_j)\). In such settings, surrogate model estimation will not interpolate the observation outputs \(\boldsymbol{y^{obs}}\), but instead make a regression curve passing through the optimal estimation of the true underlying outputs \(\boldsymbol{y}\). Additional to the measurement noise, a mild amount of inherent uncertainty in the simulation model (mild compared to a global trend) can be accounted for in terms of the same noise parameter \(\varepsilon\).Nugget effect: artificial noise for numerical stability
The constructed Kriging surrogate model is always smooth and continuous as it is a realization of a Gaussian process, while the actual response may be nonsmooth, discontinuous, or highly variant that goes beyond the flexibility of the surrogate model. Especially when the measurements are noiseless, the Gaussian process training can suffer from numerical instability. In such illposed problems, the introduction of a small amount of artificial noise, often referred to as nugget, may significantly improve the algorithmic stability. In the quoFEM, the nugget parameter is automatically optimized in the loop along with the other parameters. (Note: technically, nugget effect and measurement noise do not coincide in the mathematical formulation as the nugget effect conserves the interpolating property while measurement noise does not [Roustant2012]. However, this program treats the nugget as an artificial noise as their outcomes are often practically indistinguishable.)
Construction of surrogate model¶
InputOutput settings¶
Input (RV) type 
Output (QoI) type 


Case1 
Adaptive Design of Experiments (DoE) : a bounded variable space of \(\boldsymbol{x}\) 
Simulator : \(\boldsymbol{y}=f(\boldsymbol{x})\) 

Case2 
Data set : {\(\boldsymbol{x_1,x_2, ... ,x_N}\)} 
Simulator : \(\boldsymbol{y}=f(\boldsymbol{x})\) 

Case3 
Data set : {\(\boldsymbol{x_1,x_2, ... ,x_N}\)} 
Data set : {\(\boldsymbol{y_1,y_2, ... ,y_N}\)} 
User have the following options:
Case1 : users can provide a range of input variables (bounds) and a simulation model. After initial spacefilling phase using Latin hypercube sampling (LHS), adaptive design of experiment (DoE) is activated. Given current predictions, the next optimal simulation point is optimized such that expected gain is maximized.
Case2 : users can provide pairs of inputoutput dataset
Case3 : users can provide input data points and a simulation model
Kernel and basis functions¶
The covariance kernel of the outcome process is unknown in most practical applications. Therefore, the mathematical form of the kernel is first assumed, and its parameters are calibrated based on the observation data. Followings are some of the popular stationary covariance kernels.
Radialbasis function (RBF)
Radialbasis function, also known as squaredexponential or Gaussian kernel, is one of the most widely used covariance kernel.(1.2.10)¶\[k(\boldsymbol{x_i},\boldsymbol{x_j}) = \sigma\prod_{d=1}^{D} \exp\Bigg(\frac{1}{2} \frac{(x_{i,d}x_{j,d})^2}{l_d^2}\Bigg)\]where \(\boldsymbol{x_i}\) and \(\boldsymbol{x_j}\) are two arbitrary points in the domain and the hyper parameters, \(D\) is number of the input variables. The parameters \(\sigma\) and \(l_d\) respectively control the error scale and correlation length of the process.
Exponential
Similarly, exponential covariance function is defined as follows.(1.2.11)¶\[k(\boldsymbol{x_i},\boldsymbol{x_j}) = \sigma\prod_{d=1}^{D} \exp\Bigg(\frac{1}{2} \frac{x_{i,d}x_{j,d}}{l_d}\Bigg)\]Matern Class
Matern class of covariance function is another popular choice. It has a positive shape parameter often denotoed as \(\nu\) which additionally determines the roughness of the parameters. For Kriging regression, \(\nu=5/2\) and \(\nu=3/2\) is known to be generally applicable choices considering roughness property and the simplicity of the functional form. [Rasmussen2006](1.2.12)¶\[k(\boldsymbol{x_i},\boldsymbol{x_j}) = \sigma\prod_{d=1}^{D} g_d(h_{d})\]where \(h_d = x_{i,d}x_{j,d}\) and(1.2.13)¶\[\begin{split}g_{d,\frac{5}{2}}(h_d) &= \Bigg(1+ \frac{\sqrt{5}h_d}{l_d}+\frac{5h_d^2}{3l_d^2}\Bigg)\exp\Bigg(\frac{\sqrt{5}h_d}{l_d}\Bigg) \\ g_{d,\frac{3}{2}}(h_d) &= \Bigg(1+ \frac{\sqrt{3}h_d}{l_d}\Bigg)\exp\Bigg(\frac{\sqrt{3}h_d}{l_d}\Bigg)\end{split}\]respectively for \(\nu=5/2\) (smoother) and \(\nu=3/2\) (rougher). It is noted in the literature that if \(\nu\) is greater than \(5/2\), the Matern kernel behaves similar to the radialbasis function.
Once the kernel form is selected, the parameters are calibrated to maximize the likelihood of observations within the Gaussian process model. The default optimization function embedded in GPy is limitedmemory BFGS with bound constraints (LBFGSB) algorithm from Python/Numpy package. [ShaffieldML2012]
Adaptive Design of Experiments (DoE)¶
In the case where bounds of input variables and a simulator model is provided (Case 1), model evaluation points can be selected by spacefilling methods, e.g. Latin hyper cube sampling (LHS). This is nonadaptive Design of Experiments (DoE) in a sense that the whole samples can be located before running any simulations. On the other hand, the number of model evaluations can be reduced by selecting evaluation points adaptively after each run to get the best model improvements.
However, as shown in the figure, adaptive DoE requires multiple optimization turns to find the optimal surrogate model parameters as well as the next optimal DoE. Therefore, it is noted that the adaptive DoE is efficient only when model evaluation time is significantly greater than the optimization time.
Adaptive DoE algorithm: IMSEw, MMSEw [Kyprioti2020]_
The optimal design points can be selected by finding arguments that maximize (or minimize) the socalled score function. The score function in global surrogate modeling is often designed to predict the amount of reduced (or remaining) variance and bias after adding the new sample points. While there are many variations of the score function [Fuhg2020], in quoFEM, the modified integrated mean squared error (IMSE) from Kyprioti et al. (2020) is introduced as:
where \(\phi\) is bias measure from leaveoneout cross validation (LOOCV) analysis, \(\rho\) is a weighting coefficient, and \(\boldsymbol{\sigma_n}^2(\boldsymbol{x}\boldsymbol{X,x_{new}})\) is the predictive variance after additional observation \(x_{new}\) [Kyprioti2020]. To find the sample location that gives minimum IMSE value, two step screeningclustering algorithm is implemented.
Adaptive DoE algorithm: Pareto
Alternatively, multiple design points can be selected by multiobjective optimization scheme. The variance mesure and bias measure defined by
Adaptive DoE is terminated when one of the three conditions is met:
Time: analysis time exceeds a predefined (rough) time constraint
Count: number of model evaluation exceeds a predefined count constraint
Accuracy: accuracy measure of the model meets a predefined convergence level
Verification of surrogate model¶
Once the training is completed, the following three verification measures are presented based on leaveoneout crossvalidation (LOOCV) error estimation.
Leaveoneout crossvalidation (LOOCV)
LOOCV prediction \(\hat{\boldsymbol{y}}_k\) at each sample location \(\boldsymbol{x}_k\) is obatined by the following procedure: A temporary surrogate model \(\hat{\boldsymbol{y}}=f^{sur}_{loo,k}(\boldsymbol{\boldsymbol{x}})\) is constructed using the samples \(\{\boldsymbol{x}_1,\boldsymbol{x}_2,...,\boldsymbol{x}_{k1},\boldsymbol{x}_{k+1},...,\boldsymbol{x}_N\}\) and the calibrated parameters, and the prediction \(\hat{\boldsymbol{y}}_k=f^{sur}_{loo,k}(\boldsymbol{x}_k)\) is compared with the exact outcome \(y_k=f(\boldsymbol{x}_k)\).R2 error
R2 error is defined in terms of the total sum of squares over the residual sum of squares(1.2.16)¶\[\begin{align*} R^2 &= 1  \frac{\sum^N_{k=1} (\hat{y}_k\mu_\hat{y})^2}{\sum^N_{k=1} (\hat{y}_ky_k)^2} \end{align*}\]The surrogate model is considered welltrained when the R2 (<1) approaches 1Normalized rootmeansquarederror (NRMSE)
(1.2.17)¶\[\begin{align*} \rm{NRMSE} ~ &= \frac{\sqrt{\frac{1}{N_t} \sum^{N_t}_{k=1} (y_k\hat{y}_k)^2}}{\max_{k=1,...,N_t}(y_k)\min_{k=1,...,N_t}(y_k)} \end{align*}\]The surrogate model is considered welltrained when the NRMSE (>0) approaches 0Correlation coefficient
Correlation coefficient is a statistic that measures linear correlation between two variables(1.2.18)¶\[ \rho_{y,\hat{y}} = \frac{\sum^N_{k=1}(y_k\mu_{y})(\hat{y}_k\mu_{\hat{y}})} {\sigma_y \sigma_\hat{y}}\]where\(\mu_{y}\) : mean of \(\{y_k\}\)\(\mu_{\hat{y}}\): mean of \(\{\hat{y}_k\}\)\(\sigma_{y}\): standard deviation of \(\{y_k\}\)\(\sigma_{\hat{y}}\): standard deviation of \(\{\hat{y}_k\}\)The surrogate model is considered welltrained when the correlation coefficient ( \(1<\rho<1\) ) approaches 1
Note
Since these measures are calculated from the crossvalidation predictions rather than external validation predictions, they can be biased, particularly when a highly localized nonlinear range exists in the actual response surface and those regions are not covered by the training samples.
 Rasmussen2006
Rasmussen, C.E. and Williams, C.K. (2006). Gaussian Process for Machine Learning. Cambridge, MA: The MIT Press, 2006 (available online at http://www.gaussianprocess.org/gpml/)
 Kyprioti2020(1,2)
Kyprioti, A.P., Zhang, J., and Taflanidis, A.A. (2020). Adaptive design of experiments for global Kriging metamodeling through crossvalidation information. Structural and Multidisciplinary Optimization, 123.
 ShaffieldML2012
GPy, A Gaussian process framework in python, http://github.com/SheffieldML/GPy, since 2012
 Sacks1989
Sacks J.,Welch W.J.,Mitchell T.J.,Wynn H.P. (1989). Design and analysis of computer experiments. Stat Sci 4(4):409–435
 Fuhg2020
Fuhg, J.N., Fau, A., and Nackenhorst, U. (2020). Stateoftheart and comparative review of adaptive sampling methods for kriging. Archives of Computational Methods in Engineering, 159.
 Roustant2012
Roustant, O., Ginsbourger, D., and Deville, Y. (2012). DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by krigingbased metamodeling and optimization. Journal of Statistical Software, 21:1–55, 2012