5.2. Issues While Running

When running simulations, failures can occur due to various reasons such as incorrect setup, malfunctioning websites, user errors, or software bugs. Understanding the sequence of actions initiated by the user interface (UI) and backend upon job submission can aid in diagnosing issues. Here’s what happens when the Submit button is clicked:

  1. A temporary directory, tmp.SimCenter, is created in the working directory, followed by a templatedir within it.

  2. The UI processes all input panels, and workflow applications transfer necessary files to the templatedir.

  3. A Python script, specific to the simulation type (rWHALE.py for regional simulations, sWHALE.py for individual assessments), is executed from the templatedir. rWHALE serves as a wrapper, invoking sWHALE for each asset in regional assessments.

  4. For structural response simulations, uncertainty quantification is managed by the chosen UQ Engine (e.g., Dakota), which generates an input file (e.g., dakota.in) placed in tmp.SimCenter.

  5. The UQ engine runs the response simulation using the earlier created input file.

  6. For each structural response realization, the UQ engine creates a separate workdir folder within tmp.SimCenter.

  7. Upon completing all realizations, the UQ engine compiles the outputs from these folders into result files in tmp.SimCenter.

  8. Finally, the front-end processes these results, displaying them under the RES tab in the desktop application.

Below are common issues encountered during failures, along with troubleshooting steps:

  1. Could not create working dir: Lack of permission to create the tmp.SimCenter folder. Adjust the Local Jobs Directory and Remote Jobs Directory in the Preferences.

  2. No Script File: Missing main Python scripts (rWHALE.py, sWHALE.py, or qWHALE.py). Verify the Local Applications directory in Preferences points to the correct location. Reinstallation may be necessary if the applications folder is corrupted.

  3. Dakota failed to finish: The simulation requested from the UQ Engine terminated before completion. This can occur for a number of reasons. Go to the tmp.SimCenter folder and look for the dakota.err file. If no file exists then the UQ Engine did not even start. If the file exists look at its contents to learn more about the errors.

    1. No dakota.err file and no dakota.in file: the Python script in templatedir failed to create the files required to start the UQ Engine. Take a look at the workflow log file in the tmp.SimCenter folder to see if it shows any errors. These errors typically point to an incorrect setting in your input file. If no workflow log file exists, the Python script managing the workflow failed to start. This is typically caused by an issue with your Python installation. Make sure you’ve followed the instructions in the Installation guide and set up your Python environment properly.

    2. No dakota.err but dakota.in exists: Even though the required files are available, the UQ Engine failed to start. If Dakota was selected as the UQ Engine, check the Dakota installation. Also, make sure you’ve followed the instructions in the Installation guide and set up your Python environment properly.

    3. dakota.err file exists but it is empty: This means that the UQ Engine started successfully, but there was a problem during the simulation. Go to one of the workdir folders that contain individual realizations. You can run that simulation using the workflow driver file. Run the driver file from the command line to see what errors are generated during the simulation. These errors are typically related to the event or structural model description.

    4. dakota.err file exists: Open the file and see what the error is. For example, if it says Error: at least one variable must be specified. This means no random variables have been specified. You have only one deterministic event, or you have not specified any random variables for the analysis. Do not hesitate to reach out to the developers through the github discussion page if you do not understand the error messages in this file.

  4. You ran remotely at DesignSafe and no dakota.out files came back: Go to your data depot folder at DesignSafe using your browser. Go to archive/jobs and use the job number from the table that pops up when you ask to get the job results from DesignSafe. Study both the .err and .out files in that directory for information on what went wrong.

  5. No results and you used the Site Response to create the event. You must run a simulated event in the Site Response Widget before you can submit a job to run.

For unresolved issues, please seek assistance on the github discussion page.