C: Tapis-cli¶
Today we have a number of exercises. Their purpose is to set things up so that you can run your parallel application on Frontera or Stampede2 by issuing commands from a terminal on your desktop, using your DesignSafe account and the resources it makes available to you. Time permitting, we will share the applications with fellow classmates. The advantage of working from a terminal is convenience and speed: you no longer need to log in and find a token, cd to the appropriate directories, edit submit scripts, and so on. As you progress in your careers you will also come to appreciate that the ability to share your work is one of the great advantages this approach provides.
There are 5 steps to the exercise today. The steps follow the videos presented for today's class (these are referenced in the hints herein). The exercises are outlined below the hint.
However, before you begin, make sure the tapis command works from a terminal application on your desktop. The note below contains the setup instructions; the warning that follows it describes what to do if the install fails, including changes you may need to make if you are running Windows 10.
Note
Before you can proceed, the Tapis-cli needs some initial setup. Do this by invoking the following in a Linux shell (see the warning below). On some systems, e.g. Ubuntu, you should use pip3 instead of pip:
pip install tapis-cli
Once the cli is installed, you need to configure it to use DesignSafe and provide your username and password:
tapis auth init
The application will prompt for a number of things. For tenant enter designsafe; for username and password enter your DesignSafe username and password. For the other prompts, accept the default by just hitting enter.
Warning
If you fail in the last part of the first exercise, it means that the tapis-cli is not going to work. From past experience, the install works for some but not all, and we are not sure why. You can still use the work you have done up to that point in the exercise, but you need to do something different to complete it. We provide 3 solutions, ordered by how long they take:
Try to update to the latest version of tapis-cli:
$ git clone https://github.com/TACC-Cloud/tapis-cli.git
$ cd tapis-cli
$ pip install --upgrade .
Run the exercise at TACC using either Frontera or Stampede2. To do this you need to log in to your TACC machine and install tapis-cli as a local user.
pip install --user tapis-cli
Install and use the Ubuntu subsystem on Windows 10
The Ubuntu subsystem is actually a full Ubuntu linux system running within a virtual machine while Windows is running as the primary OS. Microsoft provides step-by-step instructions on how to install it on your Windows system. See https://docs.microsoft.com/en-us/windows/wsl/install-win10 for details.
Warning: If you are running Windows in a virtual environment such as VMware or Parallels, installing the Ubuntu subsystem in Windows will fail. Install Ubuntu in a separate virtual machine instead.
Step 1: Setting Up an Execution System¶
Tapis systems provide access to the file systems and hardware resources at TACC, e.g. Frontera and Stampede2. For our storage system we will use the default provided by DesignSafe (designsafe.storage.default). In this exercise we are going to create an execution system by providing a .json file containing basic information about one of our machine logins, e.g. username, password, and the directories within our machine account in which to place temporary files (the location where files are placed when we run an app).
Before you create a system, you might want to issue some tapis commands to get comfortable with the tapis-cli.
tapis -h
tapis systems -h
tapis systems search --default eq true
tapis systems show designsafe.storage.default
tapis systems list
tapis systems search --execution-type eq HPC
You first need to edit a file provided in the code/agave directory. We have two template systems to choose from, fronteraSystem.json and stampede2System.json; which one to use depends on which TACC system you have been logging into.
The Frontera file is shown below:
{
"maxSystemJobs": 500,
"executionType": "HPC",
"available": true,
"description": "Frontera has two computing subsystems, a primary computing system focused on double precision performance, and a second subsystem focused on single precision streaming-memory computing.",
"storage": {
"proxy": null,
"protocol": "SFTP",
"mirror": false,
"host": "frontera.tacc.utexas.edu",
"port": 22,
"auth": {
"type": "PASSWORD",
"username": "${USERNAME}",
"password": "${PASSWORD}"
},
"homeDir": "/",
"rootDir": "${SCRATCH_DIR}"
},
"type": "EXECUTION",
"login": {
"proxy": null,
"protocol": "SSH",
"port": 22,
"auth": {
"type": "PASSWORD",
"username": "${USERNAME}",
"password": "${PASSWORD}"
},
"host": "frontera.tacc.utexas.edu"
},
"startupScript": "~/.bashrc",
"scheduler": "SLURM",
"default": false,
"public": false,
"maxSystemJobsPerUser": 250,
"id": "demo.exec.frontera.${USERNAME}",
"workDir": "",
"site": "tacc.utexas.edu",
"environment": "",
"queues": [
{
"name": "normal",
"maxJobs": -1,
"maxMemoryPerNode": "192GB",
"default": true,
"maxRequestedTime": "48:00:00",
"description": null,
"maxNodes": 512,
"maxProcessorsPerNode": 28672,
"mappedName": null,
"maxUserJobs": 50,
"customDirectives": "${ALLOCATION}"
},
{
"name": "development",
"maxJobs": -1,
"maxMemoryPerNode": "192GB",
"default": false,
"maxRequestedTime": "02:00:00",
"description": null,
"maxNodes": 40,
"maxProcessorsPerNode": 2240,
"mappedName": null,
"maxUserJobs": 1,
"customDirectives": "${ALLOCATION}"
},
{
"name": "large",
"maxJobs": -1,
"maxMemoryPerNode": "192GB",
"default": false,
"maxRequestedTime": "48:00:00",
"description": null,
"maxNodes": 2048,
"maxProcessorsPerNode": 114688,
"mappedName": null,
"maxUserJobs": 5,
"customDirectives": "${ALLOCATION}"
},
{
"name": "flex",
"maxJobs": -1,
"maxMemoryPerNode": "192GB",
"default": false,
"maxRequestedTime": "48:00:00",
"description": null,
"maxNodes": 128,
"maxProcessorsPerNode": 7168,
"mappedName": null,
"maxUserJobs": 50,
"customDirectives": "${ALLOCATION}"
},
{
"name": "rtx",
"maxJobs": -1,
"maxMemoryPerNode": "128GB",
"default": false,
"maxRequestedTime": "48:00:00",
"description": null,
"maxNodes": 22,
"maxProcessorsPerNode": -1,
"mappedName": null,
"maxUserJobs": 5,
"customDirectives": "${ALLOCATION}"
},
{
"name": "rtx-dev",
"maxJobs": -1,
"maxMemoryPerNode": "128GB",
"default": false,
"maxRequestedTime": "02:00:00",
"description": null,
"maxNodes": 2,
"maxProcessorsPerNode": -1,
"mappedName": null,
"maxUserJobs": 2,
"customDirectives": "${ALLOCATION}"
}
],
"globalDefault": false,
"name": "${USERNAME} Frontera HPC DEMO Execution System for DesignSafe",
"status": "UP",
"scratchDir": "${SCRATCH_DIR}"
}
In this file you need to search for the following four placeholders and replace them with appropriate text:
${USERNAME}, ${PASSWORD}, ${SCRATCH_DIR} and ${ALLOCATION}.
If you are using Frontera, replace ${ALLOCATION} with -A FTA-DD-SimCenter.
To find your scratch directory, log into Frontera and issue the commands cds (change directory to scratch) and then pwd (print working directory). The output of pwd is your scratch directory.
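If you would rather not edit the file by hand, the substitution can be scripted. The sketch below is one possible way to do it, not part of tapis-cli: the helper name fill_system_template and the shortened sample template are made up for illustration, and the scratch path is a placeholder. It relies on the fact that the ${...} placeholders happen to match the syntax of Python's string.Template.

```python
import json
from string import Template

# Hypothetical helper (not part of tapis-cli): fills the ${...} placeholders
# in a system template and checks that the result is still valid JSON.
def fill_system_template(template_text, values):
    # substitute() raises KeyError if a placeholder has no value supplied
    filled = Template(template_text).substitute(values)
    json.loads(filled)  # raises ValueError if a value broke the JSON syntax
    return filled

# Shortened stand-in for fronteraSystem.json; the real file has many more fields.
sample = '''{
  "id": "demo.exec.frontera.${USERNAME}",
  "storage": {"auth": {"username": "${USERNAME}", "password": "${PASSWORD}"},
              "rootDir": "${SCRATCH_DIR}"},
  "queues": [{"name": "normal", "customDirectives": "${ALLOCATION}"}],
  "scratchDir": "${SCRATCH_DIR}"
}'''

values = {
    "USERNAME": "YOUR_NAME",
    "PASSWORD": "not-a-real-password",
    "SCRATCH_DIR": "/path/to/your/scratch",  # placeholder: use the output of pwd
    "ALLOCATION": "-A FTA-DD-SimCenter",
}

system = json.loads(fill_system_template(sample, values))
print(system["id"])
```

In real use you would read the template text from the file and obtain the password interactively (for example with getpass.getpass) rather than typing it into a script, so that it is never saved alongside your code.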
Once the file has been completed, you can create the system by invoking the following command.
for Frontera:
tapis systems create -F fronteraSystem.json
Now have a look for it. In the file, both its id (line 37 above) and its name contain the word demo.
tapis systems search --name like demo -f json
The system should appear in what is returned.
That's it. Congratulations, you have created an execution system.
Warning
Never, ever check this file into GitHub unless you have removed your password. We suggest editing the file elsewhere and then removing it when the task is completed.
Hint
A demonstration is contained at end of the Video
Step 2: Exploring File System Commands¶
Tapis provides commands for file operations: uploading and downloading files to and from the storage systems, as well as the typical file system operations on the remote storage system.
Some commands we would like you to try.
Begin by listing the files in your home directory
tapis files list agave://designsafe.storage.default/YOUR_NAME
Add a directory tmp to your home folder at DesignSafe
tapis files mkdir agave://designsafe.storage.default/YOUR_NAME tmp
Copy a small file SimCenterBootcamp2020/code/agave/ExerciseDays4/ex1/piMPI.c to your current directory. From there send it to your new folder at DesignSafe.
tapis files upload agave://designsafe.storage.default/YOUR_NAME/tmp piMPI.c
Note
We will be using this file in our app, which compiles it and runs it. If your own file from yesterday worked, upload it instead; it makes the exercise somewhat more meaningful. If your file has a different name, you should be able to identify the small change you have to make in the cloneSubmit.json file you will edit later.
Remove your local copy and try to download the file you just uploaded.
tapis files download agave://designsafe.storage.default/YOUR_NAME/tmp/piMPI.c
List the other tapis file commands and explore what they do.
tapis files -h
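All of the commands above address remote files with the same URI pattern: agave://, then the system id, then the path. As a small illustration of that pattern (the helper name agave_uri is hypothetical and not something tapis-cli provides), the URIs could be assembled like this:

```python
# Hypothetical helper: joins a system id and path parts into an agave:// URI.
def agave_uri(system_id, *parts):
    # Drop stray slashes so parts can be passed with or without them
    path = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return f"agave://{system_id}/{path}"

home = agave_uri("designsafe.storage.default", "YOUR_NAME")
uploaded = agave_uri("designsafe.storage.default", "YOUR_NAME", "tmp", "piMPI.c")
print(home)
print(uploaded)
```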
Hint
A demonstration is contained at end of the Video
Step 3: Build a Tapis app¶
A Tapis app is a containerized application. Each app has a description that gives its name, inputs and parameters. This description can be obtained using the tapis apps show command. It contains information about where the container for the application resides, its inputs and outputs, the execution system on which the app will run, and a bash script, typically called wrapper.sh. This is the script that is run when the application starts at an HPC resource. The script has access to all the files in the app container when it is run, as well as to all files and directories provided through the inputs.
We are going to develop a Tapis container application, using some tapis apps and files commands to do so. To build our app, like all programmers do, we are going to start by cloning an existing one that has inputs similar to the ones we want: basically an input directory and a parameter. We will name the app mpiCompileRun and associate it with the execution system we created in Step 1 (you will need its id). The tapis apps clone -h command, used later in this step, shows what is needed.
The app we have in mind is one that will compile a program we have uploaded, piMPI.c say, and run it. The wrapper will use two variables: programFile, the name of the file to compile and run, and inputDirectory, the location of the directory containing the file. A wrapper.sh file for this purpose is shown below. The opening lines (set -x, WRAPPERDIR and the ${AGAVE_JOB_CALLBACK_RUNNING} callback) and the closing error check are required for Tapis. The other lines are Linux commands you have been using: module load to load the Intel compiler, cd to change directory to inputDirectory, and finally mpicc to compile the program and ibrun to run it:
set -x
WRAPPERDIR=$( cd "$( dirname "$0" )" && pwd )
${AGAVE_JOB_CALLBACK_RUNNING}
# Load the Intel compiler
module load intel
# Move into the staged input directory, then compile and run the program
cd "${inputDirectory}"
mpicc "${programFile}"
ibrun ./a.out
# Save the exit status of ibrun before another command overwrites it
status=$?
if [ "$status" -ne 0 ]; then
echo "program exited with an error status. $status" >&2
${AGAVE_JOB_CALLBACK_FAILURE}
exit
fi
We are first going to search for an app to clone. Let us look at simcenter apps.
tapis apps search --name like simcenter -f json
You should see one with an id simcenter-dakota-1.0.0u1. Have a look at its description.
tapis apps show simcenter-dakota-1.0.0u1 -f json
Amidst the results returned, you will see it takes an input directory and some parameters to run:
{
"id": "simcenter-dakota-1.0.0u1",
"name": "simcenter-dakota",
"executionSystem": "designsafe.simcenter.exec.stampede2",
"deploymentSystem": "designsafe.storage.default",
"available": true,
"parallelism": "PARALLEL",
"defaultProcessorsPerNode": 128,
"defaultMemoryPerNode": 1,
"defaultNodeCount": 8,
"defaultMaxRunTime": "06:00:00",
......
......
"modules": [
"load dakota"
],
"inputs": [
{
"id": "inputDirectory",
.....
},
"details": {
.....
},
"semantics": {
.....
}
}
],
"parameters": [
{
"id": "driverFile",
"value": {
.....
},
"details": {
.....
},
"semantics": {
}
},
{
"id": "modules",
"value": {
.....
},
"details": {
.....
},
"semantics": {
.....
}
},
....
],
......
......
}
We are going to use this app as our starting point. We are going to clone the app. Have a look at the inputs for tapis apps clone with the following:
tapis apps clone -h
Having reviewed the help output, you should be able to understand the following command. It needs some modifications based on your account: the -e option refers to the execution system, so enter the id of the execution system you created in Step 1 (the id was on line 37 of the system file). Also replace YOUR_NAME with your login name:
tapis apps clone -e demo.exec.frontera.YOUR_NAME -n mpiCompileRun.YOUR_NAME -x 0.0.1 simcenter-dakota-1.0.0u1
Note
Including your name is not required. It is used in Step 5 so that, when we share applications, we can distinguish between them. For normal application development, you may not need or want it.
Having cloned the app, let us look at its description to see the inputs and outputs. We can get this description using the following:
tapis apps show -f json mpiCompileRun.YOUR_NAME-0.0.1 > mpiCompileRunYOUR_NAME.json
This command has placed the description of the mpiCompileRun app in the json file mpiCompileRunYOUR_NAME.json. Open it up and have a look. It will be similar to what was shown above; the differences will be the id and executionSystem. You will see the inputs and parameters sections for this app. You will also see the application directory, which is in your home/applications folder at DesignSafe. From the application directory download the wrapper.sh script; we will modify it a bit. We will keep inputDirectory but will have only one parameter, programFile. You need to edit the app description file accordingly before continuing; the edits should be obvious.
Note
There is a difference between inputs and parameters. Inputs are file or directory resources that Tapis copies to the directory where the wrapper script is run. Parameters are arguments used in the script. In our example we could have just specified programFile as an input, but we wanted to show the use of both inputs and parameters (and this also allows you to compile programs with many files in an input directory).
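To make the shape of the edit concrete, here is a minimal sketch of one way to reduce the cloned description to a single parameter. It is an illustration only: the helper name keep_single_parameter is hypothetical, the nested value, details and semantics fields are left empty because they are elided in the listing above, and renaming driverFile to programFile (the name used by wrapper.sh and the job file) is just one possible approach. Your real description may need further edits, such as labels and default values.

```python
import copy

# Hypothetical helper: keeps every input and reduces "parameters" to a single
# entry, renaming its id from old_id to new_id. The original dict is untouched.
def keep_single_parameter(app, old_id, new_id):
    edited = copy.deepcopy(app)
    kept = [p for p in edited["parameters"] if p["id"] == old_id]
    if len(kept) != 1:
        raise ValueError(f"expected exactly one parameter named {old_id!r}")
    kept[0]["id"] = new_id
    edited["parameters"] = kept
    return edited

# Skeleton of a cloned description; empty dicts stand in for the elided fields.
app = {
    "id": "mpiCompileRun.YOUR_NAME-0.0.1",
    "inputs": [{"id": "inputDirectory", "value": {}, "details": {}, "semantics": {}}],
    "parameters": [
        {"id": "driverFile", "value": {}, "details": {}, "semantics": {}},
        {"id": "modules", "value": {}, "details": {}, "semantics": {}},
    ],
}

edited = keep_single_parameter(app, "driverFile", "programFile")
print([i["id"] for i in edited["inputs"]])
print([p["id"] for p in edited["parameters"]])
```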
After editing the app description file, we can update the app.
tapis apps update -F mpiCompileRunYOUR_NAME.json mpiCompileRun.YOUR_NAME-0.0.1
Finally we need to replace the wrapper.sh of the existing app with our own. We do this with one of the tapis files commands:
tapis files upload agave://designsafe.storage.default/YOUR_NAME/applications/mpiCompileRun.YOUR_NAME-0.0.1 wrapper.sh
We now have an application ready and waiting to compile our code and run it!
Hint
A demonstration is contained at end of the Video
Step 4: Submitting a Job¶
Now we want to actually submit a job and have it compile and run on an HPC system. In the code/agave folder there is a file piMPI.c and a json file cloneSubmit.json. We have already used piMPI.c in the files exercise, where we placed it in a folder at DesignSafe off our remote home folder. To submit a job to run at TACC through Tapis, we need to create an input file telling Tapis what to do. The input file is application specific. An input file template for our app is found in cloneSubmit.json, which is shown below:
{
"name": "Run 1",
"appId": "mpiCompileSimCenter-0.0.1",
"inputs": {
"inputDirectory": "agave://designsafe.storage.default/tg457427/demo2"
},
"parameters" : {
"programFile":"piMPI.c"
},
"maxRunTime": "00:01:00",
"memoryPerNode": "1GB",
"nodeCount": 1,
"processorsPerNode": 8,
"archive": true,
"archiveOnAppError":true,
"archiveSystem":"designsafe.storage.default",
"notifications": [
{
"url" : "fmckenna@berkeley.edu",
"event": "*"
}
]
}
There are 3 lines in this file that need changing. On line 3, set appId to the id of the app you created in Step 3. On line 5, set inputDirectory to the location where you placed the piMPI.c file. Finally, on line 8, set programFile to the name of the program to compile and run. The other options, such as maxRunTime, nodeCount and processorsPerNode, are already set for the piMPI application.
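The same job description can also be generated rather than edited by hand. The following is a minimal sketch under stated assumptions: make_job is a hypothetical helper, the app id and input directory are example values that you must replace with your own, and the notifications section of the template is left out for brevity.

```python
import json

# Hypothetical helper: fills in the three values that differ per student and
# keeps the other settings from the cloneSubmit.json template.
def make_job(app_id, input_directory, program_file):
    return {
        "name": "Run 1",
        "appId": app_id,
        "inputs": {"inputDirectory": input_directory},
        "parameters": {"programFile": program_file},
        "maxRunTime": "00:01:00",
        "memoryPerNode": "1GB",
        "nodeCount": 1,
        "processorsPerNode": 8,
        "archive": True,
        "archiveOnAppError": True,
        "archiveSystem": "designsafe.storage.default",
    }

job = make_job(
    "mpiCompileRun.YOUR_NAME-0.0.1",                     # example app id
    "agave://designsafe.storage.default/YOUR_NAME/tmp",  # example input folder
    "piMPI.c",
)
print(json.dumps(job, indent=2))
```

Writing the printed JSON to a file gives you something you can pass to the submit command in place of the hand-edited template.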
After the file has been saved, the user submits it to the Tapis jobs service by typing the following:
tapis jobs submit -F cloneSubmit.json
Tapis will respond with a message that hopefully says the job was submitted successfully and will provide a jobID. That jobID is important, as you will use it to query the status of the job and to download job information when the job has finished.
To look up job status, type:
tapis jobs status -f json 5ce7f59d-0c4f-46c1-806a-35965317525f-007
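A status check like the one above is usually repeated until the job reaches a final state. The sketch below shows only the polling logic, with the actual status lookup passed in as a function so that nothing here depends on how tapis-cli is called. The helper name wait_for_job is hypothetical, and treating FINISHED and FAILED (in upper case) as the final states is an assumption.

```python
import time

# Assumed final states; the upper-case spelling is an assumption about
# what the service reports.
FINAL_STATES = {"FINISHED", "FAILED"}

def wait_for_job(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a final state or max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = get_status().upper()
        if status in FINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job still {status} after {max_polls} checks")

# Stand-in for a real status lookup: replays a fixed sequence of states.
fake_states = iter(["QUEUED", "RUNNING", "RUNNING", "FINISHED"])
result = wait_for_job(lambda: next(fake_states), sleep=lambda seconds: None)
print(result)
```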
A job can be in a number of states: queued, running, finished, and the dreaded failed. Once a job has finished, you can find out where its results were stored with
tapis jobs show -f json 5ce7f59d-0c4f-46c1-806a-35965317525f-007
This results in a long list of output. Buried in it is the archivePath section. This is where the results have been stored.
"accepted": "2021-01-08T10:19:45.773Z",
"appId": "mpiCompileSimCenter.tg457427-0.0.1",
"appUuid": "7984683744829894165-242ac117-0001-005",
"archive": true,
"archivePath": "tg457427/archive/jobs/job-507792d1-35b0-4dc0-abd2-421cfba7ddc3-007",
"archiveSystem": "designsafe.storage.default",
"blockedCount": 0,
"created": "2021-01-08T10:19:45.779Z",
"ended": "7 hours ago",
The results folder can be viewed using the tapis files list command, i.e. for my job I would list the files in the following way:
tapis files list agave://designsafe.storage.default/tg457427/archive/jobs/job-507792d1-35b0-4dc0-abd2-421cfba7ddc3-007
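The URI in the listing command is simply the archiveSystem and archivePath values from the job description joined together. A minimal sketch of that step, assuming the job description has been saved as JSON (the helper name archive_uri is hypothetical, and the excerpt below wraps a few of the fields shown above in braces to make it valid JSON):

```python
import json

# Hypothetical helper: builds the agave:// URI of a job's archive folder
# from the fields of the job description.
def archive_uri(job):
    return f"agave://{job['archiveSystem']}/{job['archivePath'].lstrip('/')}"

# Excerpt of the job description shown above.
job = json.loads('''{
  "archive": true,
  "archivePath": "tg457427/archive/jobs/job-507792d1-35b0-4dc0-abd2-421cfba7ddc3-007",
  "archiveSystem": "designsafe.storage.default"
}''')

print(archive_uri(job))
```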
In this folder you will see a file with a long name ending in .out. You can download it with tapis files download. The contents of the file for the job I submitted are shown below.
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for Lmod's output
Shell debugging restarted
program file is piMPI.c
/scratch1/00477/tg457427/scratch1/00477/tg457427/tg457427/job-507792d1-35b0-4dc0-abd2-421cfba7ddc3-007-run-1
total 33
4 drwx------ 4 tg457427 G-80610 4096 Jan 8 04:21 .
4 drwx------ 3 tg457427 G-80610 4096 Jan 8 04:19 ..
4 -rw------- 1 tg457427 G-80610 345 Jan 8 04:19 .agave.archive
4 -rw------- 1 tg457427 G-80610 28 Jan 8 04:19 .agave.log
4 drwx------ 2 tg457427 G-80610 4096 Jan 8 04:19 demo2
1 -rw------- 1 tg457427 G-80610 653 Jan 8 04:22 run-1-507792d1-35b0-4dc0-abd2-421cfba7ddc3-007.err
1 -rw------- 1 tg457427 G-80610 239 Jan 8 04:22 run-1-507792d1-35b0-4dc0-abd2-421cfba7ddc3-007.out
4 -rwx------ 1 tg457427 G-80610 2132 Jan 8 04:19 run-1.ipcexe
4 drwx------ 3 tg457427 G-80610 4096 Jan 8 04:19 test
4 -rw------- 1 tg457427 G-80610 401 Jan 8 04:19 wrapper.sh
currentDIR
/scratch1/00477/tg457427/scratch1/00477/tg457427/tg457427/job-507792d1-35b0-4dc0-abd2-421cfba7ddc3-007-run-1/demo2
directory listing
total 12
4 drwx------ 2 tg457427 G-80610 4096 Jan 8 04:19 .
4 drwx------ 4 tg457427 G-80610 4096 Jan 8 04:21 ..
4 -rw------- 1 tg457427 G-80610 1286 Jan 8 04:19 piMPI.c
TACC: Starting up job 2332652
TACC: Starting parallel tasks...
PI = 3.14159265, duration: 0.080000 s
TACC: Shutdown complete. Exiting.
Hint
A demonstration is contained at end of the Video
Step 5: Allowing Others to Use your App¶
Sharing of resources is built into Tapis, which allows researchers to share resources (execution systems, files, and apps). Check out the following commands:
tapis files pems drop
tapis files pems grant
tapis files pems list
tapis files pems revoke
tapis files pems show
tapis systems roles drop
tapis systems roles grant
tapis systems roles list
tapis systems roles revoke
tapis systems roles show
tapis apps pems grant
tapis apps pems list
tapis apps pems revoke
tapis apps pems show
Note
There are other tapis options that allow you to publish and unpublish your resources, but these are not available for you to use.
For this exercise you need to select a partner and swap usernames with them. You are then going to allow your partner to use your app, and then you are going to run your job with their app.
To first look at your existing permissions, issue the following:
tapis apps pems show mpiCompileRun.YOUR_NAME-0.0.1
You should see something like:
+----------+------+-------+---------+
| username | read | write | execute |
+----------+------+-------+---------+
| tg457427 | True | True | True |
+----------+------+-------+---------+
Now, to allow the user fmk to execute your app, you would issue the command shown below. Change fmk to your partner's username and issue the following:
tapis apps pems grant mpiCompileRun.YOUR_NAME-0.0.1 fmk EXECUTE
Now you should see something like:
+----------+-------+-------+---------+
| username | read | write | execute |
+----------+-------+-------+---------+
| tg457427 | True | True | True |
| fmk | False | False | True |
+----------+-------+-------+---------+
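If you want to check a grant from a script rather than by eye, the text table above is easy to parse. This is a sketch only: parse_pems_table is a hypothetical helper, and it assumes the output has exactly the layout shown above.

```python
# Hypothetical helper: parses the permissions table into
# {username: {"read": bool, "write": bool, "execute": bool}}.
def parse_pems_table(text):
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skip the +---+ border lines
    ]
    header, body = rows[0], rows[1:]
    return {
        row[0]: {name: value == "True" for name, value in zip(header[1:], row[1:])}
        for row in body
    }

table = """
+----------+-------+-------+---------+
| username | read  | write | execute |
+----------+-------+-------+---------+
| tg457427 | True  | True  | True    |
| fmk      | False | False | True    |
+----------+-------+-------+---------+
"""

pems = parse_pems_table(table)
print(pems["fmk"]["execute"])
```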
Now see if you can figure out how to do the following:
Find your partner's application using tapis apps search
Edit your job script to point to your partner's app.
Launch the job using your partner's application.
Check your program still works!