When compiling at TACC, if you wish to use gcc as I have done, issue the following command when you log in.
module load gcc
When building and testing that the application works, use idev, as I have been showing in the videos.
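For example, once idev has given you an interactive session on a compute node, you can compile and run a quick test by hand. A minimal sketch (assuming the loaded MPI module provides the mpicc compiler wrapper; ibrun is TACC's MPI launcher):

mpicc -o pi pi.c

ibrun ./pi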
When launching the job to test the performance, you will need to use sbatch to place your job in the queue. To do this you need to create a script that will be launched when the job runs. I have placed two scripts in each of the file folders. The script informs the system how many nodes and cores per node to use, what the expected run time is, and how to run the job. Once the executable exists, the job is launched using the following command, issued from a login node:
sbatch submit.sh
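Once submitted, you can check the job's place in the queue with

squeue -u $USER

and, after the job has run, find its output in the files named by the -o and -e options in the script.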
Full documentation on submitting scripts for OpenMP and MPI can be found online at TACC.
Warning
Our solution of pi.c as written has a loop dependency, which you may need to revise for tomorrow's OpenMPI problem.
You are to modify the pi.c application to use MPI and run it. I have included a few files in code/parallel/ExercisesDay4/ex1 to help you. They include pi.c above, gather1.c, and a submit.sh script. gather1.c was presented in the video and is shown below:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define LUMP 5

int main(int argc, char **argv) {

  int numP, procID;

  // the usual mpi initialization
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);

  int *globalData = NULL;
  int localData[LUMP];

  // process 0 is the only one that needs the global data
  if (procID == 0) {
    globalData = malloc(LUMP * numP * sizeof(int));
    for (int i = 0; i < LUMP * numP; i++)
      globalData[i] = 0;
  }

  for (int i = 0; i < LUMP; i++)
    localData[i] = procID * 10 + i;

  MPI_Gather(localData, LUMP, MPI_INT, globalData, LUMP, MPI_INT, 0, MPI_COMM_WORLD);

  if (procID == 0) {
    for (int i = 0; i < numP * LUMP; i++)
      printf("%d ", globalData[i]);
    printf("\n");
  }

  if (procID == 0)
    free(globalData);

  MPI_Finalize();
}
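gather1.c illustrates the collective pattern: each process fills a small local array and MPI_Gather assembles all the pieces on process 0. For pi the loop dependency noted in the warning is the running sum, and the natural collective there is MPI_Reduce. Below is a minimal sketch of that approach, not our solution; it assumes pi.c uses the usual midpoint-rule integration of 4/(1+x^2), so adapt the names and interval count to your own code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int numP, procID;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);

  long n = 1000000;               // number of intervals (illustrative)
  double h = 1.0 / (double)n;
  double localSum = 0.0;

  // each process takes every numP-th interval, starting at its rank,
  // so the dependent accumulation splits into independent partial sums
  for (long i = procID; i < n; i += numP) {
    double x = ((double)i + 0.5) * h;
    localSum += 4.0 / (1.0 + x * x);
  }

  // combine the partial sums on process 0
  double globalSum = 0.0;
  MPI_Reduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  if (procID == 0)
    printf("pi estimate: %16.14f\n", globalSum * h);

  MPI_Finalize();
  return 0;
}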
The submit script is shown below.
#!/bin/bash
#----------------------------------------------------------------------
# Generic SLURM script -- MPI job
#
# This script requests 1 node and 4 MPI tasks on that node
# (out of the 64 cores available per node).
#----------------------------------------------------------------------
#SBATCH -J myjob                  # job name
#SBATCH -o myjob.%j.out           # stdout file (%j = job id)
#SBATCH -e myjob.%j.err           # stderr file
#SBATCH -p development            # queue (partition)
#SBATCH -N 1                      # number of nodes
#SBATCH -n 4                      # total number of MPI tasks
#SBATCH -t 00:02:00               # max run time (hh:mm:ss)
#SBATCH -A DesignSafe-SimCenter   # allocation to charge

ibrun ./pi
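The #SBATCH lines are directives to the SLURM scheduler: -N sets the number of nodes and -n the total number of MPI tasks, so to scale up you increase the two together. Note that TACC uses ibrun rather than mpirun or mpiexec; it launches exactly the number of tasks requested with -n.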
Problem 2: Compute the Norm of a Vector using MPI
Given what you just did with pi, can you now write a program to compute the norm of a vector? In the ex2 directory I have placed a file scatterArray.c. This file uses MPI_Scatter to send components of the vector to the different processes in the running parallel application.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {

  int procID, numP;

  double *globalVector = NULL;
  double *localVector = NULL;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);

  if (argc != 2) {
    printf("Error correct usage: app vectorSize\n");
    MPI_Finalize();   // every process must finalize before exiting
    return 0;
  }
  int vectorSize = atoi(argv[1]);
  int remainder = vectorSize % numP;   // entries left over after an even split

  // Only the root process initializes the global array
  if (procID == 0) {
    globalVector = (double *)malloc(sizeof(double) * vectorSize);
    srand(50);
    for (int i = 0; i < vectorSize; i++) {
      double random_number = 1.0 + (double)rand() / RAND_MAX;
      globalVector[i] = random_number;
    }
  }

  // Determine the size of the local array for each process
  int localSize = vectorSize / numP;

  // Allocate memory for the local array
  localVector = (double *)malloc(sizeof(double) * localSize);

  // Scatter the global array to all processes
  MPI_Scatter(globalVector, localSize, MPI_DOUBLE,
              localVector, localSize, MPI_DOUBLE,
              0, MPI_COMM_WORLD);

  // Print the local array for each process
  printf("Process %d received: ", procID);
  for (int i = 0; i < localSize; i++) {
    printf("%.2f ", localVector[i]);
  }
  printf("\n");

  // process 0 has some entries in globalVector that were not sent!
  if (procID == 0) {
    printf("Process 0 Additional NOT SENT still in globalVector: ");
    for (int i = numP * localSize; i < vectorSize; i++)
      printf("%.2f ", globalVector[i]);
    printf("\n");
  }

  // Clean up memory (free(NULL) is a no-op on the non-root processes)
  free(globalVector);
  free(localVector);

  MPI_Finalize();
  return 0;
}
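For example, compiled with mpicc -o scatterArray scatterArray.c and run with 4 tasks as

ibrun ./scatterArray 10

each process receives 10/4 = 2 entries, and the remaining 2 entries stay behind in globalVector on process 0.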
Note
The vector size may not always be divisible by the number of processes. In that case there will be additional terms that are not sent. Don't forget to include them in the computation!
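One way to finish the computation is sketched below, as lines you might add to scatterArray.c after the MPI_Scatter call (this assumes you also #include <math.h> and link with -lm; it is a sketch of the idea, not the required solution). Each process sums the squares of its local entries, process 0 also folds in the unsent remainder, and MPI_Reduce combines the partial sums so the root can take the square root.

// sum of squares of the entries this process received
double localSum = 0.0;
for (int i = 0; i < localSize; i++)
  localSum += localVector[i] * localVector[i];

// process 0 also owns the leftover entries that were never scattered
if (procID == 0)
  for (int i = numP * localSize; i < vectorSize; i++)
    localSum += globalVector[i] * globalVector[i];

// combine the partial sums on process 0 and take the square root
double totalSum = 0.0;
MPI_Reduce(&localSum, &totalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (procID == 0)
  printf("norm = %.6f\n", sqrt(totalSum));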
Problem 3 (Bonus): Parallelize your matMul Solution using MPI
If you want a more complicated problem to parallelize, I suggest parallelizing your matMul application from Day 2.
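There is no single right decomposition, but a common one maps directly onto the collectives above: process 0 scatters A by blocks of rows, broadcasts all of B, each process multiplies its rows, and MPI_Gather reassembles C. The sketch below is illustrative rather than a drop-in change to the matMul solution (the names N, A, B, C are assumptions), and it assumes N is divisible by the number of processes; otherwise, handle the remainder as in Problem 2.

#include <mpi.h>
#include <stdlib.h>

#define N 256

int main(int argc, char **argv) {
  int numP, procID;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);

  int rows = N / numP;   // rows of A (and C) handled per process
  double *A = NULL, *C = NULL;
  double *B = malloc(N * N * sizeof(double));
  double *localA = malloc(rows * N * sizeof(double));
  double *localC = malloc(rows * N * sizeof(double));

  // only the root holds the full A and C
  if (procID == 0) {
    A = malloc(N * N * sizeof(double));
    C = malloc(N * N * sizeof(double));
    for (int i = 0; i < N * N; i++) {
      A[i] = 1.0;
      B[i] = 2.0;
    }
  }

  // distribute A by row blocks; every process needs all of B
  MPI_Scatter(A, rows * N, MPI_DOUBLE, localA, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  // each process computes its own block of rows of C
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < N; j++) {
      double sum = 0.0;
      for (int k = 0; k < N; k++)
        sum += localA[i * N + k] * B[k * N + j];
      localC[i * N + j] = sum;
    }

  // reassemble the result on process 0
  MPI_Gather(localC, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  free(B); free(localA); free(localC);
  if (procID == 0) { free(A); free(C); }
  MPI_Finalize();
  return 0;
}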