When compiling at TACC, if you wish to use gcc as I have done, issue the following command when you log in.
module load gcc
When building and testing that the application works, use idev, as I have been showing in the videos.
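For example, once idev has given you an interactive session on a compute node, you can compile and run a quick test by hand. A minimal sketch (assuming the loaded MPI module provides the mpicc compiler wrapper; ibrun is TACC's MPI launcher):

mpicc -o pi pi.c

ibrun ./pi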
When launching the job to test the performance, you will need to use sbatch to place your job in the queue. To do this you need to create a script that will be launched when the job runs. I have placed two scripts in each of the file folders. The script informs the system how many nodes and cores per node to use, what the expected run time is, and how to run the job. Once the executable exists, the job is launched using the following command, issued from a login node:
sbatch submit.sh
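Once submitted, you can check the job's place in the queue with

squeue -u $USER

and, after the job has run, find its output in the files named by the -o and -e options in the script.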
Full documentation on submitting scripts for OpenMP and MPI can be found online at TACC.
Warning
Our solution of pi.c as written has a loop dependency, which you may need to revise for tomorrow's OpenMPI problem.
You are to modify the pi.c application to use MPI and run it. I have included a few files in code/parallel/ExercisesDay4/ex1 to help you. They include pi.c above, gather1.c, and a submit.sh script. gather1.c was presented in the video and is shown below:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define LUMP 5

int main(int argc, char **argv) {

  int numP, procID;

  // the usual mpi initialization
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);

  int *globalData = NULL;
  int localData[LUMP];

  // process 0 is the only one that needs the global data
  if (procID == 0) {
    globalData = malloc(LUMP * numP * sizeof(int));
    for (int i = 0; i < LUMP * numP; i++)
      globalData[i] = 0;
  }

  for (int i = 0; i < LUMP; i++)
    localData[i] = procID * 10 + i;

  MPI_Gather(localData, LUMP, MPI_INT, globalData, LUMP, MPI_INT, 0, MPI_COMM_WORLD);

  if (procID == 0) {
    for (int i = 0; i < numP * LUMP; i++)
      printf("%d ", globalData[i]);
    printf("\n");
  }

  if (procID == 0)
    free(globalData);

  MPI_Finalize();
}
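gather1.c illustrates the collective pattern: each process fills a small local array and MPI_Gather assembles all the pieces on process 0. For pi the loop dependency noted in the warning is the running sum, and the natural collective there is MPI_Reduce. Below is a minimal sketch of that approach, not our solution; it assumes pi.c uses the usual midpoint-rule integration of 4/(1+x^2), so adapt the names and interval count to your own code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int numP, procID;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);

  long n = 1000000;               // number of intervals (illustrative)
  double h = 1.0 / (double)n;
  double localSum = 0.0;

  // each process takes every numP-th interval, starting at its rank,
  // so the dependent accumulation splits into independent partial sums
  for (long i = procID; i < n; i += numP) {
    double x = ((double)i + 0.5) * h;
    localSum += 4.0 / (1.0 + x * x);
  }

  // combine the partial sums on process 0
  double globalSum = 0.0;
  MPI_Reduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  if (procID == 0)
    printf("pi estimate: %16.14f\n", globalSum * h);

  MPI_Finalize();
  return 0;
}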
The submit script is shown below.
#!/bin/bash
#----------------------------------------------------------------------
# Generic SLURM script -- MPI job
#
# This script requests 1 node and 4 MPI tasks on that node
# (out of the 64 cores available per node).
#----------------------------------------------------------------------
#SBATCH -J myjob                  # job name
#SBATCH -o myjob.%j.out           # stdout file (%j = job id)
#SBATCH -e myjob.%j.err           # stderr file
#SBATCH -p development            # queue (partition)
#SBATCH -N 1                      # number of nodes
#SBATCH -n 4                      # total number of MPI tasks
#SBATCH -t 00:02:00               # max run time (hh:mm:ss)
#SBATCH -A DesignSafe-SimCenter   # allocation to charge

ibrun ./pi
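The #SBATCH lines are directives to the SLURM scheduler: -N sets the number of nodes and -n the total number of MPI tasks, so to scale up you increase the two together. Note that TACC uses ibrun rather than mpirun or mpiexec; it launches exactly the number of tasks requested with -n.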
Problem 2: Compute the Norm of a Vector using MPI
Given what you just did with pi, can you now write a program to compute the norm of a vector? In the ex2 directory I have placed a file scatterArray.c. This file uses MPI_Scatter to send components of the vector to the different processes in the running parallel application.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {

  int procID, numP;

  double *globalVector = NULL;
  double *localVector = NULL;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);

  if (argc != 2) {
    printf("Error correct usage: app vectorSize\n");
    MPI_Finalize();   // every process must finalize before exiting
    return 0;
  }
  int vectorSize = atoi(argv[1]);
  int remainder = vectorSize % numP;   // entries left over after an even split

  // Only the root process initializes the global array
  if (procID == 0) {
    globalVector = (double *)malloc(sizeof(double) * vectorSize);
    srand(50);
    for (int i = 0; i < vectorSize; i++) {
      double random_number = 1.0 + (double)rand() / RAND_MAX;
      globalVector[i] = random_number;
    }
  }

  // Determine the size of the local array for each process
  int localSize = vectorSize / numP;

  // Allocate memory for the local array
  localVector = (double *)malloc(sizeof(double) * localSize);

  // Scatter the global array to all processes
  MPI_Scatter(globalVector, localSize, MPI_DOUBLE,
              localVector, localSize, MPI_DOUBLE,
              0, MPI_COMM_WORLD);

  // Print the local array for each process
  printf("Process %d received: ", procID);
  for (int i = 0; i < localSize; i++) {
    printf("%.2f ", localVector[i]);
  }
  printf("\n");

  // process 0 has some entries in globalVector that were not sent!
  if (procID == 0) {
    printf("Process 0 Additional NOT SENT still in globalVector: ");
    for (int i = numP * localSize; i < vectorSize; i++)
      printf("%.2f ", globalVector[i]);
    printf("\n");
  }

  // Clean up memory (free(NULL) is a no-op on the non-root processes)
  free(globalVector);
  free(localVector);

  MPI_Finalize();
  return 0;
}
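For example, compiled with mpicc -o scatterArray scatterArray.c and run with 4 tasks as

ibrun ./scatterArray 10

each process receives 10/4 = 2 entries, and the remaining 2 entries stay behind in globalVector on process 0.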
Note
The vector size may not always be divisible by the number of processes. In that case there will be additional terms that are not sent. Don't forget to include them in the computation!
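One way to finish the computation is sketched below, as lines you might add to scatterArray.c after the MPI_Scatter call (this assumes you also #include <math.h> and link with -lm; it is a sketch of the idea, not the required solution). Each process sums the squares of its local entries, process 0 also folds in the unsent remainder, and MPI_Reduce combines the partial sums so the root can take the square root.

// sum of squares of the entries this process received
double localSum = 0.0;
for (int i = 0; i < localSize; i++)
  localSum += localVector[i] * localVector[i];

// process 0 also owns the leftover entries that were never scattered
if (procID == 0)
  for (int i = numP * localSize; i < vectorSize; i++)
    localSum += globalVector[i] * globalVector[i];

// combine the partial sums on process 0 and take the square root
double totalSum = 0.0;
MPI_Reduce(&localSum, &totalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (procID == 0)
  printf("norm = %.6f\n", sqrt(totalSum));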
Problem 3 (Bonus): Parallelize your matMul Solution using MPI
If you want a more complicated problem to parallelize, I suggest parallelizing your matMul application from Day 2.
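There is no single right decomposition, but a common one maps directly onto the collectives above: process 0 scatters A by blocks of rows, broadcasts all of B, each process multiplies its rows, and MPI_Gather reassembles C. The sketch below is illustrative rather than a drop-in change to the matMul solution (the names N, A, B, C are assumptions), and it assumes N is divisible by the number of processes; otherwise, handle the remainder as in Problem 2.

#include <mpi.h>
#include <stdlib.h>

#define N 256

int main(int argc, char **argv) {
  int numP, procID;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numP);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);

  int rows = N / numP;   // rows of A (and C) handled per process
  double *A = NULL, *C = NULL;
  double *B = malloc(N * N * sizeof(double));
  double *localA = malloc(rows * N * sizeof(double));
  double *localC = malloc(rows * N * sizeof(double));

  // only the root holds the full A and C
  if (procID == 0) {
    A = malloc(N * N * sizeof(double));
    C = malloc(N * N * sizeof(double));
    for (int i = 0; i < N * N; i++) {
      A[i] = 1.0;
      B[i] = 2.0;
    }
  }

  // distribute A by row blocks; every process needs all of B
  MPI_Scatter(A, rows * N, MPI_DOUBLE, localA, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
  MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  // each process computes its own block of rows of C
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < N; j++) {
      double sum = 0.0;
      for (int k = 0; k < N; k++)
        sum += localA[i * N + k] * B[k * N + j];
      localC[i * N + j] = sum;
    }

  // reassemble the result on process 0
  MPI_Gather(localC, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

  free(B); free(localA); free(localC);
  if (procID == 0) { free(A); free(C); }
  MPI_Finalize();
  return 0;
}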