C: Assignments Session 2
Here are some more problems for you to tackle. Parts should look and feel familiar from the first session, though we will add more features as we go.
Problem 1: DGEMM
Navigate to /assignments/C-Session2/matMul. This time the application is split across multiple files. One of them, blasDGEMM.c, invokes the BLAS dgemm function, so the application must be linked against the BLAS library. That means you first have to find where the BLAS libraries are installed on your system. In addition, each .c file needs its own compilation command. Building this version by hand therefore takes several steps:
gcc myDGEMM.c -c
gcc blasDGEMM.c -c
gcc matMul.c myDGEMM.o blasDGEMM.o -L/pathtoblaslibrary -lblas -lm -o matMul
Here /pathtoblaslibrary and -lblas are placeholders for the directory and name of the BLAS library on your system. You can then run the executable, passing the matrix dimension n as its only argument, for example
./matMul 100
Imagine doing this for many more files, typically tens to hundreds. That would be tedious, inefficient and very error prone. Software engineers have developed several tools to simplify and automate the build process. One of them is CMake, which reads a project description and generates build files, such as Makefiles for the make tool. You will find its configuration file, named CMakeLists.txt, in the source folder. The configuration file is plain text, so you can, and should, look at how it is written.
The build process now becomes:
1. A configuration step, done once and again whenever you add a file to the project. Inside the source folder, execute
$ mkdir build
$ cd build
$ cmake ..
$ cmake --build .
The cmake .. command checks your system for compilers and other development tools and generates a Makefile in the build folder. The cmake --build . command then compiles and links the project.
Note
Placing the generated build files in a separate build folder makes cleanup easy: simply delete the entire build folder when done. It can be regenerated with the procedure above.
2. From now on, every time you make changes to any of the files within your project, simply type
$ cmake --build .
This recompiles only the parts affected by your changes and links everything into one executable. The process stays exactly the same regardless of the number of files in your project. Give it a try and see how convenient this is, especially for projects provided by somebody else.
Now that you can compile the matMul application, you will find that it does not work! You need to fix matMul.c so that it allocates memory for the A, B, C and C1 arrays. These are double arrays holding square n-by-n matrices, which must be stored in column-major order. Code is required on lines 29 through 32. You should also add 4 lines around line 59; good practice is to release the four arrays there.
1  #include <stdio.h>
2  #include <stdlib.h>
3  #include <math.h>
4
5  extern void myDGEMM(int n, double *A, double *B, double *C);
6  extern void blasDGEMM(int n, double* A, double* B, double* C);
7
8  void fill(double* p, int n) {
9      for (int i = 0; i < n; ++i)
10         p[i] = (double)rand() / (double)RAND_MAX;
11 }
12
13 /* The benchmarking program */
14 int main(int argc, char** argv) {
15
16     if (argc != 2) {
17         printf("Correct usage: app matrixDimension?\n");
18         exit(1);
19     }
20
21     // get matrix size
22     int n = atoi(argv[1]);
23     n = abs(n);
24     if (n == 0)
25         n = 10;
26
27     int result = 0;
28
29     double *A = 0;  // << SOME CODE HERE
30     double *B = 0;  // << SOME CODE HERE
31     double *C = 0;  // << SOME CODE HERE
32     double *C1 = 0; // << SOME CODE HERE
33
34     if (A == 0 || B == 0 || C == 0 || C1 == 0) {
35         printf("NO MEMORY ALLOCATED FOR ARRAYS\n");
36         exit(1);
37     }
38
39     fill(A, n * n);
40     fill(B, n * n);
41     fill(C, n * n);
42
43     for (int i = 0; i < n*n; i++)
44         C1[i] = C[i];
45
46     blasDGEMM(n, A, B, C);
47
48     myDGEMM(n, A, B, C1);
49
50     // check they are the same .. take into account there will be differences due to roundoff
51     for (int j = 0; j < n*n; j++) {
52         double diff = C1[j] - C[j];
53         double error = fabs(diff/C[j]);
54         if (error > 1e-10) {
55             result = 1;
56         }
57     }
58
59     // GOOD PRACTICE TO PUT 4 LINES of CODE HERE
60
61
62     printf("%d\n", result);
63     return 0;
64 }
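The sketch below shows one way to approach lines 29 through 32. It assumes each matrix is a single contiguous block of n*n doubles. The helper names allocMatrix and colMajorIndex are hypothetical and are not part of the provided files. The second helper only spells out the column-major rule C(i,j) = C[i + j*n] that the exercise requires.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an n-by-n matrix of doubles as one
   contiguous block. Returns NULL if the allocation fails. */
double *allocMatrix(int n) {
    return (double *)malloc((size_t)n * (size_t)n * sizeof(double));
}

/* Hypothetical helper: column-major offset of element (i, j).
   Consecutive elements of a column are adjacent in memory. */
size_t colMajorIndex(int i, int j, int n) {
    return (size_t)i + (size_t)j * (size_t)n;
}
```

With such a helper, line 29 could become double *A = allocMatrix(n); and the same pattern applies to B, C and C1. The four lines around line 59 would then pass each array to free. Because malloc returns NULL on failure, the existing check on line 34 still catches a failed allocation.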
After fixing matMul.c, edit myDGEMM.c and add your own code to perform the matrix-matrix operation C = C + A * B.
const char* dgemm_desc = "Naive, three-loop dgemm.";

/*
 * This routine performs a dgemm operation
 *  C := C + A * B
 * where A, B, and C are n-by-n matrices stored in column-major format.
 * On exit, A and B maintain their input values.
 *
 * NOTE: Fortran storage: C(i,j) = C[i + j*n]
 */
void myDGEMM(int n, double* A, double* B, double* C) {
    return;
}
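Here is one possible body for the stub, offered as a sketch rather than the reference solution. The three loops compute C(i,j) += A(i,k) * B(k,j) using the column-major rule from the comment. They are ordered j-k-i so that the innermost loop walks down a column of A and C with unit stride. The textbook i-j-k order gives the same result but accesses memory less favorably.

```c
/* Naive C := C + A * B for n-by-n column-major matrices.
   Element (i, j) of a matrix M is stored at M[i + j*n]. */
void myDGEMM(int n, double* A, double* B, double* C) {
    for (int j = 0; j < n; ++j) {            /* column of C and of B */
        for (int k = 0; k < n; ++k) {        /* column of A, row of B */
            double bkj = B[k + j*n];         /* reused down the whole column */
            for (int i = 0; i < n; ++i)      /* row of C and of A */
                C[i + j*n] += A[i + k*n] * bkj;
        }
    }
}
```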
Note
The CMake process also created another executable, benchmark. If you run it, you will see how your implementation's performance compares with that of the vendor-supplied BLAS function. Your naive version will most likely compare poorly. Try improving its performance: you can experiment with different compile options or a revised algorithm, e.g. a blocked matrix multiply.
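A blocked (tiled) variant is sketched below as one possible starting point. The name blockedDGEMM and the tile size BLOCK_SIZE = 32 are assumptions for illustration. A good tile size depends on your machine's caches, so it is worth tuning with the benchmark executable. Mathematically the result equals the naive version; only the order in which products are accumulated changes. On random data, expect small round-off differences.

```c
/* Hypothetical tile size; tune it for your machine. */
#define BLOCK_SIZE 32

static int minInt(int a, int b) { return a < b ? a : b; }

/* Blocked C := C + A * B for n-by-n column-major matrices.
   The three outer loops step over BLOCK_SIZE-by-BLOCK_SIZE tiles so the
   tiles of A, B and C being combined can stay in cache; the three inner
   loops are the naive j-k-i kernel restricted to one tile. */
void blockedDGEMM(int n, const double *A, const double *B, double *C) {
    for (int jj = 0; jj < n; jj += BLOCK_SIZE)
        for (int kk = 0; kk < n; kk += BLOCK_SIZE)
            for (int ii = 0; ii < n; ii += BLOCK_SIZE) {
                /* Clip edge tiles when n is not a multiple of BLOCK_SIZE. */
                int jMax = minInt(jj + BLOCK_SIZE, n);
                int kMax = minInt(kk + BLOCK_SIZE, n);
                int iMax = minInt(ii + BLOCK_SIZE, n);
                for (int j = jj; j < jMax; ++j)
                    for (int k = kk; k < kMax; ++k) {
                        double bkj = B[k + j*n];
                        for (int i = ii; i < iMax; ++i)
                            C[i + j*n] += A[i + k*n] * bkj;
                    }
            }
}
```

Because edge tiles are clipped, n does not need to be a multiple of BLOCK_SIZE.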
Problem 2: Reading From a CSV file, Memory Allocation & Writing to Binary
Reading data from files into containers such as vectors is easy if you know how much data you are reading. If the size is unknown, the problem becomes trickier. The solution presented on slide 22 worked for a small number of inputs, but failed with a segmentation fault for larger problems. Your task is to fix it. A copy of the offending file, file3.c, has been placed in the directory binaryFile along with two input files. The program handles the first, small.txt, but fails with the second, big.txt. Can you make the program work? The solution will test your understanding of file I/O, memory management and pointers.
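The usual remedy for an unknown input size is to grow the arrays whenever they fill up. The sketch below assumes the growth happens in the if (vectorSize == maxVectorSize) branch of file3.c. It doubles the capacity each time, which keeps the total copying work proportional to the final number of entries. growVectors is a hypothetical helper and is not part of the provided code.

```c
#include <stdlib.h>

/* Hypothetical helper: double the capacity of two parallel arrays.
   realloc keeps the existing contents but may move the block, so the
   caller's pointers are updated through double pointers. A temporary
   pointer is used so the old block is not lost if realloc fails.
   Returns 0 on success, -1 on allocation failure. */
int growVectors(double **v1, double **v2, int *capacity) {
    int newCapacity = 2 * (*capacity);

    double *tmp1 = (double *)realloc(*v1, (size_t)newCapacity * sizeof(double));
    if (tmp1 == NULL)
        return -1;                /* *v1 is still valid and unchanged */
    *v1 = tmp1;

    double *tmp2 = (double *)realloc(*v2, (size_t)newCapacity * sizeof(double));
    if (tmp2 == NULL)
        return -1;                /* *v2 is still valid and unchanged */
    *v2 = tmp2;

    *capacity = newCapacity;
    return 0;
}
```

Inside the read loop, it could be called as growVectors(&vector1, &vector2, &maxVectorSize), printing an error and exiting if it returns a nonzero value. Because realloc may move the data, any other copies of the old pointers become invalid after the call.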
Initial code is provided in the /assignments/C-Session2/binaryFile directory.
At the end of the program, you are asked to modify the code so that the two vectors are output to a binary file: the contents of vector1 followed by those of vector2.
file3.c is shown below. You need to add code in place of the placeholder comments at lines 35 and 47.
1
2  // program to read values from a file, each line a csv list of an int and two doubles
3  // written: fmk
4
5  #include <stdio.h>
6  #include <stdlib.h>
7
8  int main(int argc, char **argv) {
9
10     if (argc != 3) {
11         fprintf(stderr, "ERROR correct usage appName inputFile outputBinaryFile\n");
12         return -1;
13     }
14
15     //
16     // read from ascii file
17     //
18
19     FILE *filePtr = fopen(argv[1],"r");
20
21     int i = 0;
22     float float1, float2;
23     int maxVectorSize = 100;
24     double *vector1 = (double *)malloc(maxVectorSize*sizeof(double));
25     double *vector2 = (double *)malloc(maxVectorSize*sizeof(double));
26     int vectorSize = 0;
27
28     while (fscanf(filePtr,"%d, %f, %f\n", &i, &float1, &float2) == 3) {
29         vector1[vectorSize] = float1;
30         vector2[vectorSize] = float2;
31         printf("%d, %f, %f\n", i, vector1[vectorSize], vector2[vectorSize]);
32         vectorSize++;
33
34         if (vectorSize == maxVectorSize) {
35             // some code needed here I think .. programming exercise
36         }
37     }
38
39     fclose(filePtr);
40
41     //
42     // write data to binary file
43     //
44
45     FILE *filePtrB = fopen(argv[2],"wb");
46
47     // some missing code to write vector1, followed by vector 2
48
49     fclose(filePtrB);
50 }
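For the missing code at line 47, fwrite can write each vector as one block of raw doubles. In file3.c the output file is already open as filePtrB, so only the two fwrite calls are strictly needed there. The hypothetical helper writeVectors below wraps them with fopen and fclose so it can be tried on its own. The bytes are written in the machine's native double format, so the file is meant to be read back on the same kind of system.

```c
#include <stdio.h>

/* Hypothetical helper: write n doubles from v1 followed by n doubles
   from v2 to a binary file. Returns 0 on success, -1 on failure. */
int writeVectors(const char *fileName, const double *v1, const double *v2, int n) {
    FILE *fp = fopen(fileName, "wb");
    if (fp == NULL)
        return -1;

    /* fwrite returns the number of complete items it wrote. */
    size_t written = fwrite(v1, sizeof(double), (size_t)n, fp);
    written += fwrite(v2, sizeof(double), (size_t)n, fp);

    if (fclose(fp) != 0 || written != 2 * (size_t)n)
        return -1;
    return 0;
}
```

Each value takes sizeof(double) bytes, which is 8 on common platforms. The binary file for n input lines should therefore be 2 * n * sizeof(double) bytes long, for example 160 bytes for the ten lines of small.txt. This is handy when comparing file sizes.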
The small.txt file is as shown below.
0, 0.153779, 0.560532
1, 0.865013, 0.276724
2, 0.895919, 0.704462
3, 0.886472, 0.929641
4, 0.469290, 0.350208
5, 0.941637, 0.096535
6, 0.457211, 0.346164
7, 0.970019, 0.114938
8, 0.769819, 0.341565
9, 0.684224, 0.748597
Note
No CMake or Makefile has been provided. You can compile the file with icc or whichever compiler you are using. The program takes two arguments: the file to read and the file to write (the output names small.bin and big.bin below are only examples). To compile and test the program, issue the following at the terminal prompt. When done, compare the size of each binary file with that of its text file, e.g. with ls -l.
icc file3.c -o file3
./file3 small.txt small.bin
./file3 big.txt big.bin
Note
Give some thought to how you would open the binary file and read the two vectors back in. If you have some time, write a program that does this and then writes the contents of the binary file to a CSV file.
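One way to read the data back is sketched below. The binary file stores no length, so the sketch infers n from the file size, since each entry contributes two doubles. It also regenerates the index column as 0 to n-1, because the original integers were not written. binaryToCsv is a hypothetical name. Finding the size with fseek/ftell is an assumption that holds on POSIX systems, although the C standard does not guarantee SEEK_END for binary streams.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert a binary file holding vector1 followed by
   vector2 (n doubles each) into a CSV file with one "index, v1, v2" line
   per entry. Returns the number of lines written, or -1 on error. */
long binaryToCsv(const char *binName, const char *csvName) {
    FILE *in = fopen(binName, "rb");
    if (in == NULL)
        return -1;

    /* Determine the file size in bytes. */
    if (fseek(in, 0, SEEK_END) != 0) {
        fclose(in);
        return -1;
    }
    long bytes = ftell(in);
    rewind(in);

    long pairSize = (long)(2 * sizeof(double));
    if (bytes < 0 || bytes % pairSize != 0) {
        fclose(in);
        return -1;
    }
    long n = bytes / pairSize;
    size_t count = (size_t)n;

    /* +1 so that an empty file does not lead to malloc(0). */
    double *v1 = malloc((count + 1) * sizeof(double));
    double *v2 = malloc((count + 1) * sizeof(double));
    FILE *out = NULL;
    long result = -1;

    /* vector1 fills the first half of the file, vector2 the second. */
    if (v1 != NULL && v2 != NULL &&
        fread(v1, sizeof(double), count, in) == count &&
        fread(v2, sizeof(double), count, in) == count &&
        (out = fopen(csvName, "w")) != NULL) {
        for (long i = 0; i < n; ++i)
            fprintf(out, "%ld, %f, %f\n", i, v1[i], v2[i]);
        result = n;
    }

    if (out != NULL && fclose(out) != 0)
        result = -1;
    fclose(in);
    free(v1);
    free(v2);
    return result;
}
```

Printing with %f keeps six decimals, matching the input files. An exact round trip of arbitrary doubles would need a format such as %.17g instead.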