C: Assignments Day 2
Today we have several problems for you to tackle. Parts should look and feel familiar from Day 1, though we will add more features as we go.
Problem 1: DGEMM
Navigate to /assignments/C-Day2/matMul. Instead of a single file, this project consists of multiple files. One of them, blasDGEMM.c, invokes the BLAS dgemm function and therefore requires the application to be linked against the BLAS library. Compiling and linking by hand would require you to find the path to the BLAS library, and each .c file would need its own compilation command. Built manually, this version takes several steps:
gcc myDGEMM.c -c
gcc blasDGEMM.c -c
gcc matMul.c myDGEMM.o blasDGEMM.o -L/pathtoblaslibrary -lblas -lm -o matMul
where /pathtoblaslibrary stands for the directory containing your BLAS library, and the library name given with -l depends on the BLAS implementation installed (for example, -lopenblas for OpenBLAS). You can then run the executable, passing the matrix dimension as its only argument:
./matMul 100
Imagine doing this for many more files, typically tens to hundreds. That would be tedious, inefficient, and very error prone. Software engineers have developed several tools to simplify and automate the build process. One of those tools is CMake, a build-system generator that writes Makefiles (for the make family of tools) or other build files from a project description. You will find its configuration file, CMakeLists.txt, in the source folder. It is a plain text file, so you can, and should, look at how it is written.
The compile process now becomes
1. A configuration step, done once and again whenever you add a file to the project. Inside the source folder, execute
$ mkdir build
$ cd build
$ cmake ..
$ cmake --build .
This checks your system for compilers and other development tools, creates a Makefile in the build folder, and (with the last command) compiles the project.
Note
Placing the compiled files into a build folder makes cleanup easy: simply delete the entire build folder when done. It can easily be regenerated using the above procedure.
2. From now on, every time you make changes to any of the files within your project, simply type
$ cmake --build .
to recompile only what is necessary and relink the executables. That process remains exactly the same regardless of the number of files in your project. Give it a try and see how convenient this is, especially for projects provided by somebody else.
Now that you can compile the matMul application, you will find that it does not work! You need to fix the matMul.c program so that it allocates memory for the A, B, C, and C1 arrays. These are arrays of double that hold square, n-by-n matrices, which must be stored in column-major order. Code is required on lines 29 through 32. You should also add 4 lines around line 59 to release that memory again.
 1  #include <stdio.h>
 2  #include <stdlib.h>
 3  #include <math.h>
 4
 5  extern void myDGEMM(int n, double *A, double *B, double *C);
 6  extern void blasDGEMM(int n, double* A, double* B, double* C);
 7
 8  void fill(double* p, int n) {
 9      for (int i = 0; i < n; ++i)
10          p[i] = (double)rand() / (double)RAND_MAX;
11  }
12
13  /* The benchmarking program */
14  int main(int argc, char** argv) {
15
16      if (argc != 2) {
17          printf("Correct usage: app matrixDimension?\n");
18          exit(EXIT_FAILURE);
19      }
20
21      // get matrix size
22      int n = atoi(argv[1]);
23      n = abs(n);
24      if (n == 0)
25          n = 10;
26
27      int result = 0;
28
29      double *A = 0;  // << SOME CODE HERE
30      double *B = 0;  // << SOME CODE HERE
31      double *C = 0;  // << SOME CODE HERE
32      double *C1 = 0; // << SOME CODE HERE
33
34      if (A == 0 || B == 0 || C == 0 || C1 == 0) {
35          printf("NO MEMORY ALLOCATED FOR ARRAYS\n");
36          exit(EXIT_FAILURE);
37      }
38
39      fill(A, n * n);
40      fill(B, n * n);
41      fill(C, n * n);
42
43      for (int i = 0; i < n * n; i++)
44          C1[i] = C[i];
45
46      blasDGEMM(n, A, B, C);
47
48      myDGEMM(n, A, B, C1);
49
50      // check they are the same .. take into account there will be differences due to roundoff
51      for (int j = 0; j < n * n; j++) {
52          double diff = C1[j] - C[j];
53          double error = fabs(diff / C[j]);
54          if (error > 1e-10) {
55              result = 1;
56          }
57      }
58
59      // GOOD PRACTICE TO PUT 4 LINES of CODE HERE
60
61
62      printf("%d\n", result);
63      return 0;
64  }
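For orientation, here is one way to allocate an n-by-n matrix as a single contiguous block and to address it in column-major order. It is a minimal sketch: allocMatrix and IDX are hypothetical helpers, not part of the provided files, and whether you use such helpers or call malloc directly on lines 29 through 32 is up to you.

```c
#include <stdlib.h>

/* Hypothetical helper macro: element (i, j) of an n-by-n column-major
   matrix is stored at offset i + j*n. size_t avoids int overflow for
   large n. */
#define IDX(i, j, n) ((size_t)(i) + (size_t)(j) * (size_t)(n))

/* Hypothetical helper: allocate an n-by-n matrix of doubles as ONE
   contiguous block of n*n entries. Returns NULL if n <= 0 or if malloc
   fails, so a NULL check like the one in matMul.c still catches errors. */
double *allocMatrix(int n) {
    if (n <= 0)
        return NULL;
    return malloc((size_t)n * (size_t)n * sizeof(double));
}
```

Every matrix allocated this way must later be released with one call to free per array.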
After fixing the matMul.c file, you need to edit the myDGEMM.c file and add your own code to perform the matrix-matrix operation C = C + A * B.
const char* dgemm_desc = "Naive, three-loop dgemm.";

/*
 * This routine performs a dgemm operation
 *     C := C + A * B
 * where A, B, and C are n-by-n matrices stored in column-major format.
 * On exit, A and B maintain their input values.
 *
 * NOTE: Fortran storage: C(i,j) = C[i + j*n]
 */
void myDGEMM(int n, double* A, double* B, double* C) {
    return;
}
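The three-loop version is left to you, but the storage convention C(i,j) = C[i + j*n] already shapes how its loops should be ordered. As a hedged illustration on a simpler operation, the sketch below computes y := y + A*x for a column-major A; matVec is a hypothetical name. Keeping i in the innermost loop walks through A with stride 1, which tends to be much friendlier to the cache than striding by n.

```c
/* Hypothetical helper: y := y + A*x, with A an n-by-n matrix stored in
   column-major order, i.e. A(i,j) = A[i + j*n]. */
void matVec(int n, const double *A, const double *x, double *y) {
    for (int j = 0; j < n; ++j) {          /* one column of A at a time */
        double xj = x[j];
        for (int i = 0; i < n; ++i)        /* consecutive memory addresses */
            y[i] += A[i + j * n] * xj;
    }
}
```

The same reasoning applies when you choose the order of the i, j, and k loops in myDGEMM and when you try the blocked variant mentioned in the note below.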
Note
The CMake process also created another executable, benchmark. If you run it, you will see how your implementation compares in performance against the vendor-supplied BLAS function. Expect the naive version to compare poorly. Try improving its performance: you can experiment with different compiler options (e.g. optimization flags) or a revised algorithm, such as a blocked matrix multiply.
Problem 2: Using structures
The implementation of StressTransform() was intentionally written a bit clumsily, just the way a beginner might write it. Your task in this exercise is to create a structure
typedef struct {
    double sigx;
    double sigy;
    double tau;
} STRESS ;
and modify the code from the previous exercise to use the much easier to read data structure provided by this struct. Use the code skeleton provided in /assignments/C_Day2/stressTransformationStruct to develop that code. The included CMakeLists.txt shall be used to compile your code.
Note
Your modified StressTransform(...) will require a pointer to a STRESS type object. Declaring the type with typedef struct {...} STRESS ; lets you write STRESS *stressOut in the parameter list instead of spelling out the full struct declaration every time.
In addition, inside a function that receives a pointer to a structure, assigning a new value to an entry of that structure requires the -> operator:
void StressTransform(STRESS stressIn, STRESS *stressOut, ....) {
    ...
    stressOut->sigx = ... ;
}
This replaces the form
*sigx = ... ;
used for pointer-to-double arguments. Members of a structure passed by value, such as stressIn, are read with the dot operator, e.g. stressIn.sigx.
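The difference between the two access forms can be seen in a small, self-contained example. StressScale is a hypothetical function, not the assignment's StressTransform: it reads the by-value argument with '.' and writes through the pointer argument with '->'.

```c
/* The STRESS type from the exercise. */
typedef struct {
    double sigx;
    double sigy;
    double tau;
} STRESS;

/* Hypothetical example, not the assignment's StressTransform():
   multiply every stress component by factor. */
void StressScale(STRESS stressIn, double factor, STRESS *stressOut) {
    /* stressIn is a copy passed by value: read its members with '.'  */
    /* stressOut is a pointer: write through it with '->'             */
    stressOut->sigx = factor * stressIn.sigx;
    stressOut->sigy = factor * stressIn.sigy;
    stressOut->tau  = factor * stressIn.tau;
}
```

Note that stressOut->sigx is shorthand for (*stressOut).sigx; the caller passes the address of its own variable, e.g. StressScale(in, 2.0, &out).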
Problem 3: Writing data for use by other programs: CSV
While C is very powerful for numerical computations, it is impractical for generating graphs or fancy images from the computed values. A more efficient approach is to use C for the analysis, write the results to an easily readable file, and use specialized tools for post-processing. One common and simple format is CSV (comma-separated values), which can be read easily by MATLAB, Python, or Excel.
Your task: modify the code given in /assignments/C-Day2/stressTransformFile/ex2-3/ to
1. Take one argument \(\Delta\theta\) in degrees after the name of the executable, defining the increment at which transformed stress values shall be written:
$ ./Exercise2-3 5.0
The output shall contain one angle per line, organized as follows:
theta, sigma_x, sigma_y, tau_xy
...
Output shall continue until an angle of \(180^\circ\) has been reached or exceeded.
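Reading \(\Delta\theta\) from the command line needs a string-to-number conversion. A minimal sketch using the standard strtod function is shown below; parseIncrement is a hypothetical helper, and rejecting zero or negative increments is an assumption of this sketch (a zero increment would never reach \(180^\circ\)).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a command-line argument such as "5.0" into
   the angle increment in degrees. Returns 1 on success, 0 if the text is
   not a complete number, overflows, or is not strictly positive (the
   positivity requirement is an assumption of this sketch). */
int parseIncrement(const char *arg, double *dtheta) {
    char *end = NULL;
    errno = 0;
    double value = strtod(arg, &end);
    if (end == arg || *end != '\0')   /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE)              /* value out of range */
        return 0;
    if (!(value > 0.0))               /* also rejects NaN */
        return 0;
    *dtheta = value;
    return 1;
}
```

In main, one might check argc == 2, call parseIncrement(argv[1], &dtheta), and print a usage message and exit if it returns 0.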
Once your code outputs the information, run it once more and save the results to a file named list.csv (make sure to add the spaces around the ‘>’)
$ ./Exercise2-3 5.0 > list.csv
Note
You may want to download the file list.csv to your local computer before trying the next step, since it requires access to your display. That file can be opened in Excel and plotted there. A more efficient way is to prepare some nice plotting code, such as the provided plotter.py. In the same folder where you placed list.csv, run
Windows 10
>> python.exe plotter.py
MacOS or Linux
$ python3 plotter.py
Isn’t that nice?
Problem 4: Writing to a binary file
Modify the code generated in the previous exercise to write a binary file named mohrcircle.dta instead of the formatted ASCII data. The data shall be exported in blocks, each composed of a double theta followed by a STRESS (or the three components of stress as double).
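Writing one block with fwrite could look like the sketch below. writeRecord is a hypothetical helper; it assumes the STRESS typedef from Problem 2 and a FILE* that you opened yourself with fopen("mohrcircle.dta", "wb").

```c
#include <stdio.h>

typedef struct {
    double sigx;
    double sigy;
    double tau;
} STRESS;

/* Hypothetical helper: append one block (theta followed by one STRESS)
   to a binary file opened with mode "wb".
   Returns 1 if both fwrite calls wrote their item, 0 otherwise. */
int writeRecord(FILE *fp, double theta, const STRESS *s) {
    if (fwrite(&theta, sizeof(double), 1, fp) != 1)
        return 0;
    if (fwrite(s, sizeof(STRESS), 1, fp) != 1)
        return 0;
    return 1;
}
```

Each call adds sizeof(double) + sizeof(STRESS) bytes to the file, a useful starting point for predicting the file size.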
You may continue working on your own code or use the provided code skeleton in /assignments/C-Day2/stressTensorFile/ex2-4.
This time, your code should be completely silent on execution: the only sign of success will be the creation of the data file. For the next steps, run your program with the following parameters:
$ ./Exercise2-4 5.0
Note
How large do you expect the binary file to be? Discuss, predict, and check using
$ ls -l mohrcircle.dta
You should be able to predict the exact number (to the byte!).
Note
This problem comes with validation code, something worth developing every time you are working on software that will be modified over an extended period of time and/or by multiple people.
The validation consists of (1) a C code, parse.c, which reads the binary file and outputs its contents to a CSV file, and (2) a shell script, validate.sh, that attempts to run the validation code and compares the output generated from your binary file to the output generated by a correct code.
Run the validation script as
$ sh ./validate.sh
and check its feedback. (That script may not run on all platforms.)
Note
Binary files are not readable with traditional ASCII (text) editors. Opening one in a text editor usually shows an unintelligible scramble of characters, and printing one to the terminal can leave the terminal in an unusable state.
However, you can view binary files using a hex-dump utility. That approach may help you understand and recover the structure of a binary file (though it still requires some practice, skill, and luck). You may try such a tool on your binary file using
$ xxd mohrcircle.dta | less
where | less pipes the output into a pager utility that allows you to search the output, jump pages forward and backward, or move to any specific line. Press q to exit this utility.
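To connect what xxd shows with the values you wrote, it can help to print the bytes of a single double yourself. The sketch below assumes IEEE 754 doubles (the format used by essentially all current hardware); showDoubleBytes and doubleBits are hypothetical helpers. It copies the value into a byte array with memcpy and prints the bytes in memory order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the bytes of a double in the order they are
   stored in memory, which is also the order fwrite puts them in a file
   and the order xxd displays them. */
void showDoubleBytes(double value) {
    unsigned char bytes[sizeof(double)];
    memcpy(bytes, &value, sizeof(double));   /* well-defined way to view bytes */
    for (size_t k = 0; k < sizeof(double); ++k)
        printf("%02x", (unsigned)bytes[k]);
    printf("\n");
}

/* Hypothetical helper: return the 64-bit IEEE 754 bit pattern of value,
   independent of byte order (assumes an 8-byte double). */
uint64_t doubleBits(double value) {
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On a little-endian machine such as x86-64, showDoubleBytes(1.0) prints 000000000000f03f, the same eight bytes xxd displays as 0000 0000 0000 f03f, so you can look for such patterns in mohrcircle.dta.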
Problem 5: Reading From a CSV file, Memory Allocation & Writing to Binary
Reading data from files and placing it into containers such as vectors is easy if you know the size of the data you are reading. If the size is unknown, the problem becomes trickier. The solution presented on slide 22 worked for a small number of inputs, but failed with a segmentation fault for larger problems. You are to fix the problem. A copy of the offending file, file3.c, has been placed in the directory binaryFile along with two data files. The program can handle the first, small.txt, but fails with the second, big.txt. Can you make the program work? The solution will test your understanding of file I/O, memory management, and pointers.
Initial code is provided in the /assignments/C-Day2/binaryFile directory.
At the end of the program, you are asked to modify the code so that the contents of the two vectors are output to a binary file: write the contents of vector1 followed by vector2.
The file3.c program is shown below. You need to add code in place of the comments on lines 35 and 47.
 1
 2  // program to read values from a file, each line a csv list of an int and two doubles
 3  // written: fmk
 4
 5  #include <stdio.h>
 6  #include <stdlib.h>
 7
 8  int main(int argc, char **argv) {
 9
10      if (argc != 3) {
11          fprintf(stderr, "ERROR correct usage appName inputFile outputBinaryFile\n");
12          return -1;
13      }
14
15      //
16      // read from ascii file
17      //
18
19      FILE *filePtr = fopen(argv[1], "r");
20      if (filePtr == NULL) { perror(argv[1]); return -1; }
21      int i = 0;
22      float float1, float2;
23      int maxVectorSize = 100;
24      double *vector1 = (double *)malloc(maxVectorSize * sizeof(double));
25      double *vector2 = (double *)malloc(maxVectorSize * sizeof(double));
26      int vectorSize = 0;
27
28      while (fscanf(filePtr, "%d, %f, %f\n", &i, &float1, &float2) == 3) {
29          vector1[vectorSize] = float1;
30          vector2[vectorSize] = float2;
31          printf("%d, %f, %f\n", i, vector1[vectorSize], vector2[vectorSize]);
32          vectorSize++;
33
34          if (vectorSize == maxVectorSize) {
35              // some code needed here I think .. programming exercise
36          }
37      }
38
39      fclose(filePtr);
40
41      //
42      // write data to binary file
43      //
44
45      FILE *filePtrB = fopen(argv[2], "wb");
46      if (filePtrB == NULL) { perror(argv[2]); return -1; }
47      // some missing code to write vector1, followed by vector2
48
49      fclose(filePtrB);
50  }
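The usual remedy for an array that fills up is to enlarge it with realloc. The sketch below shows one common pattern; growVector is a hypothetical helper, and growing by doubling is a conventional choice, not the only correct one.

```c
#include <stdlib.h>

/* Hypothetical helper: resize the array *vec so it can hold newCapacity
   doubles. realloc keeps the existing contents (up to the smaller of the
   old and new sizes). The result goes into a temporary first: if realloc
   fails it returns NULL but leaves the old block allocated, so assigning
   straight to *vec would leak it. Returns 1 on success, 0 on failure, in
   which case *vec is unchanged and still valid. */
int growVector(double **vec, int newCapacity) {
    if (newCapacity <= 0)
        return 0;
    double *tmp = realloc(*vec, (size_t)newCapacity * sizeof(double));
    if (tmp == NULL)
        return 0;
    *vec = tmp;
    return 1;
}
```

Inside the if block on line 35 you would compute a new size (for example 2 * maxVectorSize), call growVector on both vectors, stop with an error message if either call returns 0, and only then update maxVectorSize.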
The small.txt file is as shown below.
10, 0.153779, 0.560532
21, 0.865013, 0.276724
32, 0.895919, 0.704462
43, 0.886472, 0.929641
54, 0.469290, 0.350208
65, 0.941637, 0.096535
76, 0.457211, 0.346164
87, 0.970019, 0.114938
98, 0.769819, 0.341565
109, 0.684224, 0.748597
Note
No cmake or Makefile has been provided. You can compile the file with icc or whatever compiler you are using. The program takes two inputs: the file to read and the binary file to write. To compile and test the program, issue the following at the terminal prompt. When done, compare the size of each binary file to that of the corresponding text file.
icc file3.c -o file3
./file3 small.txt small.bin
./file3 big.txt big.bin
Note
Give some thought to how you would open the file and read the two vectors back in. If you have some time, write a program to do so and have that program write the contents of the binary file to a CSV file.
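One possible way to write the two vectors and read them back is sketched below. writeTwoVectors and readTwoVectors are hypothetical helpers; because the file stores no length, the reader assumes both vectors have the same length n and recovers n from the file size. It relies on fseek with SEEK_END, which works on POSIX systems and Windows although the C standard does not require it for binary streams.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write v1 followed by v2 (n doubles each) to a
   binary file. Returns 1 on success, 0 on failure. */
int writeTwoVectors(const char *path, const double *v1, const double *v2, long n) {
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return 0;
    int ok = fwrite(v1, sizeof(double), (size_t)n, fp) == (size_t)n
          && fwrite(v2, sizeof(double), (size_t)n, fp) == (size_t)n;
    if (fclose(fp) != 0)
        ok = 0;
    return ok;
}

/* Hypothetical helper: read such a file back. The file size must equal
   2 * n * sizeof(double). On success returns n and sets *v1, *v2 to
   malloc'ed arrays the caller must free; returns -1 on any error. */
long readTwoVectors(const char *path, double **v1, double **v2) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long bytes = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        bytes = ftell(fp);                   /* total file size in bytes */
    const long pairBytes = (long)(2 * sizeof(double));
    if (bytes < 0 || bytes % pairBytes != 0 || fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    long n = bytes / pairBytes;
    size_t count = n > 0 ? (size_t)n : 1;    /* avoid malloc(0) */
    *v1 = malloc(count * sizeof(double));
    *v2 = malloc(count * sizeof(double));
    if (*v1 == NULL || *v2 == NULL
        || fread(*v1, sizeof(double), (size_t)n, fp) != (size_t)n
        || fread(*v2, sizeof(double), (size_t)n, fp) != (size_t)n) {
        free(*v1);
        free(*v2);
        *v1 = *v2 = NULL;
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return n;
}
```

With n and the two arrays in hand, writing the CSV file is a loop of fprintf calls, one line per index.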