Finally got all GPU operations working, and implemented GPU constructor and destructor

jolatechno · jolatechno · commit 07c524ab4c50 · 2020-12-30T21:00:50.000+01:00
diff --git a/README.md b/README.md
@@ -2,11 +2,11 @@
 
 ## Requirements
 
-This library is design for Linux only _for now_, it require `g++-10` for compilation.
+This library is design for Linux only _for now_, it require `g++` (or `g++-10` for __AMD__ GPU offloading) for compilation.
 
  For the best result, you should install __Openmp__ (`libomp5-xx`), and compile the library using it.
 
-For GPU offloading you will also need the correct GPU drivers, and either `gcc-10-offload-nvptx` for __NVIDIA__ cards, or `gcc-10-offload-amdgcn` for __AMD__ GPUs.
+For GPU offloading you will also need the correct GPU drivers, and either `gcc-offload-nvptx` (or `gcc-10-offload-nvptx` if you want to use `g++-10`) for __NVIDIA__ cards, or `gcc-10-offload-amdgcn` for __AMD__ GPUs.
 
 To take advantage of MPI, you need to install `mpic++`.
 
@@ -28,7 +28,7 @@ The function that are defined when using mpi are `.send()` and `.receive()` for
 
 To compile it with __Openmp__, you need to use the `"openmp"` directive before all other targets, which will modify the `LDLIBS` variable in the [Makefile](./Makefile).
 
-If compiled with __Openmp__, loops of more than 500 iterations will automatically use a `pragma omp parralel`. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="CPU_LIMIT=xxx"` with `make`.
+If compiled with __Openmp__, loops of more than 500 iterations will automatically use a `pragma omp parralel`. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="-DCPU_LIMIT=xxx"` with `make`.
 
 ### Offloading to GPUs
 
@@ -40,7 +40,7 @@ If you encounter some errors you might want to also pass the flag  `"CCFLAGS=-fc
 
 All arithmetic operations, self-operators, and comparisons excluding `==, !=` (for performance reasons) are now supported on GPUs.
 
-Loops of more than 10000 iterations will automatically be offloaded to GPUs. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="GPU_LIMIT=yyy"` with `make`. If you want to modify both the GPU offloading threshold and the __Openmp__ threshold, you need to use `CCFLAGS="-DCPU_LIMIT=yyy -DGPU_LIMIT=xxx"` with `make`.
+Loops of more than 10000 iterations will automatically be offloaded to GPUs. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="-DGPU_LIMIT=yyy"` with `make`. If you want to modify both the GPU offloading threshold and the __Openmp__ threshold, you need to use `CCFLAGS="-DCPU_LIMIT=yyy -DGPU_LIMIT=xxx"` with `make`.
 
 Atomic operations for type `uint8_t` are not supported by __Openmp__ on GPU (see [issue #1](https://github.com/jolatechno/binary_algebra/issues/1)). I finally found a work around for every operation, either by converting types, or by grouping operations together to only apply atomic operations on `long unsigned int`.
 
diff --git a/performance_testing/functions.hpp b/performance_testing/functions.hpp
@@ -1,8 +1,6 @@
 #include "../src/binary_arithmetic.hpp"
 
-#include <stdio.h> //for testing
 void multiplication_mat_vect(Matrix mat, Vector vect) {
-  printf("!! height : %d\n", mat.height); //for testing
   mat * vect;
 }
 
diff --git a/performance_testing/test.cpp b/performance_testing/test.cpp
@@ -17,10 +17,16 @@ int main(int argc, char** argv){
   #endif
 
   const int n_iter = 100;
-  #if defined(_OPENMP)
-    const int sizes[] = {
-      10, 100, 500,
-    };
+  #ifdef _OPENMP
+    #ifdef TARGET
+      const int sizes[] = {
+        100, 500, 1000,
+      };
+    #else
+      const int sizes[] = {
+        10, 100, 500,
+      };
+    #endif
   #else
     const int sizes[] = {
       10, 50, 100,
diff --git a/src/arithmetic.inl b/src/arithmetic.inl
@@ -334,7 +334,6 @@ Matrix Matrix::operator*(Matrix const& other) const {
   return res;
 }
 
-#include <stdio.h> //for testing
 Vector Matrix::operator*(Vector const& other) const {
   assert(width == other.height); //check if dimensions are compatible
 
@@ -347,8 +346,6 @@ Vector Matrix::operator*(Vector const& other) const {
   uint8_t *other_blocks = other.blocks;
   uint64_t *this_blocks = blocks;
 
-  printf("!! adress : %p\n", this_blocks); //for testing
-
   int16_t i, k;
   #if defined(_OPENMP) && defined(TARGET)
     if(_height*_width > GPU_LIMIT) {
diff --git a/src/openmp.hpp b/src/openmp.hpp
@@ -1,6 +1,6 @@
 #pragma once
 
-#if defined(_OPENMP)
+#ifdef _OPENMP
   #define _OPENMP_PRAGMA(all) _Pragma(all)
 
   #include <omp.h>
