Commit 07c524a

finally got all gpu operation working, and implemented gpu constructor and destructor
1 parent 86596c6 commit 07c524a

5 files changed, 15 additions and 14 deletions

README.md

Lines changed: 4 additions & 4 deletions
@@ -2,11 +2,11 @@

 ## Requirements

-This library is design for Linux only _for now_, it require `g++-10` for compilation.
+This library is design for Linux only _for now_, it require `g++` (or `g++-10` for __AMD__ GPU offloading) for compilation.

 For the best result, you should install __Openmp__ (`libomp5-xx`), and compile the library using it.

-For GPU offloading you will also need the correct GPU drivers, and either `gcc-10-offload-nvptx` for __NVIDIA__ cards, or `gcc-10-offload-amdgcn` for __AMD__ GPUs.
+For GPU offloading you will also need the correct GPU drivers, and either `gcc-offload-nvptx` (or `gcc-10-offload-nvptx` if you want to use `g++-10`) for __NVIDIA__ cards, or `gcc-10-offload-amdgcn` for __AMD__ GPUs.

 To take advantage of MPI, you need to install `mpic++`.

@@ -28,7 +28,7 @@ The function that are defined when using mpi are `.send()` and `.receive()` for

 To compile it with __Openmp__, you need to use the `"openmp"` directive before all other targets, which will modify the `LDLIBS` variable in the [Makefile](./Makefile).

-If compiled with __Openmp__, loops of more than 500 iterations will automatically use a `pragma omp parralel`. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="CPU_LIMIT=xxx"` with `make`.
+If compiled with __Openmp__, loops of more than 500 iterations will automatically use a `pragma omp parralel`. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="-DCPU_LIMIT=xxx"` with `make`.

 ### Offloading to GPUs

@@ -40,7 +40,7 @@ If you encounter some errors you might want to also pass the flag `"CCFLAGS=-fc

 All arithmetic operations, self-operators, and comparisons excluding `==, !=` (for performance reasons) are now supported on GPUs.

-Loops of more than 10000 iterations will automatically be offloaded to GPUs. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="GPU_LIMIT=yyy"` with `make`. If you want to modify both the GPU offloading threshold and the __Openmp__ threshold, you need to use `CCFLAGS="-DCPU_LIMIT=yyy -DGPU_LIMIT=xxx"` with `make`.
+Loops of more than 10000 iterations will automatically be offloaded to GPUs. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="-DGPU_LIMIT=yyy"` with `make`. If you want to modify both the GPU offloading threshold and the __Openmp__ threshold, you need to use `CCFLAGS="-DCPU_LIMIT=yyy -DGPU_LIMIT=xxx"` with `make`.

 Atomic operations for type `uint8_t` are not supported by __Openmp__ on GPU (see [issue #1](https://github.com/jolatechno/binary_algebra/issues/1)). I finally found a work around for every operation, either by converting types, or by grouping operations together to only apply atomic operations on `long unsigned int`.
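The `CPU_LIMIT` and `GPU_LIMIT` thresholds described in the README hunk above can be pictured as compile-time macros with defaults that a `-D` flag overrides. The following is only a minimal sketch under that assumption: the default values 500 and 10000 come from the README text, while `LoopPath`, `choose_path`, and `xor_into` are hypothetical names made up for illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Defaults taken from the README text; a build such as
//   make CCFLAGS="-DCPU_LIMIT=2000 -DGPU_LIMIT=50000"
// would replace them (assumption: the library guards them in a similar way).
#ifndef CPU_LIMIT
#define CPU_LIMIT 500
#endif
#ifndef GPU_LIMIT
#define GPU_LIMIT 10000
#endif

// Hypothetical helper: names the path a loop of n iterations would take.
enum class LoopPath { Serial, OpenmpParallel, GpuOffload };

inline LoopPath choose_path(std::size_t n, bool openmp, bool target) {
    // "More than" in the README is read here as a strict comparison.
    if (openmp && target && n > GPU_LIMIT) return LoopPath::GpuOffload;
    if (openmp && n > CPU_LIMIT) return LoopPath::OpenmpParallel;
    return LoopPath::Serial;
}

// Hypothetical example loop: XOR src into dst, parallel only above CPU_LIMIT.
inline void xor_into(std::vector<std::uint64_t>& dst,
                     const std::vector<std::uint64_t>& src) {
    assert(dst.size() == src.size());
    const std::size_t n = dst.size();
#ifdef _OPENMP
    #pragma omp parallel for if(n > CPU_LIMIT)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] ^= src[i];
    }
}
```

Without the leading `-D`, as in the removed `CCFLAGS="CPU_LIMIT=xxx"` lines, the compiler would most likely treat the value as an input file name rather than a macro definition, which appears to be what this commit corrects in the README.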

performance_testing/functions.hpp

Lines changed: 0 additions & 2 deletions
@@ -1,8 +1,6 @@
 #include "../src/binary_arithmetic.hpp"

-#include <stdio.h> //for testing
 void multiplication_mat_vect(Matrix mat, Vector vect) {
-printf("!! height : %d\n", mat.height); //for testing
 mat * vect;
 }

performance_testing/test.cpp

Lines changed: 10 additions & 4 deletions
@@ -17,10 +17,16 @@ int main(int argc, char** argv){
 #endif

 const int n_iter = 100;
-#if defined(_OPENMP)
-const int sizes[] = {
-10, 100, 500,
-};
+#ifdef _OPENMP
+#ifdef TARGET
+const int sizes[] = {
+100, 500, 1000,
+};
+#else
+const int sizes[] = {
+10, 100, 500,
+};
+#endif
 #else
 const int sizes[] = {
 10, 50, 100,

src/arithmetic.inl

Lines changed: 0 additions & 3 deletions
@@ -334,7 +334,6 @@ Matrix Matrix::operator*(Matrix const& other) const {
 return res;
 }

-#include <stdio.h> //for testing
 Vector Matrix::operator*(Vector const& other) const {
 assert(width == other.height); //check if dimensions are compatible

@@ -347,8 +346,6 @@ Vector Matrix::operator*(Vector const& other) const {
 uint8_t *other_blocks = other.blocks;
 uint64_t *this_blocks = blocks;

-printf("!! adress : %p\n", this_blocks); //for testing
-
 int16_t i, k;
 #if defined(_OPENMP) && defined(TARGET)
 if(_height*_width > GPU_LIMIT) {

src/openmp.hpp

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 #pragma once

-#if defined(_OPENMP)
+#ifdef _OPENMP
 #define _OPENMP_PRAGMA(all) _Pragma(all)

 #include <omp.h>
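The `_OPENMP_PRAGMA` macro in this hunk wraps the standard `_Pragma` operator, so a directive can be emitted only when the compiler defines `_OPENMP`. The sketch below illustrates that general technique. It makes two assumptions: that the non-OpenMP branch expands to nothing (the hunk does not show the `#else` side), and that a differently named macro is acceptable. `OMP_PRAGMA` and `xor_reduce` are hypothetical names used for illustration, not the library's own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef _OPENMP
    #include <omp.h>
    // With OpenMP enabled, forward the string literal to the _Pragma operator.
    #define OMP_PRAGMA(directive) _Pragma(directive)
#else
    // Assumption: without OpenMP the directive simply disappears.
    #define OMP_PRAGMA(directive)
#endif

// Hypothetical example: XOR-reduce an array; serial when OpenMP is off.
inline std::uint64_t xor_reduce(const std::uint64_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t acc = 0;
    OMP_PRAGMA("omp parallel for reduction(^:acc)")
    for (std::size_t i = 0; i < n; ++i) {
        acc ^= data[i];
    }
    return acc;
}
```

The same source then builds with or without `-fopenmp`, and the result does not depend on which branch was taken. `#ifdef _OPENMP` and `#if defined(_OPENMP)` are equivalent for a single macro, so the change in this hunk is stylistic.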
