I'm at the beginning of converting our HUGE data/Spark codebase from the CPU to the GPU. While I have extensive experience with GPUs/CUDA, I find it a bit hard to pinpoint exactly what the performance limitations I'm seeing are. Furthermore, I'm looking at many queries with different issues (such as unsupported features, strings, Parquet issues, etc.)
I've followed all the performance tuning guides and the suggested actions here, but would still be happy to get more insights and assistance if possible :)
I am currently looking at the following query; CPU time is roughly the same as GPU time.
SELECT FIRST(x1, true), FIRST(x2, true), ... FIRST(x26, true), SUM(CASE WHEN y1 = 'aa' AND y2 THEN 1 ELSE 0 END), SUM(CASE WHEN y1 = 'bb' AND y2 THEN 1 ELSE 0 END) WHERE (z1 IS NULL OR z1 = false) AND (z2 IS NULL OR z2 = false) AND (z3 IS NULL OR z3 = false) AND (z4 IS NULL OR z4 = false) GROUP BY some_string_field_up_to_50_chars
Fields x1 to x26 are either strings or booleans.
Attached is the nvprof output. If I understand it correctly, ~50% of the compute time is string-related, and ~20% is spent decoding the Parquet page information (i.e., not even decompressing the data itself)?
I guess my questions are:
- Is there some way of improving the query's GPU performance? (I've replaced the FIRST operators with MIN; it didn't help.)
- Can I somehow evaluate whether the Parquet files/configuration are harming the GPU's performance? (I'm running it on a small subset of the data: 100 files of ~300MB each, for a total of 35GB. The full dataset is over a terabyte.)
- Previous answers here suggested running the qualification tool and the explain tool. Can I somehow retrieve information on whether the data being fed to the GPU is too small (because of a misconfiguration, small Parquet files, etc.)?
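As a first sanity check on the small-files question, here is a minimal stdlib-only Python sketch I use to summarize the size distribution of a Parquet directory before blaming the GPU reader. The directory path and the 64MB "small file" threshold are assumptions; adjust them for your layout.

```python
# Hypothetical sketch: detect a small-files problem in a Parquet dataset.
# Many tiny files usually translate into small batches on the GPU, which
# hurts spark-rapids throughput. Paths and thresholds are assumptions.
from pathlib import Path
from statistics import mean

def summarize_parquet_files(data_dir, small_mb=64):
    """Return a size summary for all *.parquet files under data_dir."""
    sizes = [p.stat().st_size for p in Path(data_dir).rglob("*.parquet")]
    if not sizes:
        return None
    mb = [s / (1024 * 1024) for s in sizes]
    return {
        "files": len(mb),
        "min_mb": round(min(mb), 2),
        "mean_mb": round(mean(mb), 2),
        "max_mb": round(max(mb), 2),
        # Count of files below the threshold; a large fraction here
        # suggests compaction (or larger write batches) may help.
        "small_files": sum(1 for m in mb if m < small_mb),
    }
```

If most files land below the threshold, compacting the dataset (or raising the read batch size via the Spark-side partition settings) may matter more than anything query-level.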
Any further assistance is more than welcome :)
