Unnecessary vec3 copy #434

enadream · 2024-12-16T19:55:27Z

Hi, I decided to use cglm instead of glm because I thought glm doesn't perform operations in place: it creates an intermediate object, does the work there, and then copies the result into the target object. I assumed cglm would not have that problem. However, when I read your vec3.h file, I saw that many functions use an intermediate vec3 to do the operation and then copy that vector into the dest parameter. Why did you do that? Why create a temporary vec3 t and copy it into dest with glm_vec3_copy instead of writing the results directly into dest? What's the point?

recp (Maintainer) · 2024-12-24T19:55:57Z

Hi @enadream,

Sorry for the delay,

"I mean it creates an intermediate object to make operations and then copy the result into the target object"

In general, creating an intermediate object is not a drawback; in some circumstances it is beneficial:

Safety: Let's take a look at glm_vec3_cross():

CGLM_INLINE
void
glm_vec3_cross(vec3 a, vec3 b, vec3 dest) {
  vec3 c;
  /* (u2.v3 - u3.v2, u3.v1 - u1.v3, u1.v2 - u2.v1) */
  c[0] = a[1] * b[2] - a[2] * b[1];
  c[1] = a[2] * b[0] - a[0] * b[2];
  c[2] = a[0] * b[1] - a[1] * b[0];
  glm_vec3_copy(c, dest);
}

could indeed be written without the extra glm_vec3_copy():

CGLM_INLINE
void
glm_vec3_cross(vec3 a, vec3 b, vec3 dest) {
  /* (u2.v3 - u3.v2, u3.v1 - u1.v3, u1.v2 - u2.v1) */
  dest[0] = a[1] * b[2] - a[2] * b[1];
  dest[1] = a[2] * b[0] - a[0] * b[2];
  dest[2] = a[0] * b[1] - a[1] * b[0];
}

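but if you call glm_vec3_cross(a, b, b), that is, compute the cross product of a and b and store the result in b, then b is both an input and the destination. Its elements are overwritten while they are still needed as inputs, which gives unexpected results (substituting dest for b):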

  dest[0] = a[1] * dest[2] - a[2] * dest[1];
  dest[1] = a[2] * dest[0] - a[0] * dest[2];
  dest[2] = a[0] * dest[1] - a[1] * dest[0];

As you can see, dest[0] is written on the first line and then read on the next two lines, and dest[1] is written on the second line and then read on the third.
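The aliasing problem described above can be reproduced without cglm. The following is a minimal sketch, not cglm's code: the typedef v3 is a hypothetical stand-in for cglm's vec3, and the names cross_direct, cross_temp, and alias_changes_result are made up for illustration. It mirrors the two versions of the function shown above and compares them when dest is the same array as b.

```c
typedef float v3[3]; /* hypothetical stand-in for cglm's vec3 */

/* Writes straight into dest: gives wrong results when dest aliases a or b. */
static void cross_direct(v3 a, v3 b, v3 dest) {
  dest[0] = a[1] * b[2] - a[2] * b[1];
  dest[1] = a[2] * b[0] - a[0] * b[2];
  dest[2] = a[0] * b[1] - a[1] * b[0];
}

/* Computes into a temporary first, then copies: safe when dest aliases a or b. */
static void cross_temp(v3 a, v3 b, v3 dest) {
  v3 c;
  c[0] = a[1] * b[2] - a[2] * b[1];
  c[1] = a[2] * b[0] - a[0] * b[2];
  c[2] = a[0] * b[1] - a[1] * b[0];
  dest[0] = c[0];
  dest[1] = c[1];
  dest[2] = c[2];
}

/* Returns 1 if the two versions disagree when dest is the same array as b. */
static int alias_changes_result(void) {
  v3 a  = {1.0f, 0.0f, 0.0f};
  v3 b1 = {0.0f, 1.0f, 0.0f};
  v3 b2 = {0.0f, 1.0f, 0.0f};

  cross_direct(a, b1, b1); /* b1 is overwritten while still being read */
  cross_temp(a, b2, b2);   /* b2 is only overwritten after the math is done */

  return b1[0] != b2[0] || b1[1] != b2[1] || b1[2] != b2[2];
}
```

With a = (1, 0, 0) and b = (0, 1, 0), the temporary version leaves the expected (0, 0, 1) in b, while the direct version overwrites b[1] before the third line reads it and ends up with (0, 0, 0).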

Optimization: compilers are smart enough to reduce the overhead of intermediate objects, if there is any, and the temporary sometimes even results in faster code. With a vec4 we can load the vector with SIMD (where possible) and then work directly on registers, which is fast. In most cases, however, we don't know where a vec3 comes from: an array of vec3, an individual aligned or unaligned vec3 variable, the stack, the heap, or somewhere else. Loading individual elements from such a remote source each time may not be efficient. Copying once into an intermediate value on the stack adds a copy but keeps the work fast, because from then on we are operating on the stack.

vec3 may be optimized with SIMD in the future if there is enough room for efficiency gains.

Struct API

cglm also provides a struct API (see the documentation). The struct API returns the intermediate value directly, so compilers may produce faster code through return value optimization and avoid extra copy/move operations. The array and struct APIs (as well as the CALL and GLMM APIs) can be used together.
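To illustrate the return-by-value shape that the struct API paragraph describes, here is a minimal sketch. It is not cglm's implementation: the type v3s and the function cross_by_value are hypothetical names chosen for this example. Because both arguments are passed by value, the function works on its own copies, so assigning the result back to one of the inputs is safe.

```c
/* Hypothetical by-value vector type, used only for this illustration. */
typedef struct {
  float x, y, z;
} v3s;

/* Returns the cross product by value; a and b are private copies. */
static v3s cross_by_value(v3s a, v3s b) {
  v3s r;
  r.x = a.y * b.z - a.z * b.y;
  r.y = a.z * b.x - a.x * b.z;
  r.z = a.x * b.y - a.y * b.x;
  return r;
}
```

A call such as b = cross_by_value(a, b); overwrites b only after the whole result has been computed. Whether the compiler removes the copies is up to the optimizer and would have to be checked in the generated code.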


enadream (Author) · 2024-12-24T20:39:51Z

@recp I hadn't thought about it this way. Thanks for the clarification.


manofcosine · 2025-03-06T16:29:13Z

@recp, thanks for the answer.

For safety, I totally understand. For optimization, however, I do not quite follow.

"However in most cases we dont know where vec3 comes from, from array of vec3 or individual aligned or unaligned vec3 variable, from stack or heap or something elsewhere which may not be efficient to load individual items each time from remote source."

As input parameters, a and b are, if anything, worse off than dest:
they can also come from anywhere, and they are read from more often than dest,
which is only written to, as long as neither a nor b shares a pointer with dest.

I treat c as an output/write buffer, if that is right.
But why is there no input/read buffer?
Why not copy a and b to the stack too, since all three are equally external pointers?

Please clarify further.


recp (Maintainer) · 2025-03-19T20:13:17Z

Hi @manofcosine,

Sorry for the delay, I was too busy.

First of all, it is not always about performance but about safety too. As for the performance part:

If we can use SIMD, almost all params are already loaded into SIMD registers, so there is no need for an extra copy. Otherwise, as with vec3, and taking glm_vec3_cross() specifically as an example:

we don't write to input params like a and b (they are read-only, or written late), so even if every param is dest for some reason, as in glm_vec3_cross(dest, dest, dest), the result still won't be overwritten during the calculation, because the writes go to an intermediate value. This is the safety part.

As for performance in this case, it has to be measured, of course, before saying anything more precise. But since the inputs are read-only and only 3 floats each, the compiler may generate well-optimized code that pre-fetches each whole vector into registers, and the access pattern is cache-friendly. We can copy to the stack if we prove it is faster than direct access, of course... I just wanted to avoid extra copy/move operations...

For some functions, like glm_mat4_mul(), we do copy input params to the stack and then work on the copies instead of accessing the originals each time. For small vectors we just trust the compiler for now, until we come up with a better design or an optimized version...
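The alternative raised in the question, copying the inputs rather than the output, can be sketched as follows. This is an illustration only, not cglm's code: the typedef v3 and the function cross_locals are hypothetical names. All six input floats are read into locals before dest is written, so the function stays correct even when dest aliases a, b, or both. Whether it is faster or slower than the temporary-vector version is not established here and would have to be measured.

```c
typedef float v3[3]; /* hypothetical stand-in for cglm's vec3 */

/* Reads every input element into a local first, then writes dest directly. */
static void cross_locals(v3 a, v3 b, v3 dest) {
  float a0 = a[0], a1 = a[1], a2 = a[2];
  float b0 = b[0], b1 = b[1], b2 = b[2];

  /* From here on only locals are read, so overwriting dest is harmless. */
  dest[0] = a1 * b2 - a2 * b1;
  dest[1] = a2 * b0 - a0 * b2;
  dest[2] = a0 * b1 - a1 * b0;
}
```

Calling cross_locals(v, v, v) with all three parameters pointing at the same array yields the zero vector, as expected for the cross product of a vector with itself.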

1 reply

manofcosine Mar 21, 2025

@recp, thanks again! You must have gone through all the dangerous cases to take everything into consideration.
Performance is not portable but safety is. I think I understand your design pattern now.
