Skip to content

Add geoLoopAreaRads2 function#1101

Merged
ajfriend merged 63 commits intouber:masterfrom
ajfriend:cagnoli_area
Dec 17, 2025
Merged

Add geoLoopAreaRads2 function#1101
ajfriend merged 63 commits intouber:masterfrom
ajfriend:cagnoli_area

Conversation

@ajfriend
Copy link
Copy Markdown
Collaborator

@ajfriend ajfriend commented Nov 28, 2025

This PR introduces a new area calculation algorithm which works for large, globe-spanning spherical polygons. That is, this algorithm should work for polygons that enclose the poles, cross the antimeridian, or even ones that are not contained in a hemisphere. The algorithm critically pays attention to vertex ordering, so, for example, you can switch the order of vertices in a cell to get a region representing "everything on the globe except that cell", and then compute that area correctly.

I'll try to keep this PR small by introducing only the core algorithm for simple loops of lat/lngs and for cells (replacing our existing algorithm in cellAreaRads2). In a follow-up PR, I can add functions that compute area for polygons (with holes), multi-polygons, their linked list versions, and the different units (rads^2, km^2, m^2). Although, this will be a lot of functions, so we should probably think about which ones we want to actually expose and support going forward. I would even consider making geoLoopAreaRads2() an internal-only function, and only exposing higher level functions at the polygon and multipolygon level. Curious to hear what others think.

I'll use the "global polygon" area algorithm from this PR for a new implementation of the "cellsToMultiPolygon" algorithm.

PR summary:

  • Introduce geoLoopAreaRads2() function as the core algorithm for computing area. In this PR, we use it to implement cellAreaRads2(), and in the future, it will be the core algorithm for computing area of polygons and multipolygons.
  • Enables area calculation of "global" polygons without issues along poles or the antimeridian, or with polygons larger than a hemisphere.
  • 1.64x faster than the existing cellAreaRads2 function.
  • Adds a "compensated summation" algorithm to compute sums of floating point numbers more accurately, which we use when summing terms to compute the area of polygons/loops. Accuracy benchmarks are improved in this PR, although much of that improvement is not in the cell area calculation itself, but in summing up the areas in the benchmark when testing if we arrive at the area of the globe. We'll keep the compensated summation code since we expect an improvement when computing the area of larger and more complex loops/polygons, but we should add a benchmark/test to demonstrate this in a follow-up PR dealing with such polygons.

@coveralls
Copy link
Copy Markdown

coveralls commented Nov 28, 2025

Coverage Status

coverage: 98.952% (+0.003%) from 98.949%
when pulling 4931800 on ajfriend:cagnoli_area
into 312fbbc on uber:master.

@ajfriend
Copy link
Copy Markdown
Collaborator Author

I'm still playing around with some optimizations to make this code faster. Here's one failed attempt: https://gist.github.com/ajfriend/ba522a38fa46dbc54875a9042157d2e3

Reusing some of the trig operations across points that appear twice in the area calculation doesn't seem to save much time. Maybe 2% improvement at most.

@ajfriend
Copy link
Copy Markdown
Collaborator Author

ajfriend commented Nov 29, 2025

Comparison of Naive, Kahan, and Neumaier summation

Accuracy

Here are the accuracy results from benchmarkArea.c, which adds up the areas of all the cells at a given resolution, and computes the difference between the area of the entire globe 4*pi:

Naive summation

res: 0, diff: 3.552714e-15
res: 1, diff: 3.907985e-14
res: 2, diff: 1.456613e-13
res: 3, diff: 4.654055e-13
res: 4, diff: 2.220446e-12
res: 5, diff: 1.520561e-12
res: 6, diff: 1.761613e-11
res: 7, diff: 1.515765e-11
res: 8, diff: 2.926903e-11

Kahan

res: 0, diff: 5.329071e-15
res: 1, diff: 3.552714e-15
res: 2, diff: 3.552714e-15
res: 3, diff: 0.000000e+00
res: 4, diff: 3.552714e-15
res: 5, diff: 1.776357e-15
res: 6, diff: 0.000000e+00
res: 7, diff: 0.000000e+00
res: 8, diff: 0.000000e+00

Neumaier

res: 0, diff: 3.552714e-15
res: 1, diff: 3.552714e-15
res: 2, diff: 3.552714e-15
res: 3, diff: 1.776357e-15
res: 4, diff: 1.776357e-15
res: 5, diff: 0.000000e+00
res: 6, diff: 0.000000e+00
res: 7, diff: 0.000000e+00
res: 8, diff: 1.776357e-15

Notice that Kahan and Neumaier give a big improvement over naive summation, with Neumaier usually being just slightly better than Kahan.

Compute time

I ran some quick benchmarks on my Apple M3 laptop, and found (roughly) that Kahan is about 0.5% slower than simple summation, and Neumaier is around 1.0% slower than simple summation.

Conclusion

Kahan improves accuracy over simple almost as well as Neumaier. Kahan is slightly slower than simple and slightly faster than Neumaier. Kahan has a slightly simpler implementation than Neumaier. I'll go with Kahan for the time being. If we run into examples where Neumaier provides obviously better behavior (e.g., degenerate polygons?) we can consider switching.

@ajfriend ajfriend mentioned this pull request Dec 1, 2025
6 tasks
@ajfriend
Copy link
Copy Markdown
Collaborator Author

ajfriend commented Dec 2, 2025

All comments should be addressed. Ready for re-review.

Comment thread src/apps/testapps/testGeoLoopArea.c
Comment thread src/h3lib/lib/area.c
@ajfriend
Copy link
Copy Markdown
Collaborator Author

ajfriend commented Dec 11, 2025

Some notes on translating the "Cagnoli math" from d3-geo to what we have here:

  1. We drop the absolute value logic from d3-geo because, as far as i can tell, it is no longer necessary. See the discussion in Simplify dLambda handling in area calculation d3/d3-geo#294
sdLambda = dLambda >= 0 ? 1 : -1,
adLambda = sdLambda * dLambda,
  1. The d3 code takes in one point at a time, and updates "previous" and "current" points:
// Advance the previous points.
lambda0 = lambda, cosPhi0 = cosPhi, sinPhi0 = sinPhi; 

Our code, on the other hand, takes in two points (one edge arc) at a time. I thought this made the code more decoupled clearer, and there was no performance loss: #1101 (comment)

The d3 code dLambda = lambda - lambda0 becomes our double d = y.lng - x.lng;

  1. Instead of "phi" and "lambda", I just reused "lat" and "lng".

  2. The d3 areaRingSum.add(atan2(v, u)); is the same as our -2.0 * atan2(sa * sd, sa * cd + ca);

Comment thread src/h3lib/include/area.h
@@ -0,0 +1,27 @@
/*
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only question here is why in new area.h/.c files instead of in one of the existing files. Do you expect to add more to this later?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be adding more functions for areas of polygons and multipolygons as a follow-up to this PR (#1104), and the area.c file seemed like a good way to group them and their helper functions.

But I agree this area.h file is a little odd with just one function. Currently, it just enables the tests in testGeoLoopArea.c, but it will serve the same purpose in #1104.

If we decide to expose the area functions in the public API, we can probably drop this area.h file (but keep the area.c file.

Copy link
Copy Markdown
Collaborator Author

@ajfriend ajfriend Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a byproduct of this being a sequence of stacked PRs. I've added a TODO in #1103 to remind me to review file organization questions like this after we have all the changes landed.

Copy link
Copy Markdown
Collaborator

@nrabinowitz nrabinowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

Comment thread src/h3lib/include/adder.h
summation), which allows us to add up sequences of floating-point numbers
with better precision than naive summation, especially when the terms in
the sum vary significantly in magnitude.
See: https://en.wikipedia.org/wiki/Kahan_summation_algorithm
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL. Very cool!

Comment thread src/h3lib/lib/area.c
* The edge arcs between adjacent vertices are assumed to be the shortest
* geodesic path between them; that is, all arcs are interpreted to be less
* than 180 degrees or pi radians.
* Avoid arcs that are exactly pi (i.e, two antipodal vertices).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to add a comment on consequences of this, and maybe throw an error if such an arc is encountered? Though presumably you could only tell if they were using the same const for Pi.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a numerically stable test for exactly pi radians apart, so I'm not sure what we can do other than to let users know that two points will always be interpreted as describing the shortest arc between them. And to note that there are discontinuities at around arcs with pi radians, so they should be avoided.

I suppose one thing we could do would be to pick some tolerance and throw an error if an arc is within pi - tol. Let me create an issue for that.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/h3lib/lib/area.c
H3Error H3_EXPORT(cellAreaKm2)(H3Index cell, double *out) {
H3Error err = H3_EXPORT(cellAreaRads2)(cell, out);
if (!err) {
*out *= EARTH_RADIUS_KM * EARTH_RADIUS_KM;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Worth precalculating this as a new const?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it was done before when this function lived in the latlng.c file. I'm 95% certain modern compilers pick up on this and will make the constant for us, but we probably should verify that at some point.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think compilers will form the constant for us automatically.

(Side note: Compiler optimizations can be pretty impressive. I spent a while seeing how I could speed up the area calculation math, and one of the ideas was to used a fused sincos like https://developer.apple.com/documentation/simd/sincos(_:)-9pa2l which avoids duplicating some computation. I didn't get any speedup... because the compiler was already doing exactly that in the cagnoli function!)

@ajfriend ajfriend merged commit 7a9fdf1 into uber:master Dec 17, 2025
45 checks passed
@ajfriend ajfriend deleted the cagnoli_area branch December 17, 2025 22:58
@ajfriend ajfriend added this to the v4.5 milestone Feb 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants