Conversation
|
I'm still playing around with some optimizations to make this code faster. Here's one failed attempt: https://gist.github.com/ajfriend/ba522a38fa46dbc54875a9042157d2e3 Reusing some of the trig operations across points that appear twice in the area calculation doesn't seem to save much time. Maybe 2% improvement at most. |
Comparison of Naive, Kahan, and Neumaier summationAccuracyHere are the accuracy results from Naive summationKahanNeumaierNotice that Kahan and Neumaier give a big improvement over naive summation, with Neumaier usually being just slightly better than Kahan. Compute timeI ran some quick benchmarks on my Apple M3 laptop, and found (roughly) that Kahan is about 0.5% slower than simple summation, and Neumaier is around 1.0% slower than simple summation. ConclusionKahan improves accuracy over simple almost as well as Neumaier. Kahan is slightly slower than simple and slightly faster than Neumaier. Kahan has a slightly simpler implementation than Neumaier. I'll go with Kahan for the time being. If we run into examples where Neumaier provides obviously better behavior (e.g., degenerate polygons?) we can consider switching. |
Also, probably do not need to fuzz if no longer in the public API
|
All comments should be addressed. Ready for re-review. |
|
Some notes on translating the "Cagnoli math" from d3-geo to what we have here:
sdLambda = dLambda >= 0 ? 1 : -1,
adLambda = sdLambda * dLambda,
// Advance the previous points.
lambda0 = lambda, cosPhi0 = cosPhi, sinPhi0 = sinPhi; Our code, on the other hand, takes in two points (one edge arc) at a time. I thought this made the code more decoupled clearer, and there was no performance loss: #1101 (comment) The d3 code
|
| @@ -0,0 +1,27 @@ | |||
| /* | |||
There was a problem hiding this comment.
My only question here is why in new area.h/.c files instead of in one of the existing files. Do you expect to add more to this later?
There was a problem hiding this comment.
I will be adding more functions for areas of polygons and multipolygons as a follow-up to this PR (#1104), and the area.c file seemed like a good way to group them and their helper functions.
But I agree this area.h file is a little odd with just one function. Currently, it just enables the tests in testGeoLoopArea.c, but it will serve the same purpose in #1104.
If we decide to expose the area functions in the public API, we can probably drop this area.h file (but keep the area.c file.
There was a problem hiding this comment.
This is also a byproduct of this being a sequence of stacked PRs. I've added a TODO in #1103 to remind me to review file organization questions like this after we have all the changes landed.
nrabinowitz
left a comment
There was a problem hiding this comment.
This looks good to me!
| summation), which allows us to add up sequences of floating-point numbers | ||
| with better precision than naive summation, especially when the terms in | ||
| the sum vary significantly in magnitude. | ||
| See: https://en.wikipedia.org/wiki/Kahan_summation_algorithm |
| * The edge arcs between adjacent vertices are assumed to be the shortest | ||
| * geodesic path between them; that is, all arcs are interpreted to be less | ||
| * than 180 degrees or pi radians. | ||
| * Avoid arcs that are exactly pi (i.e, two antipodal vertices). |
There was a problem hiding this comment.
Might want to add a comment on consequences of this, and maybe throw an error if such an arc is encountered? Though presumably you could only tell if they were using the same const for Pi.
There was a problem hiding this comment.
There isn't a numerically stable test for exactly pi radians apart, so I'm not sure what we can do other than to let users know that two points will always be interpreted as describing the shortest arc between them. And to note that there are discontinuities at around arcs with pi radians, so they should be avoided.
I suppose one thing we could do would be to pick some tolerance and throw an error if an arc is within pi - tol. Let me create an issue for that.
| H3Error H3_EXPORT(cellAreaKm2)(H3Index cell, double *out) { | ||
| H3Error err = H3_EXPORT(cellAreaRads2)(cell, out); | ||
| if (!err) { | ||
| *out *= EARTH_RADIUS_KM * EARTH_RADIUS_KM; |
There was a problem hiding this comment.
Nit: Worth precalculating this as a new const?
There was a problem hiding this comment.
This is how it was done before when this function lived in the latlng.c file. I'm 95% certain modern compilers pick up on this and will make the constant for us, but we probably should verify that at some point.
There was a problem hiding this comment.
I think compilers will form the constant for us automatically.
(Side note: Compiler optimizations can be pretty impressive. I spent a while seeing how I could speed up the area calculation math, and one of the ideas was to used a fused sincos like https://developer.apple.com/documentation/simd/sincos(_:)-9pa2l which avoids duplicating some computation. I didn't get any speedup... because the compiler was already doing exactly that in the cagnoli function!)
This PR introduces a new area calculation algorithm which works for large, globe-spanning spherical polygons. That is, this algorithm should work for polygons that enclose the poles, cross the antimeridian, or even ones that are not contained in a hemisphere. The algorithm critically pays attention to vertex ordering, so, for example, you can switch the order of vertices in a cell to get a region representing "everything on the globe except that cell", and then compute that area correctly.
I'll try to keep this PR small by introducing only the core algorithm for simple loops of lat/lngs and for cells (replacing our existing algorithm in
cellAreaRads2). In a follow-up PR, I can add functions that compute area for polygons (with holes), multi-polygons, their linked list versions, and the different units (rads^2, km^2, m^2). Although, this will be a lot of functions, so we should probably think about which ones we want to actually expose and support going forward. I would even consider makinggeoLoopAreaRads2()an internal-only function, and only exposing higher level functions at the polygon and multipolygon level. Curious to hear what others think.I'll use the "global polygon" area algorithm from this PR for a new implementation of the "cellsToMultiPolygon" algorithm.
PR summary:
geoLoopAreaRads2()function as the core algorithm for computing area. In this PR, we use it to implementcellAreaRads2(), and in the future, it will be the core algorithm for computing area of polygons and multipolygons.cellAreaRads2function.