feat: added new algorithm to clusterize cells in-place#4735
feat: added new algorithm to clusterize cells in-place#4735goetzgaycken wants to merge 2 commits intoacts-project:mainfrom
Conversation
|
073195f to
0e5eca0
Compare
5fbbf2a to
eee8e25
Compare
|
I haven't heard any strong voices concerning the demonstration of this new clusterization algorithm in the examples framework. Therefore, the plan would be to leave for the time being the option in the DigitizationAlgorithm to toggle between the two clusterization algorithm which slightly increases its complexity. The longer term plan would be to only have one clusterization algorithm. |
ca9b8ef to
9aca2e0
Compare
| @@ -0,0 +1,331 @@ | |||
| // This file is part of the ACTS project. | |||
There was a problem hiding this comment.
This implementation does not support time-based clusterization as well. Can it be extended to cover that case?
There was a problem hiding this comment.
I haven't tried it, and I haven't even tried to clusterize more than 2 dimensions. But in theory one could add time as an extra coordinate and transform the time such that a time difference <=1 means connected. The types of all coordinates have to be the same and the type must be an integral type (though the "integral" requirement is likely not really necessary, but the abort criterion would have to be relaxed to <=2, I suppose ) If the time cannot be transformed in such a way (I would not know why not) one would have to specialize isConnected or isConnected4 and isConnected8, time could not be used as a sorting direction since a difference of >1 in the sorting axis would abort the search for connections.
There was a problem hiding this comment.
right now we are doing it with way: https://github.com/acts-project/acts/blob/main/Core/include/Acts/Clusterization/TimedClusterization.hpp
do you think this is not adaptable to this implementation of yours?
There was a problem hiding this comment.
to allow for different coordinate types (i.e. int for spacial coordinates and float for time) would make the code more complicated. Since in the cases I have seen a temporary cell collection is created anyways, then one can also transform the time into something more suitable for the clusterization, I would say.
| @@ -0,0 +1,331 @@ | |||
| // This file is part of the ACTS project. | |||
| // | |||
There was a problem hiding this comment.
why is this a different implementation of the clustering routine instead of a replacement? If you can demonstrate this provides the same results and it is faster I'd suggest to simply replace the previous version and mark the PR as a breaking one.
There was a problem hiding this comment.
If the current clusterization would produce the final clusters, than that might be advantageous. The final label sorting might add an unnecessary overhead in that case. So far I only encountered cases in which the cluster collection was temporary.
9aca2e0 to
d1fd488
Compare
d1fd488 to
b54ce39
Compare
|
btw. the names isConnected4 and isConnected8 only make sense in the 2D case. In the examples framework "commonCorrner" is used instead. To distinguish the two one could use commonEdge commonEdgeAndCorner. But that would get rather verbose. |
|
At the moment the criterion for aborting the search for connections is It is not clear to me whether there would be a use case for that (or does a useful clusterization of location+time would really require none integral times? ) |
b54ce39 to
11f46ad
Compare
11f46ad to
efc6765
Compare
efc6765 to
eb253b4
Compare
7b9aa72 to
a2898e2
Compare
|
a2898e2 to
94c8288
Compare
94c8288 to
d5bf037
Compare
| unsigned int min_label; | ||
| unsigned int max_label; |
There was a problem hiding this comment.
| unsigned int min_label; | |
| unsigned int max_label; | |
| unsigned int min_label{}; | |
| unsigned int max_label{}; |
There was a problem hiding this comment.
these are point-less initializations. The values are overwritten in the lines below. It is likely optimized away, but shouldn't the static code analyzer understand that there is no problem here rather than the variables to be set to some useless random values (i.e. the initialization with zero only satisfies clang-tidy but would not be a meaningful default value).
There was a problem hiding this comment.
To you it is pointless - ok, but to others, including me, it is not. A good optimizer will make this go away, a good static analyzer would understand that there is no problem. We only have one of both and we should work with what we have and don't turn off the linter for such unimportant cases. If you have better default values you can use those of course.
There was a problem hiding this comment.
In the following example clang++ --analyze will complain
unsigned int absDiff(int a, int b) {
int x0;
int x1;
// NOLINTBEGIN(cppcoreguidelines-init-variables)
if ( a<b) {
x0=a;
x1=b;
}
else {
x0=b;
}
// NOLINTEND(cppcoreguidelines-init-variables)
return x1-x0;
}
If I change it as You say to satisfy clang-tidy:
unsigned int absDiff(int a, int b) {
int x0{};
int x1{};
if ( a<b) {
x0=a;
x1=b;
}
else {
x0=b;
}
return x1-x0;
}
Neither of the tools (clang-tidy, clang++ --analyze) will complain. So, is better to satisfy bad tools, or to not add pointless/incorrect initializations. For me it is clearly the latter because in the latter case, a good tool actually helps me to find the bug, where in the first case the bad tool just helps me to hide the mistake under the rug.
There was a problem hiding this comment.
I strongly think we should avoid toggling the linter even if we're convinced it's safe until we have reason to believe there is some performance to be gained.
I would ask you therefore to go with the (potentially unnecessary) initialization for now, and circle back later if we find this can improve things.
There was a problem hiding this comment.
One possibility, I see, to write the code without bogus initialization and without triggering bogus clang-tidy warnings, would be to use a helper lambda e.g.
auto [min_label,max_label] = [&]() {
if (a < b) {
....
return std::make_pair(a,b);
} else {
....
return std::make_pair(b,a);
}
}();
But that makes things much more complicated than necessary.
There was a problem hiding this comment.
The point is not so much that it is unnecessary, but that the default value is not necessarily correct.
There was a problem hiding this comment.
Another simple example. This is wrong
#include <array>
int main() {
std::array<int,2> a;
for (unsigned int i=0; i<1; ++i) {
a[i]=i+1;
}
int sum=0;
for (unsigned int i=0; i<a.size(); ++i) {
sum += a[i];
}
return sum==3 ? 0 : 1;
}
but it is not more correct when changing std::array<int,2> a; to std::array<int,2> a{};.
I thought that I encountered a static analyzer which would detect the problem but the ones I tried won't. However valgrind and even clang++ with -fsanitize=memory will find the problem, but obviously not if a gets initialized. Anyway, initializing variables just for the sake of making clang-tidy happy, makes it harder for better tools to find the actual error. Therefore, I don't think it is an improvement, it just hides a potential problem.
ea99ee1 to
72b2838
Compare
|
changes d5bf037 to ea99ee1 and ea99ee1 to 72b2838:
I did not:
|
72b2838 to
0c631b5
Compare
I thought a little bit more about the customization of the connection testing. I suppose some customization will be necessary if the clusterization should take into account spatial and time cell components. I originally wanted to get rid of the connection object template parameter (current clusterization), but since I already introduced a template parameter to steer the connection test (enum), it could also just be turned into an arbitrary type which is more or less the current design. The overhead is minimal (or zero) if the "connection" object has no data members and only static methods which is the case for the default: "ConnectionHelper" (Maybe ConnectionTester would be a better name? ). This helper provides the original functions |
0c631b5 to
8033eb6
Compare
8033eb6 to
077f30a
Compare
077f30a to
cbd9c04
Compare
cbd9c04 to
095167c
Compare
|
|
Sorry but I don't think this discussion is worth the time and effort given what this is about. I asked for very simple adjustments to the code to apply with out guidelines of readability and maintainability. I cannot approve this as. You can of course ask someone else for a review / approval. |
|
changes ddb6bc2:
|
a404751 to
74723d2
Compare
The in-place clusterization could be used instead of the default clusterization. It clusterizes by sorting a cell container in such a way that adjacent cells with the same label belong to the same cluster.
Instead of switching off the clang-tidy warning about uninitialized variables which are initialized just a couple of lines below, the variables are now initialized to bogus values which satisfies clang-tidy but disables better static analyzers or other tools to actually find real initialization problems.
74723d2 to
8469555
Compare
|



The in-place clusterization could be used instead of the default clusterization but has a slightly different interface. It clusterizes by sorting the cell container in such a way that adjacent cells with the same label belong to the same cluster. This is advantageous since an additional cluster collection is avoid if the cluster collection resulting from the clusterization is only temporary.