feat: added new algorithm to clusterize cells in-place by goetzgaycken · Pull Request #4735 · acts-project/acts

goetzgaycken · 2025-10-24T11:39:30Z

The in-place clusterization could be used instead of the default clusterization but has a slightly different interface. It clusterizes by sorting the cell container in such a way that adjacent cells with the same label belong to the same cluster. This is advantageous since an additional cluster collection is avoid if the cluster collection resulting from the clusterization is only temporary.

goetzgaycken · 2025-10-24T11:41:30Z

In synthetic benchmarks it seems to be 20+-4% faster than the current clusterization.
When running a digitization only example job using a test sample (ttbar, PU200, Geant4, ODD), the improvements seem more substantial. A reason might be that the code path with the current clusterization is not fully optimized.

github-actions · 2025-10-24T15:08:55Z

📊: Physics performance monitoring for `8469555`

Full contents

physmon summary

goetzgaycken · 2025-10-29T10:32:08Z

I haven't heard any strong voices concerning the demonstration of this new clusterization algorithm in the examples framework. Therefore, the plan would be to leave for the time being the option in the DigitizationAlgorithm to toggle between the two clusterization algorithm which slightly increases its complexity. The longer term plan would be to only have one clusterization algorithm.

Core/include/Acts/Clusterization/InPlaceClusterization.hpp

CarloVarni · 2025-10-29T14:50:55Z

Core/include/Acts/Clusterization/InPlaceClusterization.hpp

@@ -0,0 +1,331 @@
+// This file is part of the ACTS project.


This implementation does not support time-based clusterization as well. Can it be extended to cover that case?

I haven't tried it, and I haven't even tried to clusterize more than 2 dimensions. But in theory one could add time as an extra coordinate and transform the time such that a time difference <=1 means connected. The types of all coordinates have to be the same and the type must be an integral type (though the "integral" requirement is likely not really necessary, but the abort criterion would have to be relaxed to <=2, I suppose ) If the time cannot be transformed in such a way (I would not know why not) one would have to specialize isConnected or isConnected4 and isConnected8, time could not be used as a sorting direction since a difference of >1 in the sorting axis would abort the search for connections.

right now we are doing it with way: https://github.com/acts-project/acts/blob/main/Core/include/Acts/Clusterization/TimedClusterization.hpp

do you think this is not adaptable to this implementation of yours?

to allow for different coordinate types (i.e. int for spacial coordinates and float for time) would make the code more complicated. Since in the cases I have seen a temporary cell collection is created anyways, then one can also transform the time into something more suitable for the clusterization, I would say.

CarloVarni · 2025-10-29T14:51:44Z

Core/include/Acts/Clusterization/InPlaceClusterization.hpp

@@ -0,0 +1,331 @@
+// This file is part of the ACTS project.
+//


why is this a different implementation of the clustering routine instead of a replacement? If you can demonstrate this provides the same results and it is faster I'd suggest to simply replace the previous version and mark the PR as a breaking one.

If the current clusterization would produce the final clusters, than that might be advantageous. The final label sorting might add an unnecessary overhead in that case. So far I only encountered cases in which the cluster collection was temporary.

goetzgaycken · 2025-10-30T07:49:12Z

changes from 9aca2e0 to d1fd488:

removed unnecessary inline keyword
changed template parameter names from T_Name to name_t
put clusterization into namespace Acts
merged requirements i.e. requires(a){..} && requires(b) {...} -> requires(a,b) {...}

goetzgaycken · 2025-10-30T07:58:43Z

changes d1fd488 to b54ce39:

specify concept for index_t
removed unnecessary template parameter axis_t

goetzgaycken · 2025-10-30T08:15:56Z

btw. the names isConnected4 and isConnected8 only make sense in the 2D case. In the examples framework "commonCorrner" is used instead. To distinguish the two one could use commonEdge commonEdgeAndCorner. But that would get rather verbose.

goetzgaycken · 2025-10-30T08:31:21Z

At the moment the criterion for aborting the search for connections is diff[SORT_AXIS] > 1, which assumes that the coordinates are integral. In theory it might be possible to use floats, but then this criterion would be too restrictive. So one possibility would be to replace this by a function like isConnected, then alternative implementations could be provided.

It is not clear to me whether there would be a use case for that (or does a useful clusterization of location+time would really require none integral times? )

goetzgaycken · 2025-10-30T09:10:12Z

changes b54ce39 to 11f46ad:

extended requirements for CellWithLabel:
- coordinates can be compared,
- label and dimension are of unsigned integral type.

goetzgaycken · 2025-10-30T12:26:42Z

changes 11f46ad to efc6765:

fixed clang-tidy warnings, and silenced clang-tidy warnings about uninitialized values in 2 cases in which the variables are set just below.

goetzgaycken · 2025-10-31T11:09:40Z

changes 7b9aa72 to a2898e2:

renamed connection types from fold -> ConnectionType, 8Fold -> CommonEdgeOrCorner, 4Fold -> ConnonEdge
support unsigned coordinates i.e. when computing differences for unsigned types subtract smaller from larger value, the difference is now a template which can be specialized to support alternative coordinate types.
moved abort criterion (diff[sort_axis]>1) to template which could be specialized.
improved comment concerning types of coordinates.

goetzgaycken · 2025-10-31T11:31:31Z

changes a2898e2 to 94c8288:

forgot a couple of places to rename fold -> connect type, common corner ...

Core/include/Acts/Clusterization/InPlaceClusterization.hpp

andiwand · 2025-11-04T09:06:46Z

Core/include/Acts/Clusterization/InPlaceClusterization.hpp

+          unsigned int min_label;
+          unsigned int max_label;


Suggested change

unsigned int min_label;

unsigned int max_label;

unsigned int min_label{};

unsigned int max_label{};

these are point-less initializations. The values are overwritten in the lines below. It is likely optimized away, but shouldn't the static code analyzer understand that there is no problem here rather than the variables to be set to some useless random values (i.e. the initialization with zero only satisfies clang-tidy but would not be a meaningful default value).

To you it is pointless - ok, but to others, including me, it is not. A good optimizer will make this go away, a good static analyzer would understand that there is no problem. We only have one of both and we should work with what we have and don't turn off the linter for such unimportant cases. If you have better default values you can use those of course.

In the following example clang++ --analyze will complain

unsigned int absDiff(int a, int b) { int x0; int x1; // NOLINTBEGIN(cppcoreguidelines-init-variables) if ( a<b) { x0=a; x1=b; } else { x0=b; } // NOLINTEND(cppcoreguidelines-init-variables) return x1-x0; }

If I change it as You say to satisfy clang-tidy:

unsigned int absDiff(int a, int b) { int x0{}; int x1{}; if ( a<b) { x0=a; x1=b; } else { x0=b; } return x1-x0; }

Neither of the tools (clang-tidy, clang++ --analyze) will complain. So, is better to satisfy bad tools, or to not add pointless/incorrect initializations. For me it is clearly the latter because in the latter case, a good tool actually helps me to find the bug, where in the first case the bad tool just helps me to hide the mistake under the rug.

I strongly think we should avoid toggling the linter even if we're convinced it's safe until we have reason to believe there is some performance to be gained.

I would ask you therefore to go with the (potentially unnecessary) initialization for now, and circle back later if we find this can improve things.

One possibility, I see, to write the code without bogus initialization and without triggering bogus clang-tidy warnings, would be to use a helper lambda e.g.

auto [min_label,max_label] = [&]() { if (a < b) { .... return std::make_pair(a,b); } else { .... return std::make_pair(b,a); } }();

But that makes things much more complicated than necessary.

The point is not so much that it is unnecessary, but that the default value is not necessarily correct.

Another simple example. This is wrong

#include <array> int main() { std::array<int,2> a; for (unsigned int i=0; i<1; ++i) { a[i]=i+1; } int sum=0; for (unsigned int i=0; i<a.size(); ++i) { sum += a[i]; } return sum==3 ? 0 : 1; }

but it is not more correct when changing std::array<int,2> a; to std::array<int,2> a{};.

I thought that I encountered a static analyzer which would detect the problem but the ones I tried won't. However valgrind and even clang++ with -fsanitize=memory will find the problem, but obviously not if a gets initialized. Anyway, initializing variables just for the sake of making clang-tidy happy, makes it harder for better tools to find the actual error. Therefore, I don't think it is an improvement, it just hides a potential problem.

Core/include/Acts/Clusterization/InPlaceClusterization.hpp

goetzgaycken · 2025-11-04T17:19:12Z

changes d5bf037 to ea99ee1 and ea99ee1 to 72b2838:

removed member prefix from m_ from public members of the reference Cell implementation.
exit early and reduced indention in for_each_cluster
improved comments for Cell,
empty line after pragma once
make use of CellCollecion::value_type which was required indirectly anyway, now more explicitly.

I did not:

extend to CellCollection concept to require the container to provide the dimension of the grid,
make a static_assert a requirement of the CellCollection since it is only relevant for the function and is more a sanity check.
satisfy clang-tidy (i.e. I did not initialize variables to some random value just before they are actually set to a meaningful one just to avoid clang-tidy warnings)

goetzgaycken · 2025-11-05T13:52:56Z

change 72b2838 to 0c631b5:

Like original clusterization there is now an object for the connection testing, which by default has no data members and only static methods, but could be replaced by something which can provide parameters to the connection test. In the clusterization function the object type replaces the connection type template parameter
Extended the 2D clusterization unit test by an "example" which clusterizes cells with 2D spatial coordinates and time, where also time is given by a 16 bit integer.

I thought a little bit more about the customization of the connection testing. I suppose some customization will be necessary if the clusterization should take into account spatial and time cell components. I originally wanted to get rid of the connection object template parameter (current clusterization), but since I already introduced a template parameter to steer the connection test (enum), it could also just be turned into an arbitrary type which is more or less the current design. The overhead is minimal (or zero) if the "connection" object has no data members and only static methods which is the case for the default: "ConnectionHelper" (Maybe ConnectionTester would be a better name? ). This helper provides the original functions isConnected and canAbortSearch.

goetzgaycken · 2025-11-05T14:24:46Z

changes 0c631b5 to 8033eb6:

obfuscated an aggressive constant.

goetzgaycken · 2025-11-06T12:14:44Z

changes 033eb6 to 077f30a:

rebase to 2025-11-06_T1142
solved merge conflicts in ClusterizationTests2D, and
adjusted additions to ClusterizationTests2D for duplicate cell exception.

goetzgaycken · 2025-11-06T12:36:08Z

changes 077f30a to cbd9c04:

clang-tidy and doc fixes

goetzgaycken · 2025-11-06T14:58:35Z

cbd9c04 to 095167c:

more line and doc fixes.

sonarqubecloud · 2025-11-06T16:19:02Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
96.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

andiwand · 2025-11-12T00:46:21Z

Sorry but I don't think this discussion is worth the time and effort given what this is about. I asked for very simple adjustments to the code to apply with out guidelines of readability and maintainability. I cannot approve this as. You can of course ask someone else for a review / approval.

goetzgaycken · 2025-11-17T10:30:18Z

changes ddb6bc2:

as the comments says, works around the false clang-tidy positive without switching off the warning as requested. Not sure it is an improvement though.

The in-place clusterization could be used instead of the default clusterization. It clusterizes by sorting a cell container in such a way that adjacent cells with the same label belong to the same cluster.

Instead of switching off the clang-tidy warning about uninitialized variables which are initialized just a couple of lines below, the variables are now initialized to bogus values which satisfies clang-tidy but disables better static analyzers or other tools to actually find real initialization problems.

goetzgaycken · 2025-11-26T11:42:20Z

changes 74723d2 to 8469555:

bowing to the machine. Reverted the latest change because the solution which solved the clang-tidy complaint about uninitialized variables, by actually initializing the variables properly, added too much complexity. Moreover, instead of switching off the clang-tidy warning caused by its limited unstanding, clang-tidy is now satisfied by bogus initializations. Likely compilers are smarter than clang-tidy and understand that these intializations do not serve any purpose and optimize them away, but at the same time smarter tools are taken the possibility to actually find real initialization problems, since there are now the bogus initializations. Anyway clang-tidy is happy, and it seems to be the preferred solution even so the solution has no other purpose than satisfying clang-tidy and it makes harder to find problems with better tools.

github-actions bot added this to the next milestone Oct 24, 2025

github-actions bot added Component - Core Affects the Core module Component - Examples Affects the Examples module Clustering labels Oct 24, 2025

goetzgaycken force-pushed the feat_new_clusterization_alg branch 4 times, most recently from 073195f to 0e5eca0 Compare October 24, 2025 14:14

goetzgaycken force-pushed the feat_new_clusterization_alg branch 3 times, most recently from 5fbbf2a to eee8e25 Compare October 29, 2025 10:09

goetzgaycken force-pushed the feat_new_clusterization_alg branch 2 times, most recently from ca9b8ef to 9aca2e0 Compare October 29, 2025 11:53

goetzgaycken marked this pull request as ready for review October 29, 2025 11:53

CarloVarni self-requested a review October 29, 2025 14:08

CarloVarni requested changes Oct 29, 2025

View reviewed changes

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 9aca2e0 to d1fd488 Compare October 30, 2025 07:31

goetzgaycken force-pushed the feat_new_clusterization_alg branch from d1fd488 to b54ce39 Compare October 30, 2025 07:57

goetzgaycken force-pushed the feat_new_clusterization_alg branch from b54ce39 to 11f46ad Compare October 30, 2025 09:05

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 11f46ad to efc6765 Compare October 30, 2025 12:24

goetzgaycken force-pushed the feat_new_clusterization_alg branch from efc6765 to eb253b4 Compare October 30, 2025 15:34

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 7b9aa72 to a2898e2 Compare October 31, 2025 11:05

goetzgaycken force-pushed the feat_new_clusterization_alg branch from a2898e2 to 94c8288 Compare October 31, 2025 11:30

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 94c8288 to d5bf037 Compare October 31, 2025 12:12

andiwand reviewed Nov 4, 2025

View reviewed changes

goetzgaycken force-pushed the feat_new_clusterization_alg branch 2 times, most recently from ea99ee1 to 72b2838 Compare November 4, 2025 17:12

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 72b2838 to 0c631b5 Compare November 5, 2025 13:35

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 0c631b5 to 8033eb6 Compare November 5, 2025 14:24

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 8033eb6 to 077f30a Compare November 6, 2025 12:09

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 077f30a to cbd9c04 Compare November 6, 2025 12:34

goetzgaycken force-pushed the feat_new_clusterization_alg branch from cbd9c04 to 095167c Compare November 6, 2025 14:56

goetzgaycken force-pushed the feat_new_clusterization_alg branch 2 times, most recently from a404751 to 74723d2 Compare November 18, 2025 07:45

Goetz Gaycken added 2 commits November 26, 2025 12:25

Added faster, in-place clusterization.

6cf07e7

The in-place clusterization could be used instead of the default clusterization. It clusterizes by sorting a cell container in such a way that adjacent cells with the same label belong to the same cluster.

goetzgaycken force-pushed the feat_new_clusterization_alg branch from 74723d2 to 8469555 Compare November 26, 2025 11:27

github-actions bot added the Stale label Dec 26, 2025

github-actions bot removed the Stale label Mar 25, 2026

		@@ -0,0 +1,331 @@
		// This file is part of the ACTS project.
		//

Conversation

goetzgaycken commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

goetzgaycken commented Oct 24, 2025

Uh oh!

github-actions bot commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📊: Physics performance monitoring for 8469555

physmon summary

Uh oh!

goetzgaycken commented Oct 29, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

goetzgaycken Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

goetzgaycken Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CarloVarni Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

goetzgaycken commented Oct 30, 2025

Uh oh!

goetzgaycken commented Oct 30, 2025

Uh oh!

goetzgaycken commented Oct 30, 2025

Uh oh!

goetzgaycken commented Oct 30, 2025

Uh oh!

goetzgaycken commented Oct 30, 2025

Uh oh!

goetzgaycken commented Oct 30, 2025

Uh oh!

goetzgaycken commented Oct 31, 2025

Uh oh!

goetzgaycken commented Oct 31, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

goetzgaycken Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

goetzgaycken Nov 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

goetzgaycken commented Oct 24, 2025 •

edited

Loading

github-actions bot commented Oct 24, 2025 •

edited

Loading

📊: Physics performance monitoring for `8469555`

goetzgaycken Oct 29, 2025 •

edited

Loading

goetzgaycken Oct 30, 2025 •

edited

Loading

CarloVarni Oct 29, 2025 •

edited

Loading

goetzgaycken Nov 4, 2025 •

edited

Loading

goetzgaycken Nov 11, 2025 •

edited

Loading

goetzgaycken commented Nov 4, 2025 •

edited

Loading