UCS/CONFIG: Support numeric ranges for devices#11340
UCS/CONFIG: Support numeric ranges for devices#11340guy-ealey-morag wants to merge 14 commits intoopenucx:masterfrom
Conversation
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
1a6d926 to
dcedc34
Compare
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
| unsigned expected_device_idx) | ||
| { | ||
| for (unsigned i = 0; i < count; ++i) { | ||
| std::string const device_name = build_mlx5_device_name(i); |
| "a[0-A]b", "a[-]b", "[]", "[-1-2-]", "[--]", | ||
| "[1-2][3-4]", "[4-8][", "[5-6]]", "][4-5]", "[[0-4]]", | ||
| "a[0-4]b[6-8]c", "a[5-2]b", "[10-0]", "ab[100-1]suffix" | ||
| }; |
There was a problem hiding this comment.
const and "a[-4]", "[-4]", "foo[]]", "[]]", "a-]", "a[-", "a[4-]", "a[4-6" which is not exhaustive but surely sufficient (as is your list).
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
| ucs_assertv(delim[0] != '\0', "delimiter cannot be empty"); | ||
| ucs_assertv(strlen(delim) == 1, "delimiter must be a single character"); |
There was a problem hiding this comment.
Wouldn't it be better to have delim as char instead of const char *?
There was a problem hiding this comment.
This arg was actually only recently added (#11206)
Changing it to char would require putting it into a null-terminated string to be able to use it with strspn, strpbrk, ucs_string_buffer_for_each_token because they take a set of delimiters as argument.
The asserts are here to make sure there is a single delimiter given because we use it to build the output.
Let me know if you still think it's worth changing it.
| ucs_free(str_dup); | ||
| return 0; | ||
| UCS_INIT_ONCE(®ex_init) { | ||
| ret = regcomp(&range_regex, |
There was a problem hiding this comment.
should we call regfree somewhere?
There was a problem hiding this comment.
Moved the regex object to the file scope and added it to the cleanup logic
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
| "\\[([0-9]+)-([0-9]+)\\]" /* range */ | ||
| "([^][]*)$" /* suffix */, | ||
| REG_EXTENDED); | ||
| ucs_assertv(ret == 0, "failed to compile range regex: %d", ret); |
There was a problem hiding this comment.
I expect it to never fail if it passes in CI, but I changed it to a fatal error for robustness
| pmatch[0].rm_so = 0; | ||
| pmatch[0].rm_eo = (regoff_t)token_len; | ||
|
|
||
| ret = regexec(&ucs_config_range_regex, token, 5, pmatch, REG_STARTEND); |
There was a problem hiding this comment.
ucs_static_array_size() for all 5
| break; | ||
| } | ||
|
|
||
| count_total += ucs_config_array_expand_range(strb, p, next - p, delim, |
There was a problem hiding this comment.
can we make any truncation (reaching max_elements) logged, to help quickly understanding over/mis use?
There was a problem hiding this comment.
Added warning log with matching tests
|
|
||
| snprintf(buf, max, | ||
| "comma-separated list (use \"all\" for including all items, " | ||
| "\'^\' for negation, or \'prefix[start-end]suffix\' for ranges) " |
There was a problem hiding this comment.
maybe reverse order like: use 'all' for all items, prefic[s-e]suffix for ranges, start with ^ for negation?
| return 1; | ||
| } | ||
|
|
||
| static void ucs_config_array_expand_ranges(ucs_string_buffer_t *strb, |
There was a problem hiding this comment.
some systems have: rocep124s0, rocep193s0, rocep125s0, rocep194s0. in that case worth noting that expansion could lead to needing many entries. rocep[124-193]s0 for instance.
There was a problem hiding this comment.
In this case the best we can do right now is rocep[124-125]s0,rocep[193-194]s0.
If it's very common we can think of extending this feature to support multiple ranges,
something like rocep[123-124,193-194]s0.
What do you think?
| ucs_assertv(strlen(delim) == 1, "delimiter must be a single character"); | ||
|
|
||
| if (expand_ranges) { | ||
| ucs_config_array_expand_ranges(&strb, buf, delim, UCS_CONFIG_ARRAY_MAX); |
There was a problem hiding this comment.
for my understanding, after expansion, does it lead to output logs stating that specified devices are unknown, in case they would not be there?
There was a problem hiding this comment.
Yes, it's one of the requirements to keep these warnings for ranges as well.
I updated the test test_nonexistent_device_warning to include a range as well
| UCS_TEST_P(test_ucp_devices_config_mlx5, duplicate_device_warning_two_ranges) | ||
| { | ||
| test_duplicate_device_warning({"mlx5_0:1", "mlx5_1:1", "mlx5_2:1"}, | ||
| "mlx5_[0-1],mlx5_[1-2]", "mlx5_1"); |
There was a problem hiding this comment.
for my understanding does range support mlx5_[0-1]:2? if yes could we add test if not already there?
There was a problem hiding this comment.
Yes, but do you know if these devices exist in the CI setup?
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
…sing Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
Signed-off-by: Guy Ealey Morag <gealeymorag@nvidia.com>
| const char *input, const char *delim, | ||
| size_t max_elements) | ||
| { | ||
| size_t count_total = 0; |
There was a problem hiding this comment.
Minotr: num_elements would be more intuitive to be compared against max_elements?
What?
Add support numeric ranges for devices (e.g.. mlx5_[0-2]) with
UCX_*_DEVICESenvironment variables (NET_DEVICES, SHM_DEVICES, ACC_DEVICES, SELF_DEVICES).Why?
This makes it easier to specify multiple related devices easily, for example:
UCX_NET_DEVICES=mlx5_[1-3]instead ofUCX_NET_DEVICES=mlx5_1,mlx5_2,mlx5_3How?
Added a new config type
UCS_CONFIG_TYPE_ALLOW_LIST_WITH_RANGESthat detects the numeric range pattern in the list and expands it to the different items of the range.Note:
Range expansion was originally planned to be implemented as a generic api in
stringandstring_buffer(#11215), but due to increasing complexity in the generic implementation, it is put on hold for now and the non-generic design is prioritized instead.