Skip to content

http/client: Count and export integrated queue length#3259

Open
xemul wants to merge 1 commit intoscylladb:masterfrom
xemul:br-http-client-integrated-queue-length
Open

http/client: Count and export integrated queue length#3259
xemul wants to merge 1 commit intoscylladb:masterfrom
xemul:br-http-client-integrated-queue-length

Conversation

@xemul
Copy link
Contributor

@xemul xemul commented Feb 17, 2026

There's an implicit queue of requests in the client -- when the connection pool exhausts the client waits for a free socket to become available. Since there can be many parallel fibers trying to call .make_request(), this forms a queue of requests.

This patch counts the number of "queued" requests into the integrated length value merged recently. Checkpoint happens once the request gets its socket and proceeds.

Exporting should be performed by the caller. For that the client provides const reference to the integrated length value itself. It's preferred that called exports .integral() value as COUNTER metrics and rate()-s it on the dashboard.

@xemul xemul requested a review from kreuzerkrieg February 17, 2026 09:05
@xemul xemul requested a review from nyh February 23, 2026 07:59
@xemul
Copy link
Contributor Author

xemul commented Mar 10, 2026

@nyh , please review

Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, this is the first time I see this integrated-length.hh (I guess I've been sleeping under a rock), so I'm not 100% sold on its usefulness, but it does sound like it might be useful.

I think there's a small problem here. I'm clicking "approve", because I don't think it matters much, but please consider it anyway:

Looking at the integrated_length, it seems it has a third optional template parameter, Resolution which defaults to nanoseconds. I don't see any reason to keep nanosecond resolution, when the lowres_clock's resolution is milliseconds. So I think the "Resolution" can, and should, be milliseconds, not nanoseconds.
By the way, why doesn't the resolution defaults to taking the resolution of the clock itself instead of a separate parameter?

@xemul
Copy link
Contributor Author

xemul commented Mar 10, 2026

So I think the "Resolution" can, and should, be milliseconds, not nanoseconds.

Well, it's two fold. On one hand, HTTP requests are not expected to come at nanoseconds frequency, that's true. On the other hand, unless the update frequency exceeds the resolution, it only affects the internal 64-bit integral value overflow rate. For observed value in the range of [0,1000], even using nanoseconds resolution the calculated 64-bit integral would overflow in ~6 months.

By the way, why doesn't the resolution defaults to taking the resolution of the clock itself instead of a separate parameter?

Because both, steady_clock and lowres_clock in seastar have the same rep value 🤷‍♂️

@xemul xemul force-pushed the br-http-client-integrated-queue-length branch from f9b9b90 to f12f99d Compare March 11, 2026 07:17
@xemul
Copy link
Contributor Author

xemul commented Mar 11, 2026

upd:

  • changed resolution to milliseconds

@nyh
Copy link
Contributor

nyh commented Mar 11, 2026

* changed resolution to milliseconds

When I suggested to use milliseconds, I was for some reason convinced that lowres_clock actually has milliseconds resolution. Now I see you were right, it has the same resolution as the regular system clock, and increments the time every task_quota, which concievably maybe less than one millisecond... So maybe I was wrong, and it's better to use microseconds not milliseconds? Although milliseconds is probably good enough - and will allow integration of larger numbers over longer time periods. So I don't know.

@xemul
Copy link
Contributor Author

xemul commented Mar 12, 2026

The integrated_value Resultion "interacts" with the frequency of checkpoints. If they happen faster than two subsequent resolution tick, they would get squashed. For http client we don't expect requests to get queued as fast as once-per-less-than-millisecond.

@nyh
Copy link
Contributor

nyh commented Mar 12, 2026

The integrated_value Resultion "interacts" with the frequency of checkpoints. If they happen faster than two subsequent resolution tick, they would get squashed. For http client we don't expect requests to get queued as fast as once-per-less-than-millisecond.

So would you like to leave it millisecond or return it to microsecond? Up to you. When I suggested millisecond, I was under the mistaken impression that this is the resolution we already have.

There's an implicit queue of requests in the client -- when the connection
pool exhausts the client waits for a free socket to become available. Since
there can be many parallel fibers trying to call .make_request(), this forms
a queue of requests.

This patch counts the number of "queued" requests into the integrated length
value merged recently. Checkpoint happens once the request gets its socket
and proceeds.

Exporting should be performed by the caller. For that the client provides
const reference to the integrated length value itself. It's preferred that
called exports .integral() value as COUNTER metrics and rate()-s it on the
dashboard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
@xemul xemul force-pushed the br-http-client-integrated-queue-length branch from f12f99d to a188ce8 Compare March 13, 2026 07:29
@xemul
Copy link
Contributor Author

xemul commented Mar 13, 2026

I think microseconds is safer yet harmless (pushed the updated patch)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants