Add metrics about the solver-service behaviour #70
moyodiallo wants to merge 5 commits into ocurrent:main
Conversation
@moyodiallo Could you update this to the Eio version of the solver?
tmcgilchrist left a comment
Everything else looks good. @moyodiallo co-ordinate with @mtelvers to deploy it to a solver worker and check the statistics are exported correctly. Do we still have a test solver pool available?
service/pool.ml
```ocaml
let request, set_reply = Eio.Stream.take t.requests in
Atomic.incr t.running;
handle request |> Promise.resolve set_reply;
Atomic.decr t.running;
```
This pattern (incr, handle, decr) feels unsafe in the presence of exceptions from `handle request`. cc @talex5
`handle request` needs to catch those exceptions, otherwise we lose a worker and that would be a bug.
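As a minimal sketch (not the PR's actual code), one way to keep the counter consistent even when `handle` raises is `Fun.protect`; the `run_one` wrapper name is hypothetical:

```ocaml
(* Hypothetical wrapper: the ~finally clause decrements t.running
   even if [handle] raises, so no worker slot is leaked. *)
let run_one t handle =
  let request, set_reply = Eio.Stream.take t.requests in
  Atomic.incr t.running;
  Fun.protect
    ~finally:(fun () -> Atomic.decr t.running)
    (fun () -> handle request |> Promise.resolve set_reply)
```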
Yes, if a worker crashes then the whole service exits, so it's not strictly necessary to handle exceptions here, although it would make the code more robust to future changes. However, I don't think you need this counter. The number of running workers is always `min n_workers n_jobs`, so you can just calculate it as needed.
Yes, you're right, I hadn't understood the `min n_workers n_jobs` point. Thanks.
Oh, we don't have `n_jobs` in the pool. All we have is the waiting jobs (a `jobs Eio.Stream.t`). It wouldn't be precise to treat the waiting jobs as `n_jobs`, e.g. 2 jobs waiting while all 8 workers are busy.
What would be useful to see is:
- n_workers - static capacity of this solver_worker
- queued_requests - requests waiting for a Fiber to run on
- running_workers - Fibers actively running a solve job
That would let us answer questions about the total solver_worker capacity available, the utilisation of that capacity, and its saturation, both overall and per solver_worker.
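A minimal sketch of how those three gauges might be exported with the OCaml prometheus library; the metric names and the `report` function are illustrative assumptions, not the PR's code:

```ocaml
(* Illustrative only: metric names and the reporting call are
   assumptions about how the pool's counts would be surfaced. *)
open Prometheus

let namespace = "solver_service"

let n_workers =
  Gauge.v ~help:"Static capacity of this solver worker"
    ~namespace "workers_total"

let queued_requests =
  Gauge.v ~help:"Requests waiting for a fiber to run on"
    ~namespace "requests_queued"

let running_workers =
  Gauge.v ~help:"Fibers actively running a solve job"
    ~namespace "workers_running"

(* To be called periodically from the main domain. *)
let report ~capacity ~queued ~running =
  Gauge.set n_workers (float_of_int capacity);
  Gauge.set queued_requests (float_of_int queued);
  Gauge.set running_workers (float_of_int running)
```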
> The number of running workers is always `min n_workers n_jobs`, so you can just calculate it as needed.
I thought, from reading the code and https://github.com/ocaml-multicore/eio?search=1#multicore-support, that the Pool pre-forked a number of OS threads (Domains?) which would wake up when work was added to the Eio.Stream. Is that accurate @talex5?
That's right - there is a fixed pool of workers and they will all be busy unless there just aren't any jobs in the queue.
The issue here is that run_worker is running in a worker domain, and so can't (currently) update Prometheus metrics itself (which I guess is why it's updating an atomic instead and waiting for the main domain to report that). But the main domain can work out how many workers are running just by knowing how many jobs are outstanding, so this isn't necessary.
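A one-line sketch of the calculation described above, assuming `outstanding` is a main-domain count of jobs submitted but not yet completed (a name invented here for illustration):

```ocaml
(* Running workers derived on the main domain; no cross-domain
   atomics needed. *)
let running_workers ~n_workers ~outstanding = min n_workers outstanding
```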
That's right, there's a way to do that.
`solver` instead of `worker` makes sense.
Co-authored-by: Mark Elvers <mtelvers@users.noreply.github.com>
The pool stream wasn't accumulating the requests, so it was incorrect to treat its length as the number of waiting requests.
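A hedged sketch of one alternative: track queued requests with an explicit counter around the stream operations. The `submit` and `take` functions and the `t.queued` field are names invented here, assuming a rendezvous-style (capacity-0) stream where `Eio.Stream.length` stays at 0:

```ocaml
(* Hedged sketch: with a rendezvous-style stream, Eio.Stream.length
   doesn't reflect waiting requests, so count them explicitly.
   [t.queued] is an assumed [int Atomic.t] field; the count is
   approximate, which is fine for a metric. *)
let submit t request =
  Atomic.incr t.queued;
  Eio.Stream.add t.requests request

let take t =
  let request = Eio.Stream.take t.requests in
  Atomic.decr t.queued;
  request
```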