Skip to content

CPU and Max RSS Analysis tools#675

Open
ChrisPaulBennett wants to merge 6 commits intocylc:masterfrom
ChrisPaulBennett:memory_view
Open

CPU and Max RSS Analysis tools#675
ChrisPaulBennett wants to merge 6 commits intocylc:masterfrom
ChrisPaulBennett:memory_view

Conversation

@ChrisPaulBennett
Copy link
Contributor

@ChrisPaulBennett ChrisPaulBennett commented Mar 12, 2025

This apart of 3 pull requests for adding CPU time and Max RSS analysis to the Cylc UI.

This adds the Max RSS and CPU time (as measured by cgroups) to the table view, box plot and time series views.

Linked to;
cylc/cylc-flow#6663
cylc/cylc-ui#2100

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@ChrisPaulBennett ChrisPaulBennett marked this pull request as draft March 12, 2025 09:19
@MetRonnie

This comment was marked as resolved.

@ChrisPaulBennett
Copy link
Contributor Author

I've merged in the previous changes to the SQL infrastructure that deals with task states. @oliver-sanders I've had to change some of your code that creates parts of the SQL query, mostly just to take into account the new join.
The codecov failures are from code that I haven't touched.

@MetRonnie

This comment was marked as resolved.

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes appear to have broken compatibility with workflows run in earlier versions of Cylc. I tested the tasks GraphQL query on a workflow that ran in 8.4.4.dev and got an empty result

query {
  tasks(live: false, workflows: ["wflow"]) {
    name
  }
}
{
  "data": {
    "tasks": []
  }
}

@ChrisPaulBennett
Copy link
Contributor Author

This been fixed. Max RSS and CPU time will be 0, if no data is present. An update to the cylc-ui needs to be made to not show cpu time / max rss charts if there is no data

@ChrisPaulBennett ChrisPaulBennett self-assigned this Jul 18, 2025
@MetRonnie MetRonnie added this to the 1.8.0 milestone Aug 4, 2025
@MetRonnie
Copy link
Member

This been fixed. Max RSS and CPU time will be 0, if no data is present. An update to the cylc-ui needs to be made to not show cpu time / max rss charts if there is no data

Is there a way to make this null instead of 0? Because currently the UI can't tell the difference between no data and times that are actually zero (to the nearest whole second).

@oliver-sanders oliver-sanders modified the milestones: 1.8.0, 1.9.0 Sep 17, 2025
@ChrisPaulBennett
Copy link
Contributor Author

This been fixed. Max RSS and CPU time will be 0, if no data is present. An update to the cylc-ui needs to be made to not show cpu time / max rss charts if there is no data

Is there a way to make this null instead of 0? Because currently the UI can't tell the difference between no data and times that are actually zero (to the nearest whole second).

This has been fixed in the UI repo

@MetRonnie

This comment was marked as resolved.

@oliver-sanders oliver-sanders requested review from wxtim and removed request for MetRonnie January 13, 2026 15:38
task_events(
cycle TEXT,
name TEXT,
submit_num INTEGER,
Copy link
Member

@wxtim wxtim Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this order is wrong - time comes before submit number in the table. I'm going to push a PR which breaks one of these tests, but should make this setup safer by building the db using data from cylc-flow. This will make these tests more fragile in a desirable way.

ChrisPaulBennett#6

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently you have discussed this with Oliver and agreed that the hardcoded approach allows us to ensure back compat (legit). However I still think that the table order is wrong.

'User',
'UsersJob',
),
task_entries=[
Copy link
Member

@wxtim wxtim Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've used this chunk of database twice - I've pulled it out and turned it into a "module" scope pytest fixture to save on creating two databases in ChrisPaulBennett#6.

Also about 150 lines of code less.

return conn


def test_make_task_query_1():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have proposed docstrings in my ChrisPaulBennett#6

* Simplify SQL where possible.
* Only retrieve and process profiler fields when requested for the jobs
  query.
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this end-to-end and can confirm it is working as intended.

I have a couple of suggestions to simply things:

  1. Switch the message format from string to JSON.
    • SQL/Sqlite has JSON functionality which makes pulling multiple messages out of a single field easier.
    • This removes the need for many of the INSTR and SUBSTR operators.
    • This also removes the need for the CAST operators.
  2. Only retrieve/process these new fields when they were requested.
    • The jobs query is used for other things in the GUI.
    • The tasks query isn't yet, but will be one day.
    • To reduce the overhead added with this PR, we aught to avoid doing any heavy lifting when it isn't necessary.

I have implemented these suggestions in:

See what you think.

Comment on lines +803 to +807
WITH data AS (
SELECT
tj.*,
CAST(
SUBSTR(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jobs query is used for more than just the Analysis views.

The Log view uses this query to retrieve the job's status in order to work out which log file to display.

Ideally, we wouldn't want to get SQL running around doing all of this maths unless the result is actually required.

Note, I think the requested fields are available as field_ids (see get_elements above in the call chain).

Comment on lines +812 to +819
CAST(
SUBSTR(
te.message,
INSTR(te.message, 'max_rss ') + 9,
INSTR(te.message, ' mem_alloc') -
(INSTR(te.message, 'max_rss ') + 9)
) AS INT
) AS max_rss,
Copy link
Member

@oliver-sanders oliver-sanders Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to this string processing would be JSON processing.

E.G, messages would look like profiler:{"max_rss": 1234, "mem_alloc": 2345, "cpu_time": 3456} with no constraint on field order.

The query would look like:

SELECT
  j.name
  j.cycle
  j.run_status
  JSON_EXTRACT(SUBSTR(e.message, 8), '$/cpu_time') cpu_time
  JSON_EXTRACT(SUBSTR(e.message, 8), '$/max_rss') max_rss
  JSON_EXTRACT(SUBSTR(e.message, 8), '$/mem_alloc') mem_alloc
FROM
  task_jobs j
LEFT JOIN
  task_events e
ON
  j.name = e.name
  AND ...
  AND e.message LIKE 'profiler:%'

This would remove the need for the CAST and INSTR logic.

I think it would also remove the need for the WITH data AS part. Hopefully that would make it easier to turn the LEFT JOIN on/off depending on whether we require the additional values or not.

Have implemented: ChrisPaulBennett/cylc-flow#2, ChrisPaulBennett#7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants