Conversation
This comment was marked as resolved.
This comment was marked as resolved.
|
I've merged in the previous changes to the SQL infrastructure that deals with task states. @oliver-sanders I've had to change some of your code that creates parts of the SQL query, mostly just to take into account the new join. |
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
These changes appear to have broken compatibility with workflows run in earlier versions of Cylc. I tested the tasks GraphQL query on a workflow that ran in 8.4.4.dev and got an empty result
query {
tasks(live: false, workflows: ["wflow"]) {
name
}
}{
"data": {
"tasks": []
}
}|
This been fixed. Max RSS and CPU time will be 0, if no data is present. An update to the cylc-ui needs to be made to not show cpu time / max rss charts if there is no data |
Is there a way to make this null instead of 0? Because currently the UI can't tell the difference between no data and times that are actually zero (to the nearest whole second). |
This has been fixed in the UI repo |
This comment was marked as resolved.
This comment was marked as resolved.
e14ad5c to
83dbbd6
Compare
83dbbd6 to
0d31a9f
Compare
| task_events( | ||
| cycle TEXT, | ||
| name TEXT, | ||
| submit_num INTEGER, |
There was a problem hiding this comment.
I think that this order is wrong - time comes before submit number in the table. I'm going to push a PR which breaks one of these tests, but should make this setup safer by building the db using data from cylc-flow. This will make these tests more fragile in a desirable way.
There was a problem hiding this comment.
Apparently you have discussed this with Oliver and agreed that the hardcoded approach allows us to ensure back compat (legit). However I still think that the table order is wrong.
| 'User', | ||
| 'UsersJob', | ||
| ), | ||
| task_entries=[ |
There was a problem hiding this comment.
You've used this chunk of database twice - I've pulled it out and turned it into a "module" scope pytest fixture to save on creating two databases in ChrisPaulBennett#6.
Also about 150 lines of code less.
| return conn | ||
|
|
||
|
|
||
| def test_make_task_query_1(): |
* Simplify SQL where possible. * Only retrieve and process profiler fields when requested for the jobs query.
oliver-sanders
left a comment
There was a problem hiding this comment.
Tested this end-to-end and can confirm it is working as intended.
I have a couple of suggestions to simply things:
- Switch the message format from string to JSON.
- SQL/Sqlite has JSON functionality which makes pulling multiple messages out of a single field easier.
- This removes the need for many of the
INSTRandSUBSTRoperators. - This also removes the need for the
CASToperators.
- Only retrieve/process these new fields when they were requested.
- The jobs query is used for other things in the GUI.
- The tasks query isn't yet, but will be one day.
- To reduce the overhead added with this PR, we aught to avoid doing any heavy lifting when it isn't necessary.
I have implemented these suggestions in:
See what you think.
cylc/uiserver/schema.py
Outdated
| WITH data AS ( | ||
| SELECT | ||
| tj.*, | ||
| CAST( | ||
| SUBSTR( |
There was a problem hiding this comment.
The jobs query is used for more than just the Analysis views.
The Log view uses this query to retrieve the job's status in order to work out which log file to display.
Ideally, we wouldn't want to get SQL running around doing all of this maths unless the result is actually required.
Note, I think the requested fields are available as field_ids (see get_elements above in the call chain).
cylc/uiserver/schema.py
Outdated
| CAST( | ||
| SUBSTR( | ||
| te.message, | ||
| INSTR(te.message, 'max_rss ') + 9, | ||
| INSTR(te.message, ' mem_alloc') - | ||
| (INSTR(te.message, 'max_rss ') + 9) | ||
| ) AS INT | ||
| ) AS max_rss, |
There was a problem hiding this comment.
An alternative to this string processing would be JSON processing.
E.G, messages would look like profiler:{"max_rss": 1234, "mem_alloc": 2345, "cpu_time": 3456} with no constraint on field order.
The query would look like:
SELECT
j.name
j.cycle
j.run_status
JSON_EXTRACT(SUBSTR(e.message, 8), '$/cpu_time') cpu_time
JSON_EXTRACT(SUBSTR(e.message, 8), '$/max_rss') max_rss
JSON_EXTRACT(SUBSTR(e.message, 8), '$/mem_alloc') mem_alloc
FROM
task_jobs j
LEFT JOIN
task_events e
ON
j.name = e.name
AND ...
AND e.message LIKE 'profiler:%'This would remove the need for the CAST and INSTR logic.
I think it would also remove the need for the WITH data AS part. Hopefully that would make it easier to turn the LEFT JOIN on/off depending on whether we require the additional values or not.
Have implemented: ChrisPaulBennett/cylc-flow#2, ChrisPaulBennett#7
e30b993 to
9b2e5b8
Compare
This apart of 3 pull requests for adding CPU time and Max RSS analysis to the Cylc UI.
This adds the Max RSS and CPU time (as measured by cgroups) to the table view, box plot and time series views.
Linked to;
cylc/cylc-flow#6663
cylc/cylc-ui#2100
Check List
CONTRIBUTING.mdand added my name as a Code Contributor.setup.cfg(andconda-environment.ymlif present).?.?.xbranch.