feat: Add canister_metrics endpoint on management canister #6217
dsarlis wants to merge 18 commits into dfinity:master
Co-authored-by: mraszyk <31483726+mraszyk@users.noreply.github.com>
mraszyk left a comment:
LGTM modulo the note about query calls in the formal part of the spec
Add counter versions of the consumed-cycles metrics that are stored in `ReplicatedState`. The existing ones behave like gauges (their values move in both directions as prepayments are made and refunds are issued), which makes it harder for consumers to build automated monitoring tools that aggregate over them. Monotonically increasing values make it easier to calculate rates of change, show aggregates over time, etc.
The key idea is to introduce a second map of `<CyclesUseCase, NominalCycles>` in the `ReplicatedState` that is updated only once per use case: at the payment stage if the precise amount is known, or at the refund stage if a prepayment is made with a refund expected later. The second map is otherwise similar to the existing one (how it is stored in checkpoints and how it is exposed as Prometheus metrics); only the way the values are updated differs.
A new map is introduced to ease the transition: migrating from the old map to the new one is non-trivial because a proper cutoff point would be needed to handle outstanding callbacks created before the metric was introduced. This is left for a follow-up, if and when people decide to do it. The new map will be used in a follow-up that implements the [new management canister endpoint](dfinity/portal#6217) to retrieve canister-level metrics.
Additionally, the new metrics include the `HttpsOutcalls` use case in the canister-level metrics, as it's useful to determine how much each canister uses this feature. I've opted not to change the existing metrics to do the same, as it would make things less clean imo than the current approach: a single specific API is used to perform this update in exactly one place where it's needed.
The changes in the PR are mostly driven by the addition of the new map of metrics, updates in protobuf files to store the new metrics, the changes to support having the `HttpsOutcalls` use case additionally included as well as some changes in tests to support the new metrics.
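To make the gauge/counter distinction above concrete, here is a minimal Python sketch of the two maps. It is not the replica's Rust code: the class and method names (`ConsumedCycles`, `charge_exact`, `prepay`, `refund`, `rate`) are hypothetical, and it assumes, which the description does not confirm, that the gauge records a prepayment in full and later subtracts the refund.

```python
from collections import defaultdict
from enum import Enum


class CyclesUseCase(Enum):
    # Hypothetical subset of use cases, for illustration only.
    INSTRUCTIONS = "Instructions"
    HTTPS_OUTCALLS = "HttpsOutcalls"


class ConsumedCycles:
    """Toy model of the two per-use-case maps (names are assumptions)."""

    def __init__(self):
        self.gauge = defaultdict(int)    # existing map: can go up and down
        self.counter = defaultdict(int)  # new map: only ever increases

    def charge_exact(self, use_case, amount):
        # The precise amount is known at payment time: update both maps now.
        self.gauge[use_case] += amount
        self.counter[use_case] += amount

    def prepay(self, use_case, amount):
        # Prepayment with an expected refund: only the gauge moves for now.
        self.gauge[use_case] += amount

    def refund(self, use_case, prepaid, refunded):
        # Refund stage: correct the gauge, and update the counter exactly
        # once with the amount that was actually consumed.
        assert 0 <= refunded <= prepaid
        self.gauge[use_case] -= refunded
        self.counter[use_case] += prepaid - refunded


def rate(samples):
    """Average rate between the first and last (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)
```

Because the counter never decreases, the difference between any two samples is a non-negative amount of consumed cycles, which is what makes a helper like `rate` meaningful without special handling for refunds.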
mraszyk left a comment:
This doesn't seem merged at the moment so not sure what kind of feedback/action is expected.

    list_canisters : () -> (list_canisters_result) query;

    // Returns canister related metrics
    canister_metrics: (canister_metrics_args) -> (canister_metrics_result) query;

Let's move this close to `canister_status` since they belong together.
You mean only here or also in the main document? It's now listed as the last API; it seems we kinda follow chronological order there, but I can move it. I don't have a strong opinion.
Nothing more than what you already provided. I merged with master and had to fix some conflicts, so I just wanted to make sure it's ready (sans the changelog entry, which will happen right before merging).
Implement the `canister_metrics` endpoint as described in dfinity/portal#6217 to allow controllers or subnet admins to retrieve canister-level metrics for a target canister. The metrics currently contain some basic cycles-related ones but can be extended in the future to contain more. The necessary boilerplate is added to wire the new endpoint through the code stack. As defined in the interface spec, the API is available both in replicated mode and as a query call. A few tests are also added to ensure that the endpoint correctly returns the values that are present in the replicated state.
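The access rule described above (only controllers of the target canister or subnet admins may read its metrics) can be sketched as follows. This is an illustrative Python model under stated assumptions, not the replica implementation or the interface-spec types: `Canister`, `consumed_cycles_by_use_case`, the `state` dictionary, and the exception types are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Canister:
    controllers: set
    # Monotonic counters keyed by use-case name (hypothetical layout).
    consumed_cycles_by_use_case: dict = field(default_factory=dict)


def canister_metrics(state, subnet_admins, caller, canister_id):
    """Return a copy of the target canister's metrics if the caller is authorized."""
    canister = state.get(canister_id)
    if canister is None:
        raise KeyError(f"canister {canister_id} not found")
    if caller not in canister.controllers and caller not in subnet_admins:
        raise PermissionError(f"{caller} may not read metrics of {canister_id}")
    # Read-only lookup: the same logic can serve a replicated call or a query.
    return dict(canister.consumed_cycles_by_use_case)
```

Returning a copy keeps the handler free of side effects on the state, which fits an endpoint that is exposed both in replicated mode and as a query.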
This PR introduces a management canister API that will be used to report canister-related metrics to canister controllers or subnet admins. The API currently only includes a set of cycles consumed by different use cases but is designed so that it can be easily extended to include more metrics (whether cycles-related or, for example, calls processed).