feat(observability): auto-provision GoodData-CN dashboards via Grafan…#104
Draft
nortonsk wants to merge 11 commits into gooddata:master
…a sidecar
Enable the Grafana sidecar in observability.tf so it watches the observability namespace for ConfigMaps labelled grafana_dashboard=1. foldersFromFilesStructure=true maps subdirectory names to Grafana folders.
Add grafana-dashboards.tf with a kubernetes_config_map_v1 resource for the gooddata-cn-overall-health dashboard. Terraform replace() substitutes GDMIMIR->prometheus and GDLOKI->loki at plan time so the dashboard resolves to local datasources automatically.
No manual import step is needed after terraform apply: the dashboard appears in the GoodData-CN folder in Grafana within ~10 seconds of apply.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
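The ConfigMap approach described in this commit could look roughly like the sketch below. It is an illustration only, not the contents of grafana-dashboards.tf: the resource name, the JSON file path, and the ConfigMap data key are assumptions, while the label, namespace, folder name, and the two replace() substitutions come from the commit messages in this PR.

```hcl
# Sketch only: resource name, file path and data key are hypothetical.
resource "kubernetes_config_map_v1" "gooddata_cn_overall_health" {
  metadata {
    name      = "gooddata-cn-overall-health"
    namespace = "observability"

    # The Grafana sidecar picks up ConfigMaps carrying this label.
    labels = {
      grafana_dashboard = "1"
    }

    # Folder annotation mentioned later in this PR (grafana_folder).
    annotations = {
      grafana_folder = "GoodData-CN"
    }
  }

  data = {
    # Nested replace() calls rewrite the datasource UIDs at plan time:
    # GDMIMIR -> prometheus, GDLOKI -> loki.
    "gooddata-cn-overall-health.json" = replace(
      replace(
        file("${path.module}/dashboards/gooddata-cn-overall-health.json"),
        "GDMIMIR",
        "prometheus"
      ),
      "GDLOKI",
      "loki"
    )
  }
}
```

Because replace() runs on the file contents as a plain string, every occurrence of GDMIMIR or GDLOKI in the JSON is rewritten, not only the datasource UID fields.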
b97d463 to 264c308
…ions
Covers three deployment options: automatic via gooddata-cn-terraform, Grafana UI import, and kubectl ConfigMap for any Kubernetes environment. Documents datasource UID substitution and how to update the JSON.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
c041ccf to 39b46d4
- Auto-provision dashboard via Grafana sidecar ConfigMap (grafana_dashboard=1 label, grafana_folder annotation) so it appears in GoodData-CN folder without manual import
- Alias GDMIMIR datasource UID to local Prometheus so all 80+ Prometheus panels work out of the box on local (k3d) deployments without a dedicated Mimir instance
- Set $cluster variable allValue=".*" + includeAll=true so local deployments (which have no cluster_name label) match all series when "All" is selected
- Fix all cluster_name filters to use regex match (=~) so the ".*" allValue works correctly; previously exact-match would return no data when "All" was selected
- Replace nginx ingress log/metric queries with api-gw container queries
- Replace removed forward_call_* metrics with OTel http_server_request_duration_seconds_* on API 5xx Error Rate, API Latency Distribution, and Gateway 5xx Error Count panels
- Keep forward_call_response_status_count_total on API Request Rate by Upstream Service (better upstream-host granularity); add migration note in panel description
- Fix OOM Kills stat panel to use max_over_time(…[$__range]) for full time-window view
- Remove Calcique metadata lookup times panel
- Pre-install prometheus-operator-crds before k8s-local so CNPG PodMonitor works without CRD-missing errors; kube-prometheus-stack skips CRDs (managed separately)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
ff02e12 to 309b714
… collision
- Remove duplicate isDefault=true from GD Loki datasource; Grafana rejects configs with more than one default datasource per org and crashed on startup
- Fix openssl passwd treating passwords starting with '-' as flags by adding '--' end-of-options sentinel in gooddata-orgs.tf password hash script
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
309b714 to 5ea6255
GoodData.CN 4.0.0 gen-ai uses alembic, which builds its DB URL via Python configparser. In configparser, % is reserved for interpolation, so a password containing % causes a ValueError. Remove % from override_special so the generated password is safe for use in URL/configparser contexts.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
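As a rough illustration of the change, a random_password resource that excludes % might look like the following. The resource name, length, and the exact character set are assumptions made for this sketch; the PR only states that % was removed from override_special.

```hcl
# Sketch only: name, length and character set are illustrative,
# not the values used in this repository.
resource "random_password" "db_password" {
  length  = 24
  special = true

  # "%" is deliberately absent: Python configparser treats it as the
  # start of an interpolation sequence and raises ValueError on it.
  override_special = "!*-_=+"
}
```

When override_special is set, the provider draws special characters only from that string, so leaving % out of it is enough to keep it out of the generated password.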
…n config
- gdcn-size-dev: add full JVM options and resource limits matching Agora (metadataApi, apiGateway, authService, calcique, pdfStaplerService, resultCache, scanModel, sqlExecutor, exportBuilder, visualExporterService, apiGw, redis-ha; all other services get resource limits/requests)
- gdcn-base: enable deployQuiverGeoCollections, enableGeoArea, enableNewGeoPushpin, mapIngestionJob, resultCache pulsar invalidation
- gdcn-local: add quiver geo collections S3 config via new SeaweedFS bucket
- k8s-local: add gooddata-geo-collections SeaweedFS bucket
- k8s-common: add local_s3_geo_collections_bucket variable
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…token secret)
The mapIngestionJob requires an external map token managed via Vault in production. Local k3d installs don't have this secret, causing the job to fail with BackoffLimitExceeded. Override to disabled in gdcn-local.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…and cluster_name label
- Deploy grafana-image-renderer as a sidecar service for panel/dashboard PNG export and dashboard image embedding
- Enable external snapshot sharing via snapshots.raintank.io
- Add nginx proxy-body-size: 50m on Grafana ingress to fix 413 on snapshot publish
- Add cluster_name external label to kube-prometheus-stack so dashboards using cluster_name label selector populate correctly
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…res map token secret)" This reverts commit a5859cd.
…roduction config" This reverts commit d58985f.
skip_crds=true caused ServiceMonitor CRD errors on fresh cluster installs where the Prometheus Operator CRDs have never been deployed. Let Helm manage the CRD lifecycle, which is the safe default for a fresh install.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
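A minimal sketch of what letting Helm manage the CRDs could look like on the Terraform side is shown below. The resource name and namespace are assumptions; the actual release definition in this repository may differ, and it may simply omit skip_crds rather than set it explicitly.

```hcl
# Sketch only: resource name and namespace are hypothetical.
resource "helm_release" "kube_prometheus_stack" {
  name       = "kube-prometheus-stack"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"
  namespace  = "observability"

  # false is the provider default: Helm installs the chart's CRDs
  # (ServiceMonitor, PodMonitor, ...) if they are not already present.
  skip_crds = false
}
```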
…a sidecar
Enable the Grafana sidecar in observability.tf so it watches the observability namespace for ConfigMaps labelled grafana_dashboard=1. foldersFromFilesStructure=true maps subdirectory names to Grafana folders.
Add grafana-dashboards.tf with kubernetes_config_map_v1 resources for gooddata-cn-overall-health and panther-overall dashboards. Terraform replace() substitutes GDMIMIR->prometheus and GDLOKI->loki at plan time so imported dashboards resolve to local datasources automatically.
Add modules/k8s-common/dashboards/ with the dashboard JSON files so the module is self-contained. Add dashboards/ tooling: export.sh, import.sh, Makefile (with sync target), docker-compose.test.yml for local testing.
No manual import step needed after terraform apply — dashboards appear in the GoodData-CN folder in Grafana within ~10 seconds of apply.
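The sidecar settings described above might be expressed as Helm values along these lines. This is a sketch under assumptions: the local name is hypothetical, and the `grafana` top-level key assumes Grafana is deployed as a subchart of kube-prometheus-stack; the actual values in observability.tf may be structured differently.

```hcl
# Sketch only: the local name and the "grafana" prefix are assumptions.
locals {
  grafana_sidecar_values = yamlencode({
    grafana = {
      sidecar = {
        dashboards = {
          enabled = true

          # Watch for ConfigMaps labelled grafana_dashboard=1.
          label      = "grafana_dashboard"
          labelValue = "1"

          # Annotation the sidecar reads to choose the target folder.
          folderAnnotation = "grafana_folder"

          provider = {
            # Map subdirectory names to Grafana folders.
            foldersFromFilesStructure = true
          }
        }
      }
    }
  })
}
```

A string built this way can be passed to a helm_release through its `values` list, which keeps the sidecar configuration next to the rest of the Terraform code instead of in a separate YAML file.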