
fix wrong prometheus-config in working-with-prometheus-in-control-plane #968

Open
XiShanYongYe-Chang wants to merge 1 commit into karmada-io:main from XiShanYongYe-Chang:fix-prometheus-config-in-control-plane


@XiShanYongYe-Chang
Member

What type of PR is this?

/kind documentation

What this PR does / why we need it:

When I deployed Prometheus in the Karmada control plane by following the doc https://karmada.io/docs/administrator/monitoring/working-with-prometheus-in-control-plane, I found that I couldn't get metrics from the karmada-controller-manager and karmada-scheduler components.

I checked the health of the Prometheus scrape targets:

# curl -s "http://localhost:xxxx/api/v1/targets" | jq '.data.activeTargets[] | select(.labels.job=="karmada-controller-manager") | {labels: .labels, health: .health, lastError: .lastError}'
{
  "labels": {
    "instance": "10.244.0.12:8080:8080",
    "job": "karmada-controller-manager"
  },
  "health": "down",
  "lastError": "Get \"http://10.244.0.12:8080:8080/metrics\": dial tcp: lookup 10.244.0.12:8080: no such host: address 10.244.0.12:8080:8080: too many colons in address"
}
{
  "labels": {
    "instance": "10.244.0.13:8080:8080",
    "job": "karmada-controller-manager"
  },
  "health": "down",
  "lastError": "Get \"http://10.244.0.13:8080:8080/metrics\": dial tcp: lookup 10.244.0.13:8080: no such host: address 10.244.0.13:8080:8080: too many colons in address"
}

# curl -s "http://localhost:xxxx/api/v1/targets" | jq '.data.activeTargets[] | select(.labels.job=="karmada-scheduler") | {labels: .labels, health: .health, lastError: .lastError}'
{
  "labels": {
    "instance": "10.244.0.15:8080:10351",
    "job": "karmada-scheduler"
  },
  "health": "down",
  "lastError": "Get \"http://10.244.0.15:8080:10351/metrics\": dial tcp: lookup 10.244.0.15:8080: no such host: address 10.244.0.15:8080:10351: too many colons in address"
}
{
  "labels": {
    "instance": "10.244.0.14:8080:10351",
    "job": "karmada-scheduler"
  },
  "health": "down",
  "lastError": "Get \"http://10.244.0.14:8080:10351/metrics\": dial tcp: lookup 10.244.0.14:8080: no such host: address 10.244.0.14:8080:10351: too many colons in address"
}

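All targets of both jobs are down: each scrape address carries two ports (for example, 10.244.0.12:8080:8080), which Prometheus rejects with "too many colons in address". So I fixed the Prometheus configuration in the doc.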
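The root cause can be reproduced with a small bash sketch. It assumes the old rule used `__address__` as its source label, the regex `(.+)`, and the replacement `$1:<port>`; that rule is inferred from the error output above, not copied from the doc. For the `pod` role, Kubernetes service discovery already sets `__address__` to `<pod IP>:<declared container port>`, so appending another port produces an address with two ports. The function name `append_port` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical emulation of a Prometheus "replace" relabel rule with
# regex "(.+)" and replacement "$1:<port>". Prometheus anchors the regex,
# so "(.+)" captures the whole source value.
append_port() {
  local value="$1" port="$2"
  local re='^(.+)$'
  if [[ "$value" =~ $re ]]; then
    printf '%s:%s\n' "${BASH_REMATCH[1]}" "$port"
  else
    return 1  # no match: Prometheus skips the rule and leaves the target label as is
  fi
}

# Pod discovery already sets __address__ to <pod IP>:<declared container port>.
address="10.244.0.15:8080"
pod_ip="10.244.0.15"

# Assumed old rule: source __address__, then append the scheduler port.
append_port "$address" 10351   # prints 10.244.0.15:8080:10351 (two ports)
# Fixed rule: source __meta_kubernetes_pod_ip instead.
append_port "$pod_ip" 8080     # prints 10.244.0.15:8080
```

The first call reproduces the `10.244.0.15:8080:10351` instance seen in the karmada-scheduler output; the second shows why starting from the bare pod IP avoids the extra port.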

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

@karmada-bot karmada-bot added the kind/documentation Categorizes issue or PR as related to documentation. label Jan 31, 2026
@karmada-bot karmada-bot requested review from Poor12 and Tingtal January 31, 2026 07:16
@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rainbowmango for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 31, 2026
@gemini-code-assist

Summary of Changes

Hello @XiShanYongYe-Chang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue in the Prometheus monitoring documentation for Karmada, where the provided configuration prevented Prometheus from correctly scraping metrics from the karmada-controller-manager and karmada-scheduler. The changes rectify the Prometheus job configurations by correcting the target address generation logic and ensuring the correct ports are used, thereby enabling successful metric collection and improving the accuracy of the monitoring setup described in the documentation.

Highlights

  • Prometheus Configuration Fix: Corrected the Prometheus job configurations for karmada-scheduler and karmada-controller-manager to properly scrape metrics.
  • Target Address Resolution: Modified relabel_configs to use __meta_kubernetes_pod_ip for constructing target addresses, resolving 'too many colons in address' errors.
  • Port Correction: Updated the karmada-scheduler metrics port from 10351 to 8080 in the Prometheus configuration.
  • Namespace Filtering: Added a namespaces filter to the kubernetes_sd_configs to ensure Prometheus only discovers pods within the karmada-system namespace for these jobs.
  • TLS Configuration Removal: Removed the tls_config from the Prometheus jobs as the scheme is http, making the TLS configuration unnecessary and potentially misleading.
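The corrected target construction described in these highlights can be sketched the same way. This bash emulation is a simplification: both jobs are folded into one loop, the `app` pod label and its values used by the keep step are assumptions for illustration, and port 8080 is the metrics port the PR uses for both components. It shows three effects: discovery limited to `karmada-system`, a fully anchored keep regex, and an address built from the pod IP that carries exactly one port. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical emulation of the corrected target construction.
# Assumptions (not taken from the doc): pods are selected by an "app" label
# with the values below, and both components expose metrics on port 8080.

# "keep" action: Prometheus fully anchors the regex, so partial matches fail.
keep_if_match() {
  local value="$1" regex="$2"
  local re="^(${regex})$"
  [[ "$value" =~ $re ]]
}

# "replace" action: source __meta_kubernetes_pod_ip, regex (.+),
# replacement "$1:<port>", target __address__.
target_from_pod_ip() {
  local pod_ip="$1" port="${2:-8080}"
  local re='^(.+)$'
  [[ "$pod_ip" =~ $re ]] || return 1
  printf '%s:%s\n' "${BASH_REMATCH[1]}" "$port"
}

# Reads "<namespace> <app label> <pod IP>" lines and prints "<app> <address>".
list_targets() {
  local ns app ip
  while read -r ns app ip; do
    # The namespaces filter limits discovery to karmada-system.
    [[ "$ns" == karmada-system ]] || continue
    keep_if_match "$app" 'karmada-scheduler|karmada-controller-manager' || continue
    echo "$app $(target_from_pod_ip "$ip")"
  done
}

# Sample pods; the karmada IPs come from the error output in this PR.
list_targets <<'EOF'
karmada-system karmada-scheduler 10.244.0.15
karmada-system karmada-controller-manager 10.244.0.12
kube-system coredns 10.244.0.2
EOF
```

Because the address is derived from the pod IP alone, it gets exactly one port regardless of which container ports the pod declares, which is what removes the "too many colons in address" error.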


@gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes an issue in the Prometheus configuration documentation that caused target scrape errors. The change to use __meta_kubernetes_pod_ip for constructing the target address is the right approach.

I have two main points of feedback:

  1. One of the documentation files has an unrelated kubernetes-apiserver job removed, which seems to be a mistake.
  2. The metrics port for karmada-scheduler is changed to 8080. While this appears correct for the local-up-karmada.sh environment, it differs from the default port (10351). I've suggested adding comments to clarify this for future readers.

Please see the detailed comments below.

@XiShanYongYe-Chang
Member Author

Hi @jabellard I see you have been working on some control plane metrics recently. Could you please help review this PR?

Signed-off-by: changzhen <changzhen5@huawei.com>
@XiShanYongYe-Chang XiShanYongYe-Chang force-pushed the fix-prometheus-config-in-control-plane branch from f1499b5 to 816c61e on January 31, 2026 07:30
@jabellard
Member

Thanks for looking into this. Does that tutorial work end to end? In another PR, I was thinking of updating it with guidance on running a Prometheus stack in-cluster using kube-prometheus-stack.

That chart also installs the Prometheus Operator, which dynamically configures the Prometheus instance. This makes it easy for users to set up scraping of pod metrics without having to be Prometheus experts.

@XiShanYongYe-Chang , @RainbowMango : What do you guys think?

@XiShanYongYe-Chang
Member Author

Thanks @jabellard

> Thanks for looking into this. Does that tutorial work end to end?

Yes, the tutorial can be followed end to end, and I tested this fix by following it.

Looking forward to your new document.

@jabellard
Member


Thanks. The relabeling rules are a bit hard to read, but based on your test report, things generally look good.

/lgtm

/cc @RainbowMango for another look.

I'll follow up with the updated guide for this soon.

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 4, 2026
@XiShanYongYe-Chang
Member Author

cc @RainbowMango
