feat(inputs.system): Add operating-system information#18834
feat(inputs.system): Add operating-system information#18834bilkoua wants to merge 5 commits intoinfluxdata:masterfrom
Conversation
|
Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
|
@skartikey @srebhan @Hipska Hi, could you take a look at this PR when you have a moment? |
| | `platform_family` | string | Platform family (e.g. `debian`, `rhel`) | | ||
| | `platform_version` | string | Platform / distribution version (e.g. `26.04`) | | ||
| | `kernel_version` | string | Kernel release as returned by `uname -r` (e.g. `7.0.0-7-generic`) | | ||
| | `kernel_arch` | string | Kernel architecture as returned by `uname -m` (e.g. `x86_64`) | |
There was a problem hiding this comment.
Can we name this arch or architecture and use GOARCH to fill it?
| ttl := time.Duration(s.OSCacheTTL) | ||
| expired := ttl > 0 && now.Sub(s.osCachedAt) >= ttl | ||
| if s.osCachedAt.IsZero() || expired { |
There was a problem hiding this comment.
How about
| ttl := time.Duration(s.OSCacheTTL) | |
| expired := ttl > 0 && now.Sub(s.osCachedAt) >= ttl | |
| if s.osCachedAt.IsZero() || expired { | |
| if time.Since(s.osCachedAt) > time.Duration(s.OSCacheTTL) { |
| if err != nil { | ||
| acc.AddError(err) | ||
| } | ||
| s.osFields = osFields | ||
| s.osCachedAt = now |
There was a problem hiding this comment.
Does it make sense to override the fields if an error occurred?
| var errs []error | ||
|
|
||
| platform, family, version, err := host.PlatformInformation() | ||
| if err != nil && !strings.Contains(err.Error(), "not implemented") { | ||
| errs = append(errs, fmt.Errorf("reading platform information: %w", err)) | ||
| } | ||
| kernelVersion, err := host.KernelVersion() | ||
| if err != nil && !strings.Contains(err.Error(), "not implemented") { | ||
| errs = append(errs, fmt.Errorf("reading kernel version: %w", err)) | ||
| } | ||
| kernelArch, err := host.KernelArch() | ||
| if err != nil && !strings.Contains(err.Error(), "not implemented") { | ||
| errs = append(errs, fmt.Errorf("reading kernel architecture: %w", err)) | ||
| } | ||
|
|
||
| if platform == "" && family == "" && version == "" && kernelVersion == "" && kernelArch == "" { | ||
| return nil, errors.Join(errs...) | ||
| } | ||
| return map[string]interface{}{ | ||
| "os": runtime.GOOS, | ||
| "platform": platform, | ||
| "platform_family": family, | ||
| "platform_version": version, | ||
| "kernel_version": kernelVersion, | ||
| "kernel_arch": kernelArch, | ||
| }, errors.Join(errs...) |
There was a problem hiding this comment.
Why do we want to join the errors instead of directly returning as soon as we encounter the first issue?
| inputs.Add("system", func() telegraf.Input { | ||
| return &System{} | ||
| return &System{ | ||
| OSCacheTTL: config.Duration(defaultOSCacheTTL), |
There was a problem hiding this comment.
Please use the concrete value here instead of defining a useless constant obfuscating the actual value to the reader
| OSCacheTTL: config.Duration(defaultOSCacheTTL), | |
| OSCacheTTL: config.Duration(5 * time.Minute), |
| const testOSRelease = `NAME="Telegraf Test OS" | ||
| ID=telegraftest | ||
| VERSION_ID="1.0" | ||
| PRETTY_NAME="Telegraf Test OS 1.0" | ||
| ` | ||
|
|
||
| // setupOS points gopsutil at a synthetic os-release file via HOST_ETC. | ||
| // Kernel fields still come from the live uname syscall. | ||
| func setupOS(t testing.TB) bool { | ||
| t.Helper() | ||
| mockOSRelease(t, testOSRelease) | ||
| return true | ||
| } | ||
|
|
||
| func mockOSRelease(t testing.TB, content string) { | ||
| t.Helper() | ||
| etcDir := os.Getenv("HOST_ETC") | ||
| if etcDir == "" { | ||
| etcDir = filepath.Join(t.TempDir(), "etc") | ||
| require.NoError(t, os.MkdirAll(etcDir, 0750)) | ||
| t.Setenv("HOST_ETC", etcDir) | ||
| } | ||
| writeOSRelease(t, etcDir, content) | ||
| } | ||
|
|
||
| func writeOSRelease(t testing.TB, etcDir, content string) { | ||
| t.Helper() | ||
| require.NoError(t, os.WriteFile(filepath.Join(etcDir, "os-release"), []byte(content), 0640)) | ||
| } |
There was a problem hiding this comment.
No! Simply create a testdata directory in the plugin, place the file with the required content there and use t.Setenv("HOST_ETC") to the testdata directory in the actual test!
| func newOSPlugin(ttl time.Duration) *System { | ||
| return &System{ | ||
| Include: []string{"os"}, | ||
| OSCacheTTL: config.Duration(ttl), | ||
| Log: &testutil.Logger{}, | ||
| } | ||
| } |
There was a problem hiding this comment.
Please define this in the tests instead of doing it in a function. For a reviewer (or future coder) your indirection will hide the actual test setup and you need to navigate around in the code to find out what the test actually does.
| m, found := acc.Get("system_os") | ||
| require.True(t, found, "system_os metric not produced") | ||
|
|
||
| require.Equal(t, "linux", m.Fields["os"]) | ||
| require.Equal(t, "telegraftest", m.Fields["platform"]) | ||
| require.Empty(t, m.Fields["platform_family"]) | ||
| require.Equal(t, "1.0", m.Fields["platform_version"]) | ||
| require.IsType(t, "", m.Fields["kernel_version"]) | ||
| require.NotEmpty(t, m.Fields["kernel_version"]) | ||
| require.IsType(t, "", m.Fields["kernel_arch"]) | ||
| require.NotEmpty(t, m.Fields["kernel_arch"]) |
There was a problem hiding this comment.
Please define the expected metric(s) of type []telegraf.Metric using metric.New and then use testutil.RequireMetricsEqual to verify the collected metric(s) match the expectation!
Same below.
| func BenchmarkGatherOS(b *testing.B) { | ||
| setupOS(b) | ||
|
|
||
| s := newOSPlugin(defaultOSCacheTTL) | ||
| require.NoError(b, s.Init()) | ||
|
|
||
| var acc testutil.Accumulator | ||
| for b.Loop() { | ||
| acc.ClearMetrics() | ||
| if err := s.Gather(&acc); err != nil { | ||
| b.Fatal(err) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
I don't think we need a benchmark here...
There was a problem hiding this comment.
Get rid of this. Simply limit the test to Linux and be good.
Summary
This is the second step of the plan agreed in #18533: add an
osvalueto the
includeoption ofinputs.systemthat emits a separatesystem_osmeasurement with operating system release andunameinformation.
The data is gathered through
gopsutilfor cross-platform support andis read with the granular
host.PlatformInformation(),host.KernelVersion()andhost.KernelArch()calls instead ofhost.Info()to avoid unrelated probes for virtualization, boot timeand process counts.
The result is cached for
os_cache_ttl(default5m) between gatherssince these values are effectively static at runtime; setting
os_cache_ttl = "0s"keeps the cache for the lifetime of theprocess, while a tiny positive value such as
"1ns"effectivelydisables the cache and forces a fresh read on every gather. The TTL
handling matches the convention used by
inputs.sqlserver,inputs.logql,inputs.promql,outputs.iotdbandoutputs.parquetwhere a zero duration disables the time bound.Fields emitted by the
system_osmeasurementosruntime.GOOSplatform/etc/os-releaseetc.platform_family/etc/os-releaseetc.platform_version/etc/os-releaseetc.kernel_versionuname -rkernel_archuname -mOn platforms where gopsutil cannot provide a particular value (e.g.
parts of FreeBSD/OpenBSD/Solaris) the corresponding field is left
empty; if no field can be gathered the metric is skipped entirely.
Backward compatibility
The default
includeset is unchanged, so existing deployments keepemitting only the
systemmeasurement and do not see the newsystem_osmetric until they opt in.Example output
Checklist