Modernize s3proxy: Go 1.26, slog, metrics, traces, hardening#41
Merged
Conversation
Update go.mod and go.work directives from 1.25.3 to 1.26. Installed toolchain and latest upstream are both 1.26.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AGENTS.md captures Tier C rules for AI-assisted development on this repo (applicable sections only: code quality, vulnerability scanning, vendoring, testing/architecture, project layout, DI, error handling, logging, metrics, alerting, dashboards, tracing, Vault secrets, SBOM, Renovate policy). docs/DEVELOPMENT.md is the accompanying developer workflow guide, adapted for GitHub (gh CLI) and the s3proxy Go backend with no database. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Package rename
--------------
internal/crypto → internal/cryptoutil to avoid shadowing the Go stdlib
crypto package (revive var-naming, resolves the remaining golangci-lint
finding on main).
cmd/main.go
-----------
- Add ReadHeaderTimeout (10s) on http.Server to mitigate Slowloris (G112).
- Wrap ListenAndServe{TLS} return values with fmt.Errorf for wrapcheck.
- Replace panic(err) on startup with log.WithError(err).Fatal(...) for
consistent structured JSON failure output.
- Extract log level switch into setLogLevel helper (reduces main's
cyclomatic complexity and drops stale TODO blocks).
- Remove unused --kms flag and kmsEndpoint field (never wired up).
internal/router
---------------
- handleHealthEndpoints: drop superfluous second WriteHeader(500) that ran
after Write had already committed the status line.
- handleForwards: replace http.DefaultClient with a named HTTP client with
Timeout and tuned Transport; stream the upstream response via io.Copy
instead of buffering it entirely; preserve multi-value response headers
(Set-Cookie, Vary, …) by copying slices rather than using Get/Set which
only keeps the first value.
- PutObject logging: fix incorrect logrus key/value usage that was treated
as positional formatting args.
- Rename the get/put method-allow wrappers to requireGET/requirePUT and
correct the wrong doc comment on the former put wrapper.
- Drop the commented-out containsBucket helper.
- parseErrorCode: use errors.As against *awshttp.ResponseError.HTTPStatusCode
instead of regex-scraping err.Error(); no more hot-path regex compilation.
internal/s3
-----------
- ErrorRawResponse.Error() now returns the wrapped error message instead of
the raw HTTP body, preventing accidental leaks of upstream response bodies
through log.Error(err) / error-chain dumps. Callers that specifically need
the raw body (handleGetObjectError) now read the RawResponse field
explicitly.
Verified: golangci-lint clean, go test -race ./... passes, govulncheck
reports no vulnerabilities.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Correctness + efficiency fixes from the branch review. Router now constructs the S3 client once in New() and stores it ------------------------------------------------------------------ Previously Router.Serve called s3.NewClient for every incoming request. The AWS SDK client is safe for concurrent use and maintains its own credentials cache and HTTP connection pool; rebuilding it per request defeated both, re-registered middleware N times, and amplified credentials-refresh traffic under load. New now takes a context.Context used only during client construction, and the built client is reused by every request. s3.NewClient takes a context.Context ------------------------------------ The SDK LoadDefaultConfig call is now tied to the caller-supplied context so startup cancellation / timeout can flow through. main passes context.Background(). Upstream S3 operations get a bounded timeout -------------------------------------------- object.get and object.put still use context.WithoutCancel to avoid aborting an S3 call mid-flight when the client disconnects, but the detached context is now wrapped with a 2-minute timeout (s3OperationTimeout constant). This prevents a hung upstream from producing zombie goroutines and memory pressure. PutObject no longer computes its own Content-MD5 ------------------------------------------------ The AWS SDK is configured with RequestChecksumCalculationWhenRequired, so it signs a CRC/SHA checksum when S3 demands integrity. The explicit md5.Sum(body) was redundant, forced a second full hash pass over the ciphertext buffer, and was the non-obvious second MD5 pointed out in review (the first being client Content-MD5 validation of the plaintext body). Removed the package-level dekTag eager init ------------------------------------------- dekTag was initialised at package-load time via config.GetDekTagName, which ran before main called config.LoadConfig — so a non-default S3PROXY_DEKTAG_NAME env var was silently ignored. Usages now call config.GetDekTagName() at request time, which resolves after config is loaded. Tests updated accordingly. Verified: golangci-lint clean, go test -race ./... passes, govulncheck reports no vulnerabilities. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related security hardening changes from the review.
PutObject body size cap
-----------------------
PutObject currently buffers the full request body in memory so it can
encrypt it before forwarding to S3. Previously ContentLength was only
validated against the 5 GiB S3 hard ceiling, which let a single concurrent
request allocate up to 5 GiB and trivially OOM the process.
- New config knob S3PROXY_PUTBODY_MAX (bytes), default 256 MiB, capped at
MaxObjectSize. Exposed via config.GetMaxPutBodySize.
- New config.ValidatePutBodySize checks the declared Content-Length against
that cap and returns 413 before we allocate anything.
- readBody now takes an explicit maxBytes argument and wraps the reader
with io.LimitReader so a missing/lying Content-Length cannot trick us
into reading past the cap; exceeded bodies return a sentinel
errBodyTooLarge which handlePutObject translates to 413.
- GetObject read path passes MaxObjectSize as the cap (we still trust
upstream S3 but refuse bodies larger than the S3 hard ceiling).
KEK derivation via HKDF-SHA256 with versioning
----------------------------------------------
generateKEKFromString derived the 32-byte KEK as plain sha256(seed). That
is vulnerable to rainbow-style precomputation against low-entropy seeds
and is below current best practice even with a high-entropy seed.
- New internal/cryptoutil/kek.go introduces KEKProvider, which holds every
supported derivation so that already-stored objects remain readable.
* Legacy (KEKVersionLegacy, ""): sha256(seed) — prior behaviour.
* v1 (KEKVersionV1, "1"): HKDF-SHA256 with a fixed salt and the
info label "s3proxy/kek/v1".
KEKVersionCurrent selects the version used for new writes (v1).
- Each encrypted object now records its KEK version in object metadata
under a new tag ("<dektag>-kek-ver" by default, configurable via
S3PROXY_DEKTAG_KEKVER / config.GetKEKVersionTagName). Absence of the tag
is interpreted as legacy for forward-compatible reads.
- Router stores a cryptoutil.KEKProvider instead of a single [32]byte KEK;
handlers receive the provider and pick the right derivation per object
on read and use Current() on write.
- Tests exercise v1 round-trip, version-tagged reads, mismatched seeds,
and a legacy-object read (no version tag → SHA-256 fallback).
go.mod now promotes golang.org/x/crypto to a direct dependency (already
transitively present) for golang.org/x/crypto/hkdf.
Verified: golangci-lint clean, go test -race ./... passes, govulncheck
reports no vulnerabilities.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related structural changes.
/readyz distinguishes readiness from liveness
---------------------------------------------
Previously /healthz and /readyz both returned 200 unconditionally. The
route now splits:
- /healthz: unchanged — attests that the process is alive.
- /readyz : issues a ListBuckets probe against the upstream S3 endpoint
with a 2-second timeout. Returns 200 on success and 503 with
a short textual reason (no upstream client / upstream
unreachable) otherwise.
New s3.Client.Ping wraps s3client.ListBuckets so Router can exercise both
connectivity and credentials without needing a specific bucket.
Config struct for dependency injection (M3)
-------------------------------------------
config used a package-level koanf instance with free-floating getter
functions, which violates the AGENTS.md rule against package-scope
mutable state and made the package hard to test with alternate config
sources.
Introduce config.Config holding its own *koanf.Koanf, with methods for
every getter (Host, DekTagName, KEKVersionTagName, EncryptKey,
ThrottlingRequestsMax, MaxPutBodySize) plus Validate. config.Load
returns a fresh *Config; new code threads it explicitly. main() now
calls config.Load() + cfg.Validate() and hands *Config to runServer.
The pre-existing package-level GetX helpers and ValidateConfiguration()
are kept as thin shims over an internal Default() Config so that call
sites not yet migrated (router internals) continue to compile and run.
Default() returns an empty Config when LoadConfig has not been run,
which keeps tests that construct handlers directly working.
Verified: golangci-lint clean, go test -race ./... passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unit tests ---------- - config/validation_test.go: table-driven coverage of ValidateBucketName (valid / length bounds / leading/trailing dash / uppercase / double dots / IPv4-like / underscore), ValidateObjectKey (empty, at-limit, over-limit, unicode), ValidateContentLength (zero, positive, S3 ceiling, negative), and ValidatePutBodySize. - router/helpers_test.go: covers match path handling, getMetadataHeaders (case-insensitive prefix + multi-value join), parseRetentionTime (empty/RFC3339/invalid), sha256sum, and the is-unwanted endpoint predicates. These helpers had no direct coverage before. The config package went from 0% to 41% coverage; router from 40.9% to 43.7%. Targeted fills behind the handler surface that previously required integration tests to exercise. CI workflow ----------- .github/workflows/ci.yml complements the existing golangci-lint workflow with three parallel jobs: - test: go test -race with coverage artifact upload. - vuln: go install + govulncheck ./... . - build: go build -v ./... . All three share go-version-file: go.mod so a toolchain bump in go.mod automatically propagates to CI. Renovate -------- renovate.json configures: - config:best-practices baseline + weekly schedule in Europe/Paris. - Security/CVE alerts auto-merge squash. - Go patch updates grouped and auto-merged; minor/major held for review. - AWS SDK v2 modules grouped in a single PR. - GitHub Actions and container images grouped; actions auto-merge on minor/patch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run go work vendor to materialise the module graph under vendor/. Tier C requires a vendored tree so CI builds are reproducible without network access to the Go module proxy, and so supply-chain review (govulncheck -mod=vendor, SBOM via syft/grype) can be run against an exact frozen copy of the build inputs. vendor/ is intentionally not in .gitignore and is committed alongside the code that depends on it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace sirupsen/logrus with the standard library log/slog across the codebase. Loggers now emit JSON to stdout with a service=s3proxy attribute and a shared LevelVar so -level still flips between debug/info/warn/error. Per-request loggers bind request_id via log.With, keeping the field on every emitted record without threading it through call sites. Drops the logrus direct dependency; go work vendor re-sync removes the vendored tree. Existing unit tests continue to pass with a discard slog.Handler.
Add an internal/monitoring package that owns a dedicated Registry and publishes: - http_requests_total / http_request_duration_seconds for every served request (method, normalised path, status) - errors_total keyed by a validation class - service_crashes_total driven by a panic-recover wrapper in the HTTP middleware - s3proxy_encrypt_duration_seconds / s3proxy_decrypt_duration_seconds histograms around the crypto hot paths - s3proxy_upstream_errors_total around S3 GetObject/PutObject/forward calls - s3proxy_throttled_total when the throttler rejects a request Metrics is threaded through Router, handlers and the throttling middleware as a nullable pointer so existing tests (which build routers directly) are unaffected. The proxy HTTP server now registers /metrics alongside the S3 routes; health probes and metrics scrapes bypass the throttler to stay observable under load.
Wire OpenTelemetry into the proxy so requests can be correlated across the ingress and the upstream S3 call: - internal/tracing sets up a global TracerProvider backed by the OTLP/HTTP exporter and a composite TraceContext + Baggage propagator. When OTEL_EXPORTER_OTLP_ENDPOINT (or the traces-specific variant) is unset the setup is a no-op, so unconfigured environments stay cheap. - The proxy's top-level HTTP handler is wrapped in otelhttp.NewHandler, which extracts the inbound W3C TraceContext, opens a server span, and propagates the span through r.Context() to downstream handlers. - forwardHTTPClient now uses otelhttp.NewTransport so signed requests to S3 automatically inject trace headers and produce a client span under the inbound request. - tracing.SpanContextHandler wraps the slog JSON handler and injects trace_id/span_id whenever a log record is emitted with a context carrying an active span. Adds unit tests for the handler wrapper's behaviour with and without an active span.
Add monitoring/ assets so deployments can consume the new metrics without each environment re-authoring them: - monitoring/alerts/s3proxy.yaml defines four Prometheus alerts: S3ProxyHighErrorRate (5xx share over 5%), S3ProxyHighLatency (p95 above 2s), S3ProxyServiceDown (scrape target missing) and S3ProxyHighCrashRate (handler panics recovered in the last 5m). - monitoring/dashboards/s3proxy.json defines a Grafana dashboard covering RPS per operation, 5xx error ratio, HTTP p50/p95/p99 latency, crash rate, encrypt/decrypt p95 duration and throttling rejection rate. Panels filter on a per-instance template variable.
Add s3proxy/internal/e2e/proxy_e2e_test.go guarded by //go:build e2e.
The test spins up an ephemeral MinIO container via testcontainers-go,
creates a bucket with a direct SDK client, then wires a real Router +
/metrics stack behind an httptest.Server and exercises it from an AWS
SDK client that points at the proxy:
- PutObject round-trip: plaintext flows through the proxy and the
stored object inspected directly on MinIO is opaque ciphertext with
the DEK metadata tag set.
- GetObject round-trip: the proxy decrypts and returns the original
plaintext byte-for-byte.
- /healthz, /readyz and /metrics stay reachable while traffic flows.
Supporting changes:
- Add an S3PROXY_INSECURE toggle (Config.Insecure / GetInsecure) so the
upstream client and request-repackaging use http:// instead of https://
when talking to plaintext MinIO. Default remains https.
- testcontainers-go modules/minio and testcontainers-go are added as
e2e-only dependencies; the main binary does not import them.
The test is explicitly not vendored for the default module graph, so it
must be run with:
GOWORK=off GOFLAGS=-mod=mod go test -tags=e2e ./s3proxy/internal/e2e/...
When Docker is unavailable the test skips cleanly.
Emit a WARN-level slog record at boot when the insecure upstream scheme toggle is active, making it clear the flag is intended for local/test/demo use and must not be enabled in production.
Strengthen the MinIO e2e test so the encrypted-at-rest claim is visible from -v output and enforced by assertions: - Log plaintext, raw ciphertext (hex), DEK metadata tag and KEK version tag after reading the object directly from MinIO, so `go test -v` shows the opaque bytes that land on disk. - Assert the plaintext does not appear verbatim anywhere in the stored payload. - Assert the stored payload is strictly longer than the plaintext to reflect the AES-GCM nonce + authentication tag overhead. - Assert both the DEK and KEK-version metadata tags are populated.
… bodies Switch the proxy's upstream S3 client to aws.RequestChecksumCalculationWhenSupported / ResponseChecksumValidationWhenSupported so every PutObject persists a CRC32 on MinIO/S3 and every GetObject can validate the payload end-to-end. This aligns with the AWS SDK v2 post-2025 defaults and silences "Response has no supported checksum. Not validating response payload." log lines in tooling that scrapes s3proxy traffic. The proxy, however, transforms bodies: PutObject uploads ciphertext and GetObject returns decrypted plaintext, so the checksum the upstream computes (over the ciphertext at rest) does not match what the downstream client sees. Forwarding x-amz-checksum-* in PutObject/GetObject responses would therefore cause the client SDK to reject the body. Remove the forwarded x-amz-checksum-crc32/crc32c/sha1/sha256 headers in both setGetObjectHeaders and setPutObjectHeaders and explain the reason in place. Also bump the e2e MinIO image to RELEASE.2025-09-07T16-13-09Z (the 2025-10 tag does not exist on Docker Hub) and disable response checksum validation on the test-only SDK clients used to inspect at-rest ciphertext and to drive the proxy, since neither payload carries a usable checksum by design.
The module requires Go 1.26 since commit c0e45f5, but the Dockerfile was still pinned to golang:1.25.3, which caused the ghcr build-push-action step to fail with: go: go.mod requires go >= 1.26 (running go 1.25.3; GOTOOLCHAIN=local) Pin the build stage to golang:1.26.2 (latest 1.26 patch at time of writing) so the container image builds again.
Wire the new observability surface into the s3proxy Helm chart so clusters that install via the chart can opt into metrics, tracing and alerting without bespoke plumbing. - deployment.yaml now forwards OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS from .Values.config, and accepts a free-form .Values.extraEnv list. - templates/servicemonitor.yaml ships an opt-in ServiceMonitor pointed at the existing service/port with a configurable path, scheme, tls config and relabeling rules. - templates/prometheusrule.yaml ships the four s3proxy alerts (HighErrorRate, HighLatency, ServiceDown, HighCrashRate) behind a single toggle, with per-alert thresholds and "for" durations exposed in values.yaml. - templates/grafana-dashboard.yaml packages dashboards/s3proxy.json as a ConfigMap carrying the grafana_dashboard sidecar label when .Values.grafanaDashboard.enabled is true. - charts/s3proxy/dashboards/s3proxy.json mirrors the root monitoring/dashboards/s3proxy.json so the ConfigMap can Files.Get it. All three observability templates default to disabled. Bumped chart + app version to 1.8.0.
…kill Adopt the new template that codifies the agent response language policy (English by default, French only on explicit request, Caveman preference) alongside the existing English-only rule for code, comments, and docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rrent iagen-dev rules Add missing Tier C baseline sections (API Design REST with s3proxy S3-protocol exception, Authentication via Keycloak, Product Validation opt-out for Go backend), refresh Workflow Skills to include the session-start gate, and apply caveman compression to template-derived sections. Project-specific content (Logging, Metrics, Grafana, Distributed Tracing, Secrets, Local Dev) preserved as-is. Also drops the invalid `govulncheck -mod=vendor` flag in favor of `GOFLAGS=-mod=vendor`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-GB GET/PUT held plaintext and ciphertext simultaneously and left the process RSS near peak long after each request, which tripped OOM kills on hosts with tight memory budgets. Drop the slice references between phases (post-decrypt, post-write, post-encrypt, post-upload) and call runtime/debug.FreeOSMemory past a 100 MiB threshold so freed pages actually return to the OS. Also stop forwarding an empty x-amz-server-side-encryption header on zero-byte PutObject responses; some upstreams (e.g. Hetzner Object Storage) return an empty algorithm there and strict SDK clients choke on the empty value. Refs #28
Integrates v1.7.2 (a8fc821, 58a92b9) with the modernize branch. Carried-forward changes from main: - Opt-in decryption fallback for objects encrypted without a KEK (S3PROXY_DECRYPTION_FALLBACK). Ported onto the new KEKProvider architecture: when primary decrypt fails and the flag is on, the proxy retries with an all-zero [32]byte KEK and logs the fallback. - Helm: deploymentStrategy and extraEnv knobs, README documentation, and a router_test for the fallback path (now exercised against the KEKProvider). Conflicts resolved in favour of the modernize branch where it had already replaced the old logger / [32]byte KEK / crypto package with slog, cryptoutil.KEKProvider, and Config-struct injection; the v1.7.2 chart bump (1.7.2) is dropped in favour of the modernize chart version 1.8.0 already on this branch.
values.yaml shipped --level=-4 (Debug) which left every deployment of the chart running with verbose logging. Switch the default to --level=0 (Info) to match the binary's documented default and avoid leaking request detail into cluster logs. Also remove the duplicate extraEnv key (second one silently overrode the first) and expand the inline documentation so operators can discover the observability + S3PROXY_* surface without grepping the source: - args: full log-level table (-1..2) + example for debug - config: map each key to its S3PROXY_* env var, note OTLP precedence - extraEnv: list opt-in S3PROXY_* knobs (INSECURE, PUTBODY_MAX, DECRYPTION_FALLBACK, DEKTAG_NAME, DEKTAG_KEKVER) with caveats - serviceMonitor / prometheusRule: note kube-prometheus-stack selector, semantics of each threshold and "for" window - grafanaDashboard: note grafana-operator / sidecar requirement Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Chart.yaml: bump version 1.8.0 -> 1.8.1 (config-only fix from the
previous commit; appVersion stays at 1.8.0 since the binary is unchanged)
- README.md: refresh the Helm chart parameter list, add an Observability
section, and fix stale references:
* helm "ugprade" typo
* cmd/main.go logging library: logrus -> log/slog
* internal/crypto -> internal/cryptoutil (renamed)
* throttling is concurrent in-flight, not RPS
* document the S3PROXY_* env var mapping for each `config` key
* list extraEnv opt-in knobs (INSECURE, PUTBODY_MAX,
DECRYPTION_FALLBACK, DEKTAG_NAME, DEKTAG_KEKVER)
* cover serviceMonitor / prometheusRule / grafanaDashboard,
/metrics, OTLP tracing, slog log-trace correlation
* note that the readiness probe pings upstream S3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructure the project README from a short "what is this" blurb into a reference document operators and contributors can use without reading the source. The previous version was accurate as far as it went, but missed the entire observability surface, the security caveats, the env var matrix, and the operational knobs that have landed since the fork. New top-level sections, with anchored TOC: - Quickstart: minimal Docker + Helm one-liners - Configuration: full env var table (required/default/description) and the CLI flag list - Operations: metric names + labels straight from internal/monitoring, the four shipped alerts with real PromQL and default thresholds, the OTLP tracing surface (including W3C TraceContext propagation and the slog log-trace correlation), logs, /healthz vs /readyz semantics, troubleshooting recipes - Security: cipher, KWP, HKDF-SHA256, body cap, multipart-default-block, S3PROXY_INSECURE / S3PROXY_DECRYPTION_FALLBACK caveats, honest gap on KEK rotation - Architecture: Mermaid sequence diagram for PUT/GET, package roles - Helm chart reference: full values.yaml key table and an end-to-end observability rollout example - Development: Go version, vendored deps, e2e build tag, CI workflows - Badges (CI, lint, Go version, license) No code or chart changes; documentation only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dges - CONTRIBUTING.md: drop the stale logrus/Go 1.24 hints, document the vendored toolchain, `e2e` build tag, govulncheck invocation, Conventional Commits convention, chart vs app version split, and point at AGENTS.md / CHANGELOG.md. - CHANGELOG.md: new file. Keep-a-Changelog format with separate Application (Git tag) and Chart streams. Covers v1.4.0 through v1.7.2 plus an Unreleased section for the in-flight chart 1.8.1 / docs work. - Chart.yaml: add icon (Intrinsec org avatar), home, sources, keywords and maintainers so the chart renders cleanly on Artifact Hub and clears the only remaining `helm lint` INFO. - README.md: add Docker and Helm CI badges and switch the Go-version badge to a dynamic shield reading the live go.mod. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a JSON Schema (draft-07) that Helm picks up automatically on
install / upgrade / template / lint. The schema is strict on known
keys and permissive on unknown top-level keys so existing user
overrides keep flowing through.
Notable constraints, picked to catch the bugs we have actually seen:
- args[*] uses if/then so any --level=* flag must be one of
-1 (Debug), 0 (Info), 1 (Warn), 2 (Error). The previous chart
defaulted to --level=-4, a silent out-of-range value that mapped to
Debug; the schema makes that misconfiguration fail at the helm CLI.
- service.port: 1..65535
- replicaCount / autoscaling.* / probe percentages: bounded integers
- image.pullPolicy: enum Always/IfNotPresent/Never
- cert.issuerRefKind: enum ClusterIssuer/Issuer
- prometheusRule.thresholds.highErrorRate: 0..1 (fraction, not %)
- prometheusRule.thresholds.{highLatencySeconds,crashes}: ≥0
- prometheusRule.for.* and serviceMonitor.{interval,scrapeTimeout}:
Go duration regex ^([0-9]+(ns|us|µs|ms|s|m|h))+$
- extraEnv[*]: requires `name`, and exactly one of `value` (string —
k8s rejects bool/int) or `valueFrom`
- config.otlp{Endpoint,TracesEndpoint}: format uri
Tested with helm template:
- helm template --set-string 'args[0]=--level=-4' → fails as expected
- helm template --set replicaCount=0 → fails
- helm template --set service.port=99999 → fails
- helm template --set prometheusRule.thresholds.highErrorRate=2 → fails
- helm template --set prometheusRule.for.highErrorRate=10minutes → fails
- helm template --set 'extraEnv[0].value=foo' → fails (missing name)
- helm template --set image.pullPolicy=Sometimes → fails
- helm template (defaults) → passes
- helm template --set serviceMonitor.enabled=true … → passes
Also noted in CHANGELOG.md under [Unreleased] → Chart → Added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update AGENTS.md against iagen-dev skills v0.6.3 baseline. Added (mandatory for Tier C Go projects, were missing): - Generated Code — guardrails for generator output (mocks, sqlc, wire, oapi-codegen, *.pb.go); committed not gitignored; CI diff-check. - Documentation Coherence — post-change README/docs sync + mandatory pre-release sweep. - Changelog — Keep-a-Changelog conventions, release cut process, user-facing wording rules. Refreshed (drifted from current template): - Dependency Management — added the new "vendor/ is read-only" paragraph forbidding hand-edits under vendor/ and documenting the fork-and-replace path for upstream gaps. - Logging — removed the stale "codebase uses sirupsen/logrus, migration planned" note (slog migration landed in d261330). Preserved (project-specific, untouched): - API Design (REST) s3proxy wire-protocol exception. - Local Development MinIO/KMS specifics. - Metrics s3proxy_* business metrics. - Grafana Dashboards s3proxy operation coverage. - Distributed Tracing S3 outbound propagation. - Authentication SigV4 management-vs-S3-surface note. - Secrets Management KEK/AWS/TLS specifics. - Product Validation Workflow opt-out (kept disabled, mode = maintenance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps Go toolchain from 1.26 to 1.26.3 across go.mod, go.work, and
Dockerfile builder image. Refreshes AWS SDK v2 (s3 v1.97.3 -> v1.101.0,
config, credentials, sts, smithy-go v1.25.1), tink-go v2.6.0, prometheus
common/procfs, grpc v1.81.1, grpc-gateway v2.29.0, otel keeps v1.43.0,
golang.org/x/{crypto,net,sys,text}, genproto split modules, and others.
Migrates github.com/knadh/koanf v1.5.0 to v2.3.4 to drop the etcd/v3
transitive dependency, which pinned google.golang.org/genproto to
v0.0.0-20210602131652 and caused ambiguous-import collisions against
the newer split genproto/googleapis/{api,rpc} modules. Public API of
koanf is unchanged in our usage; only the import path moves.
All gates green: build, vet, go test (82 tests / 7 packages),
golangci-lint, govulncheck.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
log/slog, add CI + Renovate + AGENTS.md/DEVELOPMENT.md./readyzupstream probe, bounded S3 timeouts, single reused S3 client, drop redundant MD5 hash pass, quick crypto-pkg rename + fixes./metrics(HTTP, errors, crashes, encrypt/decrypt, upstream errors, throttling), OpenTelemetry tracing (OTLP/HTTP, W3C propagation, trace_id in slog records), ready-to-use Grafana dashboard + alert rules undermonitoring/.S3PROXY_INSECUREtoggle (with a loud startup warn) and a//go:build e2eMinIO roundtrip test that proves at-rest encryption.x-amz-checksum-*from upstream responses since the proxy transforms bodies; flip the upstream SDK to SDK-v2 defaults (WhenSupportedchecksum modes).Scope
16 commits on
chore/modernize-and-review:Test plan
golangci-lint run ./...— 0 issuesgo test -race -cover ./...— all packages passgovulncheck ./...— no vulnerabilitiesGOWORK=off GOFLAGS=-mod=mod go test -tags=e2e -count=1 -v -run TestProxyRoundtripAgainstMinio ./s3proxy/internal/e2e/...— passes end-to-end against a MinIO container; ciphertext at rest verifiedS3PROXY_INSECUREtoggle andmonitoring/assets before merge; deploy the alert rules + dashboard in the target environmentOperator notes
OTEL_EXPORTER_OTLP_ENDPOINT(or_TRACES_ENDPOINT) enables trace export; unset = noop./metricsis served on the same listener as the S3 routes and bypasses the throttling middleware along with/healthz//readyz.S3PROXY_INSECURE=1switches upstream S3 traffic to plaintext HTTP and is intended for local/demo/e2e only — logged loudly at startup.🤖 Generated with Claude Code