Bug Report
Describe the bug
Fluent Bit 5.0.3 (cmetrics 2.1.2) segfaults inside the opentelemetry input plugin when it receives OTLP/HTTP histogram payloads from Quarkus services that use Micrometer's OTLP exporter. The crash is a NULL dereference in compute_metric_hash (lib/cmetrics/src/cmt_decode_opentelemetry.c). That function dereferences map->opts->fqname and label_value->name without NULL checks. Its peer get_or_create_metric_metadata_context, which the same caller invokes immediately afterwards, does have the proper guards. Reading the two functions side by side makes the trigger clear.
Each histogram publish kills the worker, the pod restarts, the buffered chunk on disk replays the same payload, and fluent-bit re-crashes on startup (CrashLoopBackOff) until the chunk hits its retry limit and is dropped.
To Reproduce
Identical stack trace on every crash:
[engine] caught signal (SIGSEGV)
#0 get_or_create_data_point_metadata_context() at lib/cmetrics/src/cmt_decode_opentelemetry.c:331
#1 decode_histogram_data_point() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1021
#2 decode_histogram_data_point_list() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1061
#3 decode_histogram_entry() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1158
#4 decode_metrics_entry() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1545
#5 decode_scope_metrics_entry() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1763
#6 decode_resource_metrics_entry() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1871
#7 decode_service_request() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1931
#8 cmt_decode_opentelemetry_create() at lib/cmetrics/src/cmt_decode_opentelemetry.c:1954
#9 process_payload_metrics_ng() at plugins/in_opentelemetry/opentelemetry_prot.c:526
#10 opentelemetry_prot_handle_ng() at plugins/in_opentelemetry/opentelemetry_prot.c:1019
#11 flb_http_server_client_activity_event_handler() at src/http_server/flb_http_server.c:365
#12 flb_engine_start() at src/flb_engine.c:1267
#13 flb_lib_worker() at src/flb_lib.c:909
Steps to reproduce:
Run a Quarkus 3.32.x service. Either of these emitter setups reproduces:
io.quarkus:quarkus-micrometer-opentelemetry (the core bridge), or
io.quarkiverse.micrometer.registry:quarkus-micrometer-registry-otlp:3.5.0 (the native Micrometer OTLP registry — bypasses the Quarkus OTel SDK entirely).
Configure the service to publish metrics to fluent-bit over OTLP/HTTP every 10 s, with the fluent-bit opentelemetry input listening on :4318:
[INPUT]
    Name   opentelemetry
    Listen 0.0.0.0
    Port   4318
    Tag    otlp.app
Within ~10 s of app boot, the first histogram-bearing publish (default Micrometer binders emit jvm.gc.pause, http.server.connections.duration, etc., as Timer → OTLP Histogram) reaches the OTLP input → SIGSEGV.
The crash reproduces with both emitters, so the root cause is downstream of Quarkus, in cmetrics.
Root cause analysis
decode_data_point_labels falls into its else branch when an OTLP attribute's AnyValue.value_case is unrecognised (e.g. NOT_SET = 0). That branch stores a label with name == NULL in sample->labels. compute_metric_hash then calls cfl_sds_len(label_value->name) without a NULL guard. cfl_sds_len(NULL) evaluates CFL_SDS_HEADER(NULL)->len, which reads through (struct cfl_sds *)(NULL - 16), an invalid address, and the process segfaults. The peer function get_or_create_metric_metadata_context, called two lines later by the same caller, already guards against map->opts->fqname == NULL; compute_metric_hash does not.
Expected behavior
OTLP histogram payloads from Java/Quarkus clients should be ingested without crashing the worker. Malformed or unknown attribute value cases should not produce a NULL-named label that subsequently segfaults the hash function.
Your Environment
Version used: Fluent Bit 5.0.3 (Helm chart fluent/fluent-bit-0.57.3, app version 5.0.3)
cmetrics: 2.1.2 (current master also affected)
Configuration: opentelemetry input on :4318, prometheus_remote_write output to VictoriaMetrics
Environment: Kubernetes (k3s)
Operating System: Linux
Client: Quarkus 3.32.4 with quarkus-micrometer-registry-otlp 3.5.0 (also reproduced with quarkus-micrometer-opentelemetry core bridge)
Additional context
The crash address 0x...028 points to compute_metric_hash; the symbol the unwinder reports (get_or_create_data_point_metadata_context) is the calling function because debug info maps to its entry line.
Possibly related but distinct: Quarkus issue #51741 (histogram buckets miscounted in the quarkus-micrometer-opentelemetry bridge). Our crash reproduces with both Quarkus paths, so the cmetrics NULL deref is independent of #51741.
Proposed fix: decode_data_point_labels stores "" instead of NULL for an unrecognised AnyValue.value_case, and compute_metric_hash guards fqname and label->name for defence in depth. Regression test included.