Bug Report
Describe the bug
While testing the in_tail input plugin with exit_on_eof on, I sent a kill (SIGTERM) to terminate the Fluent Bit process during the test.
The process did not exit: the pipeline thread pinned one CPU core at 100% and no further log lines were emitted.
The grace period elapsed but the service never reached the "service has stopped" state.
To Reproduce
- Example log message if applicable:
[2026/04/21 15:10:43.361] [ warn] [engine] service will shutdown in max 600 seconds
[2026/04/21 15:10:43.361] [ info] [engine] pausing all inputs..
...
[2026/04/21 15:10:44.065] [ warn] [engine] service will shutdown in max 600 seconds
[2026/04/21 15:10:44.065] [ info] [engine] pausing all inputs..
[2026/04/21 15:10:44.065] [ info] [input] pausing storage_backlog.1
- Steps to reproduce the problem:
- Configure in_tail with an exit_on_eof path.
- When the input reaches its termination condition it calls flb_engine_exit() internally (first STOP, exit_on_eof).
- Within a short window, send an external SIGTERM (second STOP via flb_stop() → flb_engine_exit()).
- The engine busy-loops and never exits within the grace period.
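The steps above can be sketched as a shell script. The paths, timings, and scratch directory below are illustrative assumptions, not values from this report; adjust them for your environment.

```shell
#!/bin/sh
# Reproduction sketch for the double-STOP hang. Assumes fluent-bit is
# on PATH; paths and sleep durations are placeholders.
set -eu
DIR="${DIR:-/tmp/flb-eof-repro}"
mkdir -p "$DIR"

# Minimal config: in_tail with exit_on_eof, so the input itself
# triggers the first STOP once it drains the file.
cat > "$DIR/repro.conf" <<EOF
[SERVICE]
    flush 1
    grace 60

[INPUT]
    Name           tail
    Path           $DIR/testing.input
    Read_from_Head true
    exit_on_eof    on

[OUTPUT]
    Name  file
    Match *
    File  $DIR/testing.out
EOF

printf 'line1\nline2\n' > "$DIR/testing.input"

if command -v fluent-bit >/dev/null 2>&1; then
    # First STOP: in_tail reaches EOF and calls flb_engine_exit().
    fluent-bit -c "$DIR/repro.conf" &
    PID=$!
    sleep 2
    # Second STOP: external SIGTERM; with the bug, the process
    # busy-loops at 100% CPU instead of exiting within the grace period.
    kill -TERM "$PID" || true
    wait "$PID" || true
else
    echo "fluent-bit not on PATH; wrote config to $DIR/repro.conf"
fi
```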
Expected behavior
Any number of STOP signals should be idempotent. The engine should complete shutdown within the configured grace period and exit cleanly (exit code from exit_status_code).
Screenshots
Your Environment
- Version used: 5.0.3
- Configuration:
[SERVICE]
flush 1
grace 60
log_level info
log_file /tmp/testing/logs/testing.log
parsers_file /tmp/testing/parsers.conf
plugins_file /tmp/testing/plugins.conf
http_server on
http_listen 0.0.0.0
http_port 22002
storage.path /tmp/testing/storage
storage.metrics on
storage.max_chunks_up 512
storage.sync full
storage.checksum off
storage.backlog.mem_limit 100M
[INPUT]
Name tail
Path /tmp/testing.input
Exclude_Path *.gz,*.zip
Tag testing
Key message
Offset_Key log_offset
Read_from_Head true
Refresh_Interval 3
Rotate_Wait 31557600
Buffer_Chunk_Size 1MB
Buffer_Max_Size 16MB
Inotify_Watcher false
storage.type filesystem
storage.pause_on_chunks_overlimit true
DB /tmp/testing/storage/testing.db
DB.sync normal
DB.locking false
exit_on_eof on
Alias input_log
[OUTPUT]
Name file
Match *
File /tmp/testing.out
- Environment name and version: bare-metal
- Server type and version: x86_64
- Operating System and version: RHEL 8.10 (kernel 4.18) and Ubuntu 22.04 (kernel 6.8) — both affected
- Filters and plugins:
in_tail (also reproducible with a minimal in_lib + out_null pipeline)
Additional context
From my analysis, the issue appears to be triggered by the second STOP re-entering the handler block in flb_engine_start(), which resets config->event_shutdown->status to MK_EVENT_NONE even though the shutdown timerfd is still registered in the kernel's epoll set.
The dispatcher would then likely drop the timer event (via the status != MK_EVENT_NONE guard in flb_event_load_bucket_queue()), while the level-triggered timerfd keeps reporting EPOLLIN — which would explain the infinite busy-loop in the pipeline thread and why grace_count never advances, leaving flb_engine_shutdown() unreachable.