engine: shutdown hangs at 100% CPU when duplicate STOP signals arrive

## Bug Report

**Describe the bug**
While testing the `in_tail` input plugin with `exit_on_eof on`, I sent a `kill` (SIGTERM) to terminate the Fluent Bit process during the test.
The process did not exit: the pipeline thread pinned one CPU core at 100% and no further log lines were emitted.
The grace period elapsed but the service never reached the "service has stopped" state. 

**To Reproduce**
- Rubular link if applicable:
- Example log message if applicable:
```
[2026/04/21 15:10:43.361] [ warn] [engine] service will shutdown in max 600 seconds
[2026/04/21 15:10:43.361] [ info] [engine] pausing all inputs..
...
[2026/04/21 15:10:44.065] [ warn] [engine] service will shutdown in max 600 seconds
[2026/04/21 15:10:44.065] [ info] [engine] pausing all inputs..
[2026/04/21 15:10:44.065] [ info] [input] pausing storage_backlog.1
```
- Steps to reproduce the problem:

1. Configure `in_tail` with `exit_on_eof on` on a finite input file (see the configuration below).
2. When the input reaches EOF, it calls `flb_engine_exit()` internally (first STOP, from `exit_on_eof`).
3. Within a short window, send an external `SIGTERM` (second STOP, via `flb_stop()` → `flb_engine_exit()`).
4. The engine busy-loops and never exits within the grace period.

**Expected behavior**
Any number of STOP signals should be idempotent. The engine should complete shutdown within the configured grace period and exit cleanly (exit code from `exit_status_code`).

**Screenshots**


**Your Environment**

* Version used: 5.0.3
* Configuration: 
```
[SERVICE]
    flush 1
    grace 60
    log_level info
    log_file /tmp/testing/logs/testing.log
    parsers_file /tmp/testing/parsers.conf
    plugins_file /tmp/testing/plugins.conf
    http_server on
    http_listen 0.0.0.0
    http_port 22002

    storage.path /tmp/testing/storage
    storage.metrics on
    storage.max_chunks_up 512
    storage.sync full
    storage.checksum off
    storage.backlog.mem_limit 100M

[INPUT]
    Name tail
    Path /tmp/testing.input
    Exclude_Path *.gz,*.zip
    Tag testing
    Key message
    Offset_Key   log_offset

    Read_from_Head true
    Refresh_Interval 3
    Rotate_Wait 31557600

    Buffer_Chunk_Size 1MB
    Buffer_Max_Size 16MB
    Inotify_Watcher false

    storage.type filesystem
    storage.pause_on_chunks_overlimit true

    DB /tmp/testing/storage/testing.db
    DB.sync normal
    DB.locking false

    exit_on_eof on

    Alias input_log

[OUTPUT]
    Name file
    Match *
    File /tmp/testing.out
```
* Environment name and version: bare-metal
* Server type and version: x86_64
* Operating System and version: RHEL 8.10 (kernel 4.18) and Ubuntu 22.04 (kernel 6.8); both are affected
* Filters and plugins: `in_tail`; also reproducible with a minimal `in_lib` + `out_null` setup

**Additional context**
From my analysis, the issue appears to be triggered by the second STOP re-entering the STOP handler block in `flb_engine_start()`. That block resets `config->event_shutdown->status` to `MK_EVENT_NONE`, even though the shutdown timerfd is still registered in the kernel's epoll set.
The dispatcher would then likely drop the timer event (via the `status != MK_EVENT_NONE` guard in `flb_event_load_bucket_queue()`) without consuming it. Because the timerfd is level-triggered, epoll keeps reporting EPOLLIN on every loop iteration. That would explain both the busy-loop in the pipeline thread and why `grace_count` never advances, leaving `flb_engine_shutdown()` unreachable.

engine: shutdown hangs at 100% CPU when duplicate STOP signals arrive #11744

Bug Report

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

engine: shutdown hangs at 100% CPU when duplicate STOP signals arrive #11744

Description

Bug Report

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions