Add IO stats and ext4 FS stats through new ext4 collector #3577

Open
john-morales wants to merge 10 commits into prometheus:master from john-morales:issue_3005

@john-morales commented Mar 8, 2026

This PR attempts to revive Issue #3005 / Pull Request #3295 by implementing support for IO statistics and ext4 filesystem error statistics using the procfs library.

The work here builds on the progress from #3295. I'm not sure whether or how to properly attribute @mshahzeb's efforts, but I'm happy to do so.

Reiterating the original goals, this PR implements support for:

# HELP node_disk_iodone_total Number of completed or rejected IO commands.
# TYPE node_disk_iodone_total counter
node_disk_iodone_total{device="sda"} 775
node_disk_iodone_total{device="sr0"} 1.29433517e+08
# HELP node_disk_ioerr_total Number of IO commands that completed with an error.
# TYPE node_disk_ioerr_total counter
node_disk_ioerr_total{device="sda"} 11
node_disk_ioerr_total{device="sr0"} 41

...from:

/sys/block/<disk>/device/ioerr_cnt: number of SCSI commands that completed with an error
/sys/block/<disk>/device/iodone_cnt: number of completed or rejected SCSI commands
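
For readers unfamiliar with these sysfs files, here is a minimal sketch of reading them. It is not the code in this PR (which goes through procfs). The helper names `parseSCSICounter` and `readSCSICounter` are hypothetical. As far as I know the kernel prints these counters as hex with a `0x` prefix, so the sketch parses with base 0, which accepts both prefixed hex and plain decimal; treat the on-disk format as an assumption to verify on your kernel.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseSCSICounter parses the contents of a SCSI counter file.
// Base 0 lets strconv infer the base from the prefix, so both
// "0x1f" (hex) and "775" (decimal) are accepted.
func parseSCSICounter(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 0, 64)
}

// readSCSICounter reads <sysBlock>/<disk>/device/<name> and parses it.
// Hypothetical helper for illustration only.
func readSCSICounter(sysBlock, disk, name string) (uint64, error) {
	data, err := os.ReadFile(filepath.Join(sysBlock, disk, "device", name))
	if err != nil {
		return 0, err
	}
	return parseSCSICounter(string(data))
}

func main() {
	// "sda" is only an example disk name; non-SCSI disks may not have these files.
	for _, name := range []string{"iodone_cnt", "ioerr_cnt"} {
		v, err := readSCSICounter("/sys/block", "sda", name)
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping:", err)
			continue
		}
		fmt.Printf("%s = %d\n", name, v)
	}
}
```

A missing file is reported and skipped rather than treated as fatal, since not every block device exposes the SCSI counters.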

And implements support for:

# HELP node_ext4_errors_total Number of ext4 filesystem errors.
# TYPE node_ext4_errors_total counter
node_ext4_errors_total{partition="nvme0n1p5"} 0
# HELP node_ext4_messages_total Number of ext4 filesystem log messages.
# TYPE node_ext4_messages_total counter
node_ext4_messages_total{partition="nvme0n1p5"} 2
# HELP node_ext4_warnings_total Number of ext4 filesystem warnings.
# TYPE node_ext4_warnings_total counter
node_ext4_warnings_total{partition="nvme0n1p5"} 0

...from:

/sys/fs/ext4/<partition>/errors_count: number of ext4 errors
/sys/fs/ext4/<partition>/warning_count: number of ext4 warning log messages
/sys/fs/ext4/<partition>/msg_count: number of other ext4 log messages
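
The file-to-metric mapping above can be sketched roughly as follows. This is an illustration, not the PR's implementation: `ext4Counters` and `collectExt4` are hypothetical names, the values are assumed to be plain decimal, and the file reader is passed in as a function so the logic can be exercised without a real sysfs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// ext4Counters maps each sysfs file to the metric name listed in this PR.
var ext4Counters = []struct{ file, metric string }{
	{"errors_count", "node_ext4_errors_total"},
	{"warning_count", "node_ext4_warnings_total"},
	{"msg_count", "node_ext4_messages_total"},
}

// collectExt4 reads the three counters for one partition.
// read is injected (os.ReadFile in production) to keep the sketch testable.
func collectExt4(root, partition string, read func(string) ([]byte, error)) (map[string]uint64, error) {
	out := make(map[string]uint64, len(ext4Counters))
	for _, c := range ext4Counters {
		data, err := read(filepath.Join(root, partition, c.file))
		if err != nil {
			return nil, err
		}
		// Assumption: the files contain a decimal integer and a newline.
		v, err := strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", c.file, err)
		}
		out[c.metric] = v
	}
	return out, nil
}

func main() {
	const partition = "nvme0n1p5" // example name taken from the output above
	vals, err := collectExt4("/sys/fs/ext4", partition, os.ReadFile)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	for _, c := range ext4Counters {
		fmt.Printf("%s{partition=%q} %d\n", c.metric, partition, vals[c.metric])
	}
}
```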

The new ext4 collector is enabled by default. It supports mutually exclusive partition-exclude and partition-include regexp flags, which appears to be standard practice among the other disk-related collectors. The default partition-exclude is ^features$, which intentionally skips the built-in "features" directory at /sys/fs/ext4/features. That directory is not a real partition and does not contain the three files (errors_count, warning_count, msg_count) that the collector reads.
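
To make the filtering behavior concrete, here is a rough sketch of a mutually exclusive include/exclude filter. It is not the code in this PR: `partitionFilter`, `newPartitionFilter`, and `ignored` are hypothetical names, and the sketch treats an empty string as "flag not set". How the real collector reconciles the default exclude with an explicitly set include is not covered here.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// partitionFilter holds at most one of the two patterns.
type partitionFilter struct {
	include, exclude *regexp.Regexp
}

// newPartitionFilter rejects the case where both patterns are set,
// mirroring the "mutually exclusive" rule described above.
func newPartitionFilter(include, exclude string) (*partitionFilter, error) {
	if include != "" && exclude != "" {
		return nil, errors.New("partition-include and partition-exclude are mutually exclusive")
	}
	f := &partitionFilter{}
	var err error
	if include != "" {
		if f.include, err = regexp.Compile(include); err != nil {
			return nil, err
		}
	}
	if exclude != "" {
		if f.exclude, err = regexp.Compile(exclude); err != nil {
			return nil, err
		}
	}
	return f, nil
}

// ignored reports whether a directory name under /sys/fs/ext4 should be skipped.
func (f *partitionFilter) ignored(name string) bool {
	if f.include != nil {
		return !f.include.MatchString(name)
	}
	if f.exclude != nil {
		return f.exclude.MatchString(name)
	}
	return false
}

func main() {
	// Default described in this PR: exclude the "features" directory.
	f, err := newPartitionFilter("", "^features$")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range []string{"features", "nvme0n1p5"} {
		fmt.Printf("%s ignored=%v\n", name, f.ignored(name))
	}
}
```

The anchors in ^features$ matter: without them, a partition whose name merely contains "features" would also be skipped.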

Corresponding procfs change that enabled this work: prometheus/procfs#651

Thank you for your consideration!

Comment on lines +773 to +781
# HELP node_ext4_errors_total Number of ext4 filesystem errors.
# TYPE node_ext4_errors_total counter
node_ext4_errors_total{device="sdb1"} 12
# HELP node_ext4_messages_total Number of ext4 filesystem log messages.
# TYPE node_ext4_messages_total counter
node_ext4_messages_total{device="sdb1"} 567
# HELP node_ext4_warnings_total Number of ext4 filesystem warnings.
# TYPE node_ext4_warnings_total counter
node_ext4_warnings_total{device="sdb1"} 34
Contributor

Given that sda is labeled as a device in node_disk_ioerr_total, perhaps sdb1 in node_ext4_warnings_total should be labeled as a partition?

Author

Good point - will do.

@john-morales
Author

Updated the ext4 metrics label from device to partition. For consistency, also updated the corresponding command-line flags from device-exclude / device-include to partition-exclude / partition-include. (Edited the PR description above as well.)

@john-morales john-morales requested a review from anarcat March 9, 2026 19:40