-
Notifications
You must be signed in to change notification settings - Fork 164
fix: Return empty GPFS performance metric rather than raising exception #8201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug where GPFS performance metrics would cause an unhandled exception when unavailable. Instead of raising GPFSNoMetricError (which was not properly handled as an HTTP exception), the code now returns a zero-valued FSPerfMetric when metrics cannot be retrieved.
Changes:
- Return empty performance metrics (all zeros) instead of raising
GPFSNoMetricErrorwhen GPFS metrics are unavailable - Remove unused import of
GPFSNoMetricErrorfrom the GPFS volume implementation
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return FSPerfMetric( | ||
| iops_read=0, | ||
| iops_write=0, | ||
| io_bytes_read=0, | ||
| io_bytes_write=0, | ||
| io_usec_read=0, | ||
| io_usec_write=0, | ||
| ) |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using -1 instead of 0 for unavailable metrics to be consistent with the VAST volume implementation. The VAST implementation uses -1 values to explicitly indicate that metrics are unavailable, which is more semantically correct than 0 (which could be confused with actual zero activity). This would make the error handling pattern more consistent across different storage backend implementations.
| metrics = await self.api_client.get_metric(query) | ||
| latest_metric = metrics["performanceData"]["rows"][-1]["values"] | ||
| except (KeyError, IndexError): | ||
| raise GPFSNoMetricError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPFSNoMetricError is not used anywhere, so we can remove it
|
Is there a reason for changing the behavior that previously raised an exception? |
1ff4fc3 to
e14010f
Compare
HyeockJinKim
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like return an empty value instead of raising an error might be riskier.
What's the reason for this work?
resolves #NNN (BA-MMM)
Summary
Problem
The GPFSNoMetricError exception was raised when GPFS API returned no performance metric data (due to KeyError or IndexError), but this exception was not caught or handled anywhere in the codebase. This caused unexpected crashes when metrics were temporarily unavailable.
Solution
Instead of raising an unhandled exception, return a valid FSPerfMetric object with all values set to zero. This allows the system to gracefully handle cases where metric data is unavailable without disrupting normal operations.
Checklist: (if applicable)
ai.backend.testdocsdirectory