Add HTTP parser memory regression tests#12871
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #12871 +/- ##
=======================================
Coverage 98.95% 98.95%
=======================================
Files 131 131
Lines 47719 47743 +24
Branches 2475 2475
=======================================
+ Hits 47219 47243 +24
Misses 376 376
Partials 124 124
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. |
Signed-off-by: Vinod Kumar <codingkiddo@gmail.com>
bedeaec to
6996a9e
Compare
Merging this PR will not alter performance
Comparing Footnotes
|
|
|
||
|
|
||
| def _get_peak_memory_used(func: Callable[[], None]) -> int: | ||
| tracemalloc = pytest.importorskip("tracemalloc") |
There was a problem hiding this comment.
In what situation is this unavailable? I can't see any notice in the docs suggesting it's not always available..
| try: | ||
| func() | ||
| _, peak = tracemalloc.get_traced_memory() | ||
| return peak | ||
| finally: | ||
| tracemalloc.stop() |
There was a problem hiding this comment.
Should probably be:
| try: | |
| func() | |
| _, peak = tracemalloc.get_traced_memory() | |
| return peak | |
| finally: | |
| tracemalloc.stop() | |
| try: | |
| func() | |
| else: | |
| _, peak = tracemalloc.get_traced_memory() | |
| finally: | |
| tracemalloc.stop() | |
| return peak |
| text = b"GET /" + (b"-" * 1_000_000) + b" HTTP/1.1\r\nHost: a\r\n\r\n" | ||
|
|
||
| def parse() -> None: | ||
| with pytest.raises(http_exceptions.LineTooLong): | ||
| parser.feed_data(text) | ||
|
|
||
| peak = _get_peak_memory_used(parse) | ||
|
|
||
| assert peak < 5 * 1024 * 1024 |
There was a problem hiding this comment.
I'm just thinking through this test with you here, but I don't this would produce our regression.
I think what we want to do here is something more like:
for i in range(10):
parser.feed_data(text[chunk_len*i : chun_len*(i+1)])
So we are allocating memory for each new chunk, but the parser should avoid allocating memory for the entire line (i.e. it should be discarding the data or raising an exception as it processes these chunks).
Also, the test currently asserts the peak is <5MiB, but the payload is closer to 1MiB? Maybe we need a larger payload to test this properly?
There was a problem hiding this comment.
If at all possible, it would great if you could validate the tests against an older release (e.g. 3.12, or 3.10 if you can get it running) where I expect most of them to fail.
| response.feed_eof() # Should complete without exception | ||
|
|
||
|
|
||
| def _get_peak_memory_used(func: Callable[[], None]) -> int: |
There was a problem hiding this comment.
Given the number of tests we want to look at, probably best to define this as a fixture in conftest.py.
It could even return a context manager possibly? Something that can be used like:
def test_foo(memory_recorder):
with memory_recorder:
...
assert memory_recorder.peak < foo
Just an idea..
What do these changes do?
This adds HTTP parser memory regression coverage for oversized inputs.
The new tests cover a small first slice of #12736:
The tests measure peak memory usage while feeding already-created input into the parser, so the large test input allocation itself is not included in the measured peak.
Are there changes in behavior for the user?
No. This is a test-only change and does not change runtime behavior or public APIs.
Is it a substantial burden for the maintainers to support this?
No. The change only adds focused regression tests for existing HTTP parser behavior.
The memory threshold is intentionally conservative to reduce the chance of CI flakiness across platforms and Python versions. The tests are also limited to a small HTTP parser scope, leaving the broader multipart and chunked-parser cases from #12736 for follow-up PRs.
Related issue number
Part of #12736
Checklist
CONTRIBUTORS.txtCHANGES/folder