Add default limit for fetching failed rows in DbSample#1991
Add default limit for fetching failed rows in DbSample#1991NathanBick wants to merge 2 commits intosodadata:v3from
Conversation
|
BickieSmalls seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
for more information, see https://pre-commit.ci
|
|
Hi, thanks for the contribution! Without taking the configured limit into consideration we cannot merge this as-is unfortunately. Yes indeed the comment acknowledges a potential issue if too many rows are fetched, but the way this works is that the failed rows samples queries have a built in LIMIT clause in case samples limit configuration is present, with 100 being a default. That being said:
|
m1n0
left a comment
There was a problem hiding this comment.
Just applying the limit here would break the samples limit configuration, please see my comment about how I suggest we proceed here

Github Issue: #1985
Currently the
failed rowscheck sample fetches all rows in dataset that fails. This fetches arbitrarily large data into memory, causing bugs. A comment in the code currently acknowledges this risk. The check does not respect either the default limit or the specified limit in a check yaml.In PR proposes to resolve this bug by always respecting the default limit of 100. It is a very simple and contained PR. However, it does not resolve the issue of not respecting a user-specified samples limit. Nonetheless, this is an improvement. It makes the situation better and no worse.