Add default limit for fetching failed rows in DbSample by NathanBick · Pull Request #1991 · sodadata/soda-core

NathanBick · 2024-01-18T15:08:43Z

GitHub issue: #1985

Currently, the failed rows check sample fetches every failing row in the dataset. This can pull an arbitrarily large amount of data into memory, causing bugs. A comment in the code already acknowledges this risk. The check respects neither the default limit nor the limit specified in a check YAML.

This PR proposes to resolve the bug by always respecting the default limit of 100. It is a very simple and contained PR. However, it does not resolve the issue of a user-specified samples limit being ignored. Nonetheless, it is an improvement: it makes the situation better and no worse.
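
The idea can be illustrated with a minimal sketch. This is not the actual soda-core code: `fetch_rows_capped` and `DEFAULT_SAMPLES_LIMIT` are hypothetical names, and an in-memory SQLite database stands in for the real data source. It contrasts an unbounded `fetchall()` with a `fetchmany()` capped at the default of 100.

```python
import sqlite3

# Hypothetical constant mirroring the default limit of 100 described above.
DEFAULT_SAMPLES_LIMIT = 100


def fetch_rows_capped(cursor, limit=DEFAULT_SAMPLES_LIMIT):
    """Return at most `limit` rows from an already executed DB-API cursor."""
    return cursor.fetchmany(limit)


# Stand-in data source: 1000 rows that all "fail" the check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE failed_rows (id INTEGER)")
conn.executemany("INSERT INTO failed_rows VALUES (?)", [(i,) for i in range(1000)])

# Unbounded fetch: every failing row is loaded into memory.
all_rows = conn.execute("SELECT id FROM failed_rows").fetchall()

# Capped fetch: only the first DEFAULT_SAMPLES_LIMIT rows are loaded.
capped_rows = fetch_rows_capped(conn.execute("SELECT id FROM failed_rows"))

print(len(all_rows), len(capped_rows))
```

Note that in this sketch the cap is fixed to the default unless the caller passes `limit` explicitly, which is the same gap the description points out for user-specified limits.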

CLAassistant · 2024-01-18T15:08:52Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

BickieSmalls seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


sonarqubecloud · 2024-01-18T15:09:24Z

Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

m1n0 · 2024-01-25T23:34:54Z

Hi, thanks for the contribution! Unfortunately, we cannot merge this as-is because it does not take the configured limit into consideration. Yes, the comment does acknowledge a potential issue if too many rows are fetched, but the way this works is that the failed rows sample queries have a built-in LIMIT clause based on the samples limit configuration, with 100 being the default.
The main reason this is done in the query rather than while fetching the results is that a large, non-limited query might cause performance issues, even if only a subset of rows is fetched afterwards.

That being said:

There might still be a case where LIMIT is not applied correctly. If so, could you please report it by creating an issue?
If the configured samples limit is propagated into the DbSample fetch method, then yes, limiting the result there would make 100% sense as a last resort.
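
The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not soda-core's implementation: `build_failed_rows_query`, `fetch_sample`, and `DEFAULT_SAMPLES_LIMIT` are hypothetical names, and SQLite stands in for the data source. The configured limit is applied first as a LIMIT clause in the query, and the same value is then propagated to the fetch as a last-resort cap.

```python
import sqlite3

# Hypothetical default, matching the value of 100 mentioned in the comment.
DEFAULT_SAMPLES_LIMIT = 100


def build_failed_rows_query(table, condition, samples_limit=None):
    """Build a failed-rows query with a LIMIT clause; return (sql, limit)."""
    limit = DEFAULT_SAMPLES_LIMIT if samples_limit is None else int(samples_limit)
    # Identifiers are interpolated only because this is a sketch with trusted input.
    sql = f"SELECT * FROM {table} WHERE {condition} LIMIT {limit}"
    return sql, limit


def fetch_sample(conn, table, condition, samples_limit=None):
    """Run the limited query, then cap the fetch with the same propagated limit."""
    sql, limit = build_failed_rows_query(table, condition, samples_limit)
    cursor = conn.execute(sql)
    # Last resort: even if LIMIT were missing from the query, fetch at most `limit` rows.
    return cursor.fetchmany(limit)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(i, -1) for i in range(250)]
)

default_sample = fetch_sample(conn, "orders", "amount < 0")
configured_sample = fetch_sample(conn, "orders", "amount < 0", samples_limit=5)
print(len(default_sample), len(configured_sample))
```

Limiting in the query lets the database stop early, while the cap on the fetch only bounds client memory. That is why the fetch-side limit is treated here as a fallback rather than the primary mechanism.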

m1n0

Just applying the limit here would break the samples limit configuration. Please see my comment about how I suggest we proceed here.

Add default limit for fetching failed rows in DbSample

0a4d9f8

[pre-commit.ci] auto fixes from pre-commit.com hooks

d2fcf82

for more information, see https://pre-commit.ci

NathanBick mentioned this pull request Jan 23, 2024

Apply sample limit to UDFailedRowsExpressionQuery #1986

Draft

m1n0 requested changes Jan 25, 2024

NathanBick wants to merge 2 commits into sodadata:v3 from NathanBick:issue-1985-default-sample-limit

3 participants