Skip to content

Add default limit for fetching failed rows in DbSample#1991

Open
NathanBick wants to merge 2 commits intosodadata:v3from
NathanBick:issue-1985-default-sample-limit
Open

Add default limit for fetching failed rows in DbSample#1991
NathanBick wants to merge 2 commits intosodadata:v3from
NathanBick:issue-1985-default-sample-limit

Conversation

@NathanBick
Copy link

Github Issue: #1985

Currently the failed rows check sample fetches all rows in dataset that fails. This fetches arbitrarily large data into memory, causing bugs. A comment in the code currently acknowledges this risk. The check does not respect either the default limit or the specified limit in a check yaml.

In PR proposes to resolve this bug by always respecting the default limit of 100. It is a very simple and contained PR. However, it does not resolve the issue of not respecting a user-specified samples limit. Nonetheless, this is an improvement. It makes the situation better and no worse.

@CLAassistant
Copy link

CLAassistant commented Jan 18, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


BickieSmalls seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sonarqubecloud
Copy link

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@m1n0
Copy link
Contributor

m1n0 commented Jan 25, 2024

Hi, thanks for the contribution! Without taking the configured limit into consideration we cannot merge this as-is unfortunately. Yes indeed the comment acknowledges a potential issue if too many rows are fetched, but the way this works is that the failed rows samples queries have a built in LIMIT clause in case samples limit configuration is present, with 100 being a default.
There main reason why this is not done while fetching the results but in the query is that a large, non-limited query might cause performance issues, even if only a subset of rows is fetched afterwards.

That being said:

  • there might still be a case where LIMIT is not applied correctly - if so, could you please report it by creating an issue?
  • IF the configured sample is propagated into the DbSample fetch method then yes, limiting the result there would make 100% sense as a last resort

Copy link
Contributor

@m1n0 m1n0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just applying the limit here would break the samples limit configuration, please see my comment about how I suggest we proceed here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments