-
-
Notifications
You must be signed in to change notification settings - Fork 215
fix: Don't skip spans whose parent has already been processed #1012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
FugiTech
wants to merge
1
commit into
getsentry:master
Choose a base branch
from
FugiTech:group_traces
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+59
−38
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Descendant spans can be sent in duplicate transactions
Medium Severity
The
span_complete?filter excludes incomplete children from a transaction but still includes their complete descendants (becauseget_all_descendantsrecurses through incomplete nodes). When the incomplete child later finishes and its parent is gone from storage, it becomes its own transaction root and re-collects those same descendants — sending them to Sentry twice. Theremove_child_spanscleanup doesn't prevent this because it only removes direct children, not nested descendants.Additional Locations (1)
lib/sentry/opentelemetry/span_processor.ex#L44-L49There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FugiTech this sounds legit, WDYT? I was wondering if we should just halt pruning of roots for as long as there are in-progress children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I believe this is a legit problem with the PR as it stands today. Codex came up with FugiTech@16924b4 to solve it, which I've been running in production for a bit and seems to work?
But the core issue is I'm not actually that familiar with the Sentry Elixir SDK so I don't know what the proper fix is. Delaying emitting the entire transaction until all children are done probably wouldn't be good enough as children could start again after that point (delayed tasks, oban, handle_event in liveviews, etc). But I don't fully understand the SpanStorage/batching characteristics of this flow.
If you're OK with it, I'd love for you to take over and implement a fix however you see fit. You're welcome to use this PR and the commit I linked above as inspiration, but I just lack the confidence to say that it'll properly fix the issue and not introduce other weird behaviors.