CSQ masks non-coding gene annotations #2548

@MattWellie

We've identified a corner case where a clinically relevant non-coding gene (RNU2-2[P]) overlaps a non-clinically-relevant gene (WDR74). bcftools csq's logic only searches for non-coding transcript consequences if there are no coding-transcript hits.

https://github.com/samtools/bcftools/blob/develop/csq.c#L3694

We are using BCFtools in a workflow where csq not only annotates variant consequences but also associates variants with genes, so annotation of non-coding genes is still important. This csq decision was obscured for a while because in our hands it was annotating some non-coding genes just fine (e.g. RNU4-2). That now appears to be because RNU4-2 doesn't overlap a coding transcript, so this condition was never triggered.

We were able to overcome this issue by splitting the GFF into coding and non-coding, and doing two non-conflicting annotation loops, but we've also solved the problem in code and wondered if this was a change you might be interested in adopting.
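
For anyone wanting to try the GFF-splitting workaround, here is a minimal Python sketch of one way it could be done. It is not the script we used. It assumes an Ensembl-style GFF3 (`ID=gene:...`, `ID=transcript:...`, a single `Parent=` per record, and a `biotype=` attribute on transcripts) and treats only `protein_coding` transcripts as coding. The function names `parse_attrs` and `split_gff` are hypothetical.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column ("k1=v1;k2=v2") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def split_gff(lines, coding_biotypes=("protein_coding",)):
    """Split GFF3 lines (a list of strings) into (coding, noncoding) lists.

    Assumption: transcripts are classified by their own biotype, child
    features follow their parent transcript, and a gene record is copied
    to every output that receives at least one of its transcripts.
    Records that cannot be linked to a transcript are dropped.
    """
    tx_side = {}     # transcript ID -> "coding" or "noncoding"
    gene_sides = {}  # gene ID -> set of sides its transcripts fall on
    parsed = []      # attribute dict per line, None for comments/blanks

    # Pass 1: classify every transcript and remember its parent gene.
    for line in lines:
        if line.startswith("#") or not line.strip():
            parsed.append(None)
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attrs(cols[8]) if len(cols) == 9 else {}
        parsed.append(attrs)
        ident = attrs.get("ID", "")
        if ident.startswith("transcript:"):
            is_coding = attrs.get("biotype") in coding_biotypes
            side = "coding" if is_coding else "noncoding"
            tx_side[ident] = side
            gene_sides.setdefault(attrs.get("Parent", ""), set()).add(side)

    # Pass 2: route each record to the output(s) it belongs to.
    out = {"coding": [], "noncoding": []}
    for line, attrs in zip(lines, parsed):
        if attrs is None:
            if line.startswith("#"):  # keep header/comment lines in both
                out["coding"].append(line)
                out["noncoding"].append(line)
            continue
        ident = attrs.get("ID", "")
        if ident.startswith("gene:"):
            sides = gene_sides.get(ident, set())
        elif ident in tx_side:
            sides = {tx_side[ident]}
        else:
            parent = attrs.get("Parent", "")
            sides = {tx_side[parent]} if parent in tx_side else set()
        for side in sides:
            out[side].append(line)
    return out["coding"], out["noncoding"]
```

Each output would then be written to its own file and passed to a separate csq run via `-g`, so the non-coding run never sees a coding transcript at the same position.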

  • Original logic: If a CDS, UTR, or splice consequence is annotated on the variant record, don't run the transcript scan (this scan is the only point in the code where non-coding CSQs originate)
  • New logic: Record whether a CDS, UTR, or splice consequence is annotated on the variant record, then run the transcript scan. If a coding variant was detected, skip all coding transcripts, but annotate non-coding transcripts as normal.
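
To make the difference between the two behaviours concrete, here is a small Python model of the decision. It is a sketch of the logic described in the bullets, not a translation of csq.c: the `Transcript` class, the `annotate` function, the `new_logic` flag, and the consequence strings are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    gene: str
    coding: bool


def annotate(overlapping, coding_hits, new_logic):
    """Return (gene, consequence) pairs for one variant.

    overlapping : transcripts overlapping the variant position
    coding_hits : pairs already produced by the CDS/UTR/splice tests
    new_logic   : False models the original behaviour, True the proposal
    """
    result = list(coding_hits)

    # Original behaviour: any coding hit suppresses the transcript scan,
    # so overlapping non-coding transcripts are never reported.
    if coding_hits and not new_logic:
        return result

    # Transcript scan. Under the proposal it always runs, but coding
    # transcripts are skipped once a coding hit has been recorded.
    for tr in overlapping:
        if tr.coding and coding_hits:
            continue
        label = "intron" if tr.coding else "non_coding"  # placeholder labels
        result.append((tr.gene, label))
    return result


# The overlap from this issue, with a placeholder coding consequence.
overlapping = [
    Transcript("WDR74-tx", "WDR74", coding=True),
    Transcript("RNU2-2P-tx", "RNU2-2P", coding=False),
]
hits = [("WDR74", "coding_hit")]

old = annotate(overlapping, hits, new_logic=False)
new = annotate(overlapping, hits, new_logic=True)
```

Here `old` contains only the WDR74 pair, while `new` keeps that pair unchanged and adds one for RNU2-2P. With no coding hits, both modes return the same result.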

develop...populationgenomics:bcftools:develop

In practice this leaves the coding annotation unchanged, and always checks for overlapping non-coding gene annotations, removing the conflict between the two entities.

I appreciate that non-coding annotation is not always wanted, so this change might not be useful for most users. Ideally it would be controlled by a CLI switch, so that users can opt in to the additional non-coding annotation.
