
Harmonize the feature name columns params#646

Open
delfiterradas wants to merge 4 commits into nf-core:dev from delfiterradas:diff_column

Conversation

@delfiterradas

@delfiterradas delfiterradas commented Feb 18, 2026

Closes #618

  • Remove the --differential_feature_name_column parameter and consolidate its functionality into --features_name_col.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@delfiterradas delfiterradas self-assigned this Feb 18, 2026
@github-actions

github-actions bot commented Feb 18, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 3d2c865

+| ✅ 380 tests passed       |+
#| ❔  11 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗  20 tests had warnings |!
Details

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • schema_lint - Parameter input is not defined in the correct subschema (input_output_options)
  • schema_description - No description provided in schema for parameter: deseq2_seed
  • schema_description - No description provided in schema for parameter: dream_p_value
  • schema_description - No description provided in schema for parameter: dream_lfc
  • schema_description - No description provided in schema for parameter: dream_confint
  • schema_description - No description provided in schema for parameter: dream_proportion
  • schema_description - No description provided in schema for parameter: dream_stdev_coef_lim
  • schema_description - No description provided in schema for parameter: dream_trend
  • schema_description - No description provided in schema for parameter: dream_robust
  • schema_description - No description provided in schema for parameter: dream_winsor_tail_p
  • schema_description - No description provided in schema for parameter: dream_ddf
  • schema_description - No description provided in schema for parameter: dream_reml
  • schema_description - No description provided in schema for parameter: dream_apply_voom
  • schema_description - No description provided in schema for parameter: dream_adjust_method

❔ Tests ignored:

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-02-27 18:48:09

@delfiterradas delfiterradas marked this pull request as ready for review February 18, 2026 18:50
@pinin4fjords
Member

Could you flesh out the description here (and on #618)? It's hard to evaluate without more context on the reasoning.

I think there's an issue with the limma_soft profile. features_name_col and differential_feature_name_column aren't always interchangeable - one refers to the column in the features/annotation table, the other to the column as it appears in the differential results output. For limma_soft these were intentionally different:

  • features_name_col: Gene Symbol (column in the SOFT features table)
  • differential_feature_name_column: Symbol (column in limma's topTable output)

After this PR, the report looks for Gene Symbol in the DE results, won't find it (limma outputs Symbol), and falls back to showing probe IDs on volcano/upset plots instead of gene symbols.
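
To illustrate the fallback described above, here is a minimal R sketch. The helper `pick_label_col`, the `ID` column name, and the toy values are hypothetical and not taken from the pipeline or the report template; only the `Gene Symbol` and `Symbol` column names come from the limma_soft case.

```r
# Toy DE results shaped like the scenario above (all values are made up).
diff <- data.frame(
  ID = c("1007_s_at", "1053_at"),
  Symbol = c("DDR1", "RFC2"),
  logFC = c(1.2, -0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label features with name_col when that column exists,
# otherwise fall back to the identifier column.
pick_label_col <- function(df, name_col, id_col) {
  if (name_col %in% colnames(df)) name_col else id_col
}

# "Gene Symbol" is not a column of the DE results, so the ID column is chosen.
label_col <- pick_label_col(diff, name_col = "Gene Symbol", id_col = "ID")
labels <- diff[[label_col]]
```

In this sketch, passing `name_col = "Symbol"` would select the symbol column instead, which is the role the separate parameter played for this profile.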

The other profiles happen to have matching values so they're fine, but this case shows the two parameters existed for a reason. Could you either restore the SOFT handling, or ensure the column name gets harmonized earlier in the pipeline so they always match?

Also, please add a CHANGELOG entry for the parameter removal.

@grst
Member

grst commented Feb 20, 2026

Ok, thanks for providing that additional context. If it's not too complicated I think we should address this and make sure that the column names from SOFT files get carried over correctly and automatically.

@delfiterradas
Author

Thanks, I will look into this and make the changes.

@delfiterradas
Author

delfiterradas commented Feb 20, 2026

@pinin4fjords I checked, and the Symbol column is not present in any of the tables used with the limma_soft profile.
With the changes now implemented, the report finds the gene symbols and no longer falls back to the IDs.

This is the before:
Screenshot 2026-02-20 144645


and after:
Screenshot 2026-02-20 145059

@pinin4fjords
Member

I still worry about the assumption of the gene symbol column being identical between these two use cases. What's your rationale for assuming their identity? Are we guaranteeing it somewhere?

@delfiterradas
Author

> I still worry about the assumption of the gene symbol column being identical between these two use cases. What's your rationale for assuming their identity? Are we guaranteeing it somewhere?

Hi @pinin4fjords! The features_name_col refers to the column in the feature metadata/GTF which is directly used in the report, regardless of the source.

The report joins the differential results and the features so the features_name_col will always be present:

merged <- merge(features, diff, by.x = params$meta$params$features_id_col, by.y = params$meta$params$differential_feature_id_column)
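
A small self-contained sketch of the same join, using toy data frames. The column names `ID` and `probe_id` and all values are hypothetical stand-ins for whatever `features_id_col` and `differential_feature_id_column` are set to; `Gene Symbol` stands in for `features_name_col`.

```r
# Toy feature annotation table; check.names = FALSE keeps the space in the name.
features <- data.frame(
  ID = c("1007_s_at", "1053_at"),
  `Gene Symbol` = c("DDR1", "RFC2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Toy differential results that carry only an identifier column and statistics.
diff <- data.frame(
  probe_id = c("1007_s_at", "1053_at"),
  logFC = c(1.2, -0.8),
  stringsAsFactors = FALSE
)

# Same shape as the merge above: join on the two identifier columns.
merged <- merge(features, diff, by.x = "ID", by.y = "probe_id")

# The name column from the features table is carried into the merged result.
has_name_col <- "Gene Symbol" %in% colnames(merged)
```

Note that `merge()` defaults to an inner join (`all = FALSE`), so identifiers missing from either table are dropped from `merged`.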

@pinin4fjords
Member

Thanks for the thorough explanation, this makes sense to me now. The merge with the features table always happens before labeling, so features_name_col will be available in the merged data for all our standard profiles.

One small thing: differential_feature_name_column should also be removed from the profile configs (conf/soft.config, conf/affy.config, conf/rnaseq.config, conf/maxquant.config). They're dead params now and will just be confusing if left in.

@delfiterradas
Author

delfiterradas commented Feb 27, 2026

> One small thing: differential_feature_name_column should also be removed from the profile configs (conf/soft.config, conf/affy.config, conf/rnaseq.config, conf/maxquant.config). They're dead params now and will just be confusing if left in.

@pinin4fjords I'm not sure I understand this correctly, but these profiles no longer exist in the dev branch.



3 participants