This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
An R package for mass spectrometry-based label-free quantification (LFQ) proteomics analysis. It provides a complete workflow: QC, normalization, protein aggregation, statistical modelling, hypothesis testing, and sample size estimation. Data is always in long (tidy) format. Branch Modelling2R6 is the active development branch.
```bash
make test              # Run testthat suite (runs document first)
make check-fast        # R CMD check without vignettes (quick validation)
make check             # Full R CMD check (document → build → check)
make document          # Generate roxygen2 docs (NAMESPACE + man/)
make install           # Install package locally
make lint              # Run lintr static analysis
make format            # Format with air
make build-vignettes   # Build vignettes into inst/doc
make site              # Build pkgdown site locally
```

Single test file:

```bash
Rscript -e "testthat::test_file('tests/testthat/test-LFQData.R')"
```

Library setup:

```bash
Rscript -e ".libPaths()"
```

Use the normal user / system R libraries for this workspace; renv autoload is disabled.
- Line length: 120 chars, indentation: 2 spaces (`.lintr`)
- `object_name_linter` is disabled: the codebase uses camelCase for R6 classes and snake_case/mixed for functions
- NAMESPACE is auto-generated by roxygen2; never edit it directly, run `make document` instead
- Roxygen is configured with `r6 = TRUE` for R6 class documentation
- Never use `\dontrun{}` or `\donttest{}` in `@examples`: all examples must run during R CMD check. If an example is too slow, optimize it instead of skipping it.
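For reference, a roxygen block that follows this rule might look like the sketch below. `scale_by_median()` is a hypothetical helper, not part of prolfqua; the point is that its `@examples` section is small and fast enough to run unconditionally during R CMD check.

```r
#' Divide a numeric vector by its median (hypothetical helper, for illustration)
#'
#' @param x numeric vector.
#' @return `x` divided by `median(x)`.
#' @examples
#' # Small, fast, self-contained: safe to run on every R CMD check.
#' scale_by_median(c(1, 2, 4))
#' @export
scale_by_median <- function(x) {
  x / stats::median(x)
}
```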
```text
Raw Data + AnalysisConfiguration → LFQData
  ├── get_Transformer()  → LFQDataTransformer (log2, robscale, normalize)
  ├── get_Aggregator()   → LFQDataAggregator (peptide → protein rollup)
  ├── get_Stats()        → LFQDataStats (CV, variance per group)
  ├── get_Plotter()      → LFQDataPlotter (heatmaps, PCA, boxplots)
  ├── get_Summariser()   → LFQDataSummariser (missingness, hierarchy counts)
  └── get_Imputer()      → LFQDataImp (missing value imputation)

LFQData → build_contrast_analysis(lfqdata, modelstr, contrasts, method)
  └── Returns a Facade with uniform API:
        $get_contrasts(), $get_missing(), $get_Plotter(), $to_wide()
```
build_contrast_analysis() is the recommended entry point. Each method dispatches to a Facade class that wires strategy → model → contrasts → moderation internally.
- Aggregated input (protein-level, `subject_Id == hierarchy_keys`): `lm`, `rlm`, `lm_missing`, `lm_impute`, `limma`, `deqms`, `firth`
- Nested input (peptide-level, `subject_Id` is a strict subset of `hierarchy_keys`): `lmer`, `ropeca`
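The input-level rule above can be sketched as a small check. The helper names (`input_level()`, `check_method_input()`) are hypothetical and only restate the method lists from the text; prolfqua's facades may validate their input differently.

```r
# Method lists copied from the text above.
aggregated_methods <- c("lm", "rlm", "lm_missing", "lm_impute", "limma", "deqms", "firth")
nested_methods <- c("lmer", "ropeca")

# Hypothetical helper: classify input by comparing subject_Id with hierarchy_keys.
input_level <- function(subject_Id, hierarchy_keys) {
  if (setequal(subject_Id, hierarchy_keys)) {
    "aggregated"                       # one row per protein x sample
  } else if (all(subject_Id %in% hierarchy_keys)) {
    "nested"                           # strict subset: several children per subject
  } else {
    stop("subject_Id must be a subset of hierarchy_keys")
  }
}

# Hypothetical helper: fail early if the method does not match the input level.
check_method_input <- function(method, subject_Id, hierarchy_keys) {
  level <- input_level(subject_Id, hierarchy_keys)
  allowed <- if (level == "aggregated") aggregated_methods else nested_methods
  if (!method %in% allowed) {
    stop(sprintf("method '%s' is not available for %s input", method, level))
  }
  level
}

check_method_input("limma", "protein_Id", "protein_Id")
check_method_input("lmer", "protein_Id", c("protein_Id", "peptide_Id"))
```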
config$nr_children names the column tracking child-feature counts (e.g. peptides per protein). After get_Aggregator() rollup, each protein×sample row gets its own count — nr_children is sample-wise. For peptide/precursor-level data it is typically 1.
Two distinct uses:

- Fitting weights (sample-wise): Aggregated facades (`lm`, `limma`, `lm_missing`, `lm_impute`, `deqms`) pass `nr_children` as `weights` by default to `lm()` or `limma::lmFit()`. This down-weights protein intensities derived from fewer peptides in a given sample. Disable with `weights = NULL`.
- DEqMS variance moderation (experiment-wide): `ContrastsDEqMSFacade` additionally aggregates `nr_children` via `max()` per protein across all samples for count-dependent variance shrinkage. This is separate from the fitting weights.
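A base-R sketch of the two uses on invented toy data: `lm()` with `weights = nr_children` for the sample-wise fitting weights, and `aggregate()` with `max` for the experiment-wide, per-protein count that DEqMS-style moderation consumes. This only mirrors the idea; it is not the facade code.

```r
# Toy protein-level data (values invented): two proteins, two groups, three samples each.
# nr_children = number of peptides behind each protein x sample intensity.
prot <- data.frame(
  protein_Id  = rep(c("P1", "P2"), each = 6),
  group       = factor(rep(rep(c("A", "B"), each = 3), times = 2)),
  abundance   = c(20.1, 20.4, 19.9, 21.0, 21.3, 20.2,
                  18.0, 18.2, 17.9, 18.1, 18.3, 18.0),
  nr_children = c(5, 4, 5, 6, 6, 1,
                  2, 2, 1, 2, 3, 2)
)
p1 <- prot[prot$protein_Id == "P1", ]

# 1) Fitting weights (sample-wise): the B sample backed by a single peptide counts less.
fit_w <- lm(abundance ~ group, data = p1, weights = nr_children)
# Same model without weights (the analogue of weights = NULL).
fit_u <- lm(abundance ~ group, data = p1)

# 2) Experiment-wide count for variance moderation: max over samples, per protein.
count_per_protein <- aggregate(nr_children ~ protein_Id, data = prot, FUN = max)
count_per_protein
```

With equal counts in every sample the weighted and unweighted fits coincide, so the weights only change results when peptide counts vary between samples.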
Protein-level input must carry `nr_children`. If the column is missing, `setup_analysis()` adds it with every value set to 1 and emits a warning. For aggregated data this defeats the purpose, because the weights and the DEqMS moderation depend on the actual peptide count.
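The fallback can be pictured with the hypothetical helper below (not prolfqua's implementation). Once every row carries `nr_children = 1`, all samples get the same fitting weight, so the down-weighting described above no longer happens.

```r
# Hypothetical helper mimicking the described fallback: add a constant count column
# with a warning when it is missing; leave the data untouched otherwise.
ensure_nr_children <- function(data, nr_children = "nr_children") {
  if (!nr_children %in% colnames(data)) {
    warning(sprintf("column '%s' not found; setting it to 1 for all rows", nr_children))
    data[[nr_children]] <- 1L
  }
  data
}

d <- data.frame(protein_Id = c("P1", "P2"), abundance = c(20, 18))
d2 <- suppressWarnings(ensure_nr_children(d))
d2$nr_children  # every row gets the same weight
```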
Decorator/Composition: LFQData factory methods (get_Transformer(), get_Plotter(), etc.) return decorator objects that wrap the LFQData. Decorators hold a reference in their lfq field.
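A minimal R6 sketch of the decorator idea, with hypothetical classes `LFQDataSketch` and `TransformerSketch` standing in for `LFQData` and `LFQDataTransformer`. The decorator keeps the wrapped object in its `lfq` field and each transformation returns `self` invisibly, which is what makes chained calls such as `$log2()$robscale()` possible. `center()` is an illustrative stand-in, not `robscale()`.

```r
library(R6)

# Hypothetical stand-in for LFQData: holds the long-format data.
LFQDataSketch <- R6Class("LFQDataSketch",
  public = list(
    data = NULL,
    initialize = function(data) self$data <- data,
    # Factory method returning a decorator that wraps this object.
    get_Transformer = function() TransformerSketch$new(self)
  )
)

# Hypothetical stand-in for LFQDataTransformer.
TransformerSketch <- R6Class("TransformerSketch",
  public = list(
    lfq = NULL,  # the wrapped data object
    initialize = function(lfq) self$lfq <- lfq,
    log2 = function() {
      self$lfq$data$abundance <- log2(self$lfq$data$abundance)
      invisible(self)  # returning self enables chaining
    },
    center = function() {
      x <- self$lfq$data$abundance
      self$lfq$data$abundance <- x - stats::median(x)
      invisible(self)
    }
  )
)

lfq <- LFQDataSketch$new(data.frame(sample = c("s1", "s2", "s3"),
                                    abundance = c(1024, 2048, 4096)))
lfq <- lfq$get_Transformer()$log2()$center()$lfq
lfq$data$abundance
```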
Method chaining: Transformer methods return self for chaining, access result via $lfq:
```r
lfqdata <- lfqdata$get_Transformer()$log2()$robscale()$lfq
```

Strategy pattern for models: Strategy R6 classes `StrategyLM`, `StrategyRLM`, `StrategyLmer`, `StrategyLogistf`, each with `model_fun`, `isSingular`, `contrast_fun`, `df_residual` and `sigma` methods. Wrapper functions `strategy_lm()`, `strategy_rlm()`, `strategy_lmer()`, `strategy_logistf()` create instances. `strategy_limma()` returns a plain list (`formula`, `trend`, `robust`, `weights`) consumed by `build_model_limma()`.
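To show what the listed strategy methods do, here is a base-R sketch that bundles them as a plain list of closures around `lm()`. `make_strategy_lm_sketch()` and its argument conventions are hypothetical; the real `StrategyLM` is an R6 class and its method signatures may differ. The contrast uses the usual Wald form: estimate `L %*% beta`, standard error `sqrt(diag(L %*% V %*% t(L)))`.

```r
# Hypothetical strategy bundle: same method names as the text, simplified signatures.
make_strategy_lm_sketch <- function(formula) {
  list(
    model_fun   = function(data) lm(formula, data = data),
    isSingular  = function(m) anyNA(coef(m)),   # aliased coefficients come back as NA
    df_residual = function(m) df.residual(m),
    sigma       = function(m) sigma(m),
    contrast_fun = function(m, linfct) {
      est <- drop(linfct %*% coef(m))
      se  <- sqrt(diag(linfct %*% vcov(m) %*% t(linfct)))
      data.frame(contrast = rownames(linfct), estimate = est, std.error = se,
                 t.value = est / se, df = df.residual(m), row.names = NULL)
    }
  )
}

strategy <- make_strategy_lm_sketch(abundance ~ group)
d <- data.frame(group = factor(rep(c("A", "B"), each = 3)),
                abundance = c(20, 21, 22, 24, 25, 26))
m <- strategy$model_fun(d)

# Contrast B - A written on the coefficient scale: (Intercept), groupB.
L <- matrix(c(0, 1), nrow = 1, dimnames = list("B_vs_A", names(coef(m))))
res <- strategy$contrast_fun(m, L)
res
```

Supporting another model family means providing the same five functions around a different fitter, which is the point of the strategy pattern.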
Config immutability: AnalysisConfiguration is always deep-cloned when passed to new LFQData instances. Never modify config in-place on an existing LFQData.
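Why the deep clone matters: R6 objects have reference semantics, so two containers sharing one config object would see each other's edits. The sketch below uses hypothetical `ConfigSketch` and `LFQDataSketch2` classes to show the pattern, plus the clone-then-construct alternative to modifying a config in place.

```r
library(R6)

# Hypothetical minimal config with a single field.
ConfigSketch <- R6Class("ConfigSketch",
  public = list(work_intensity = "abundance")
)

# Hypothetical data container that takes its own deep copy of the config.
LFQDataSketch2 <- R6Class("LFQDataSketch2",
  public = list(
    config = NULL,
    initialize = function(config) {
      # Without this clone, caller and container would share one object.
      self$config <- config$clone(deep = TRUE)
    }
  )
)

cfg <- ConfigSketch$new()
a <- LFQDataSketch2$new(cfg)
cfg$work_intensity <- "log2_abundance"  # edit the caller's object
a$config$work_intensity                 # unaffected: "abundance"

# Instead of editing a$config in place: clone, modify, build a new object.
cfg2 <- a$config$clone(deep = TRUE)
cfg2$work_intensity <- "log2_abundance"
b <- LFQDataSketch2$new(cfg2)
```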
| Category | Classes | Files |
|---|---|---|
| Core data | LFQData, AnalysisConfiguration | `LFQData.R`, `AnalysisConfiguration.R` |
| Decorators | LFQDataTransformer, LFQDataAggregator, LFQDataStats, LFQDataPlotter, LFQDataSummariser, LFQDataImp | `LFQData*.R` |
| Model interfaces | ModelInterface, Model, ModelFirth, ModelLimma | `Model*.R`, `ContrastsLimma.R` |
| Contrast interfaces | ContrastsInterface, Contrasts, ContrastsModerated, ContrastsLimma, ContrastsROPECA, ContrastsMissing, ContrastsFirth, ContrastsTable | `Contrasts*.R`, `ContrastFirth.R`, `ContrastsSimpleImpute.R` |
| Visualization | ContrastsPlotter | `ContrastsPlotter.R` |
| Utilities | MissingHelpers | `tidyMS_missingness_imputation.R` |
`AnalysisConfiguration` is a flat R6 class that maps column roles in the data:

- `hierarchy`: ordered measurement levels (protein_Id → peptide_Id → precursor_Id → fragment_Id). `hierarchy_depth` controls which level is modelled.
- `factors`: explanatory variables (group, treatment). `factor_depth` controls interaction depth.
- `work_intensity`: response column. Uses a stack (`set_response()` / `pop_response()` / `get_response()`) for working with multiple intensity columns.
- `file_name`: sample identifier column.
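The response stack can be pictured with the hypothetical R6 sketch below. Only the method names come from the list above; the exact behaviour (set pushes a column name, pop removes and returns the top, get reads the top) is an assumption.

```r
library(R6)

# Hypothetical sketch of a response-column stack; not the AnalysisConfiguration code.
ResponseStackSketch <- R6Class("ResponseStackSketch",
  public = list(
    work_intensity = character(0),  # bottom ... top
    initialize = function(response) self$work_intensity <- response,
    set_response = function(colname) {
      # Push the new working intensity column.
      self$work_intensity <- c(self$work_intensity, colname)
      invisible(self)
    },
    get_response = function() {
      # Current response = top of the stack.
      self$work_intensity[length(self$work_intensity)]
    },
    pop_response = function() {
      # Remove and return the top, exposing the previous response.
      n <- length(self$work_intensity)
      top <- self$work_intensity[n]
      self$work_intensity <- self$work_intensity[-n]
      top
    }
  )
)

cfg <- ResponseStackSketch$new("abundance")
cfg$set_response("log2_abundance")
cfg$get_response()  # "log2_abundance"
cfg$pop_response()  # "log2_abundance"
cfg$get_response()  # "abundance"
```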
Concrete config factories (e.g. `create_config_Skyline()`, `create_config_Spectronaut_Peptide()`) used to live in `tidyMS_R6_ConcreteConfigurations.R`. That file has been removed (`create_config_MQ_peptide()` was dead code); the remaining factories live in downstream packages.
- `build_contrast_analysis(lfqdata, modelstr, contrasts, method)`: main entry point, returns a Facade (in `build_contrast_analysis.R`)
- `setup_analysis(data, config)`: prepare data for analysis (in `tidyMS_data_setup.R`)
- `build_model(data, strategy, subject_Id)`: fit per-protein models (in `tidyMS_build_model.R`)
- `build_model_impute(lfqdata, strategy)`: fit with LOD imputation + borrowed covariance for missing groups (in `tidyMS_build_model.R`)
- `build_model_limma(lfqdata, strategy)`: fit limma matrix model (in `ContrastsLimma.R`)
- `StrategyLM`, `StrategyRLM`, `StrategyLmer` R6 classes + `strategy_lm/rlm/lmer()` wrappers (`tidyMS_R6_Modelling.R`); `StrategyLogistf` + `strategy_logistf()` (`logistf.R`)
- `strategy_limma()`: limma matrix model strategy (in `ContrastsLimma.R`)
- `sim_lfq_data_peptide_config()`: simulate test data (in `simulate_LFQ_data.R`)
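As a rough picture of what per-protein fitting involves, the sketch below splits the data by `subject_Id`, fits one `lm()` per subject, and records which fits failed (here, a protein observed in only one group, so the group factor has a single level). `fit_per_subject()` is hypothetical and much simpler than `build_model()`.

```r
# Hypothetical split-apply sketch; build_model() itself does considerably more.
fit_per_subject <- function(data, formula, subject_Id = "protein_Id") {
  groups <- split(data, data[[subject_Id]])
  fits <- lapply(groups, function(d) {
    # A failed fit (e.g. a single-level factor) becomes NULL instead of aborting the loop.
    tryCatch(lm(formula, data = d), error = function(e) NULL)
  })
  data.frame(
    subject     = names(fits),
    fitted      = !vapply(fits, is.null, logical(1)),
    df_residual = vapply(fits, function(m) {
      if (is.null(m)) NA_real_ else as.numeric(df.residual(m))
    }, numeric(1)),
    row.names = NULL
  )
}

d <- data.frame(
  protein_Id = c(rep("P1", 6), rep("P2", 3)),
  group      = factor(c("A", "A", "A", "B", "B", "B", "A", "A", "A")),
  abundance  = c(20, 21, 22, 24, 25, 26, 18, 18.5, 19)
)
res <- fit_per_subject(d, abundance ~ group)
res
```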
- `R/LFQData*.R`: Core data container and its decorator classes
- `R/Model*.R`, `R/Contrasts*.R`: Modelling and hypothesis testing
- `R/AnalysisConfiguration.R`: Configuration (column role mapping + serialization)
- `R/tidyMS_data_setup.R`: `setup_analysis`, `complete_cases`, `sample_subset`
- `R/tidyMS_summarize_hierarchy.R`: `table_factors`, `hierarchy_counts`, etc.
- `R/tidyMS_R6_Modelling.R`: Strategy R6 classes (`StrategyLM`, `StrategyRLM`, `StrategyLmer`)
- `R/tidyMS_build_model.R`: `build_model`, `model_analyse`, imputation internals
- `R/tidyMS_contrasts.R`: `linfct_*` family, `compute_contrast`, `contrasts_linfct`, `pivot_model_contrasts_to_wide`
- `R/tidyMS_moderation.R`: `moderated_p_limma*`, `adjust_p_values`, ROPECA, Fisher
- `R/tidyMS_*.R`: Other utility functions (plotting, stats, aggregation, missingness)
- `R/utilities.R`: Shared helpers (`make_interaction_column`, `.error_handler`)
options(prolfqua.vectorize = TRUE) activates vectorized implementations of compute_contrast and linfct_matrix_contrasts (matrix multiplication instead of per-row loops). Affects all Wald test facades (lm, rlm, firth, lmer) and limma's linfct path. Results are numerically identical. Default is FALSE.
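Why the two paths agree can be seen in plain base R, independent of the `prolfqua.vectorize` option: the sketch compares a per-row loop with the matrix form on a toy `lm()` fit. It is not prolfqua's `compute_contrast()`. For a contrast matrix `L`, coefficients `beta` and covariance `V`, the estimates are `L %*% beta` and the standard errors are the square roots of `rowSums((L %*% V) * L)`, i.e. the diagonal of `L %*% V %*% t(L)` without forming the full matrix.

```r
set.seed(1)
# Toy fitted model: 2 groups x 2 treatments, 3 replicates each.
d <- expand.grid(group = c("A", "B"), treatment = c("ctrl", "trt"), rep = 1:3)
d$abundance <- rnorm(nrow(d), mean = 20)
m <- lm(abundance ~ group * treatment, data = d)
beta <- coef(m)
V <- vcov(m)

# Contrasts as rows on the coefficient scale:
# (Intercept), groupB, treatmenttrt, groupB:treatmenttrt.
L <- rbind(
  B_vs_A_ctrl = c(0, 1, 0, 0),
  B_vs_A_trt  = c(0, 1, 0, 1)
)
colnames(L) <- names(beta)

# Per-row loop: one estimate and standard error per contrast.
loop <- t(sapply(seq_len(nrow(L)), function(i) {
  l <- L[i, ]
  c(estimate = sum(l * beta), se = sqrt(drop(t(l) %*% V %*% l)))
}))

# Vectorized: all contrasts at once via matrix products.
vec <- cbind(estimate = drop(L %*% beta), se = sqrt(rowSums((L %*% V) * L)))

all.equal(unname(loop), unname(vec))
```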
- When fixing a bug, first add a test that reproduces it, then fix. This ensures regressions are caught.
11 test files in tests/testthat/:
- `test-LFQData.R`: Core data container and decorators
- `test-Model.R`: Model fitting and coefficient extraction
- `test-Contrasts.R`: Contrast computation (Wald test path)
- `test-ContrastsFacades.R`: All facade classes and `build_contrast_analysis()`
- `test-ContrastsLimma.R`: Limma backend (ModelLimma, ContrastsLimma, merge, 2-factor)
- `test-ContrastsModeratedDEqMS.R`: DEqMS moderation and facade
- `test-ContrastsPlotter.R`: Contrast visualization
- `test-ImputeModel.R`: LOD imputation with borrowed covariance
- `test-plotting_functions.R`: Low-level plots
- `test-tidyconfig_functions.R`: Configuration and utilities
- `test-vectorize-contrasts.R`: Side-by-side original vs vectorized contrast functions
prolfqua is part of the prolfqua ecosystem (see ../CLAUDE.md). Downstream packages depend on its R6 classes and exported API:
- prolfquapp — CLI wrapper for core facility workflows
- prophosqua — Phosphoproteomics analysis
- prolfquabenchmark — Benchmarking vignettes
Renaming R6 methods, changing exported function signatures, or modifying AnalysisConfiguration fields can silently break these packages.