- HUMAN Phase Validation: Fixed critical bug where `HUMAN-MAIN_TR01` (with underscore) was incorrectly accepted as a valid phase. Enforced strict `HUMAN-MAIN-TR##` format validation across `set_phase()`, `check_viallabel_dmaqc()`, and all QC/release writer functions
- Phase Folder Parsing: Fixed `validate_phase()` to extract only `HUMAN` from folder paths, preventing `HUMAN-MAIN-TR##` from being parsed as a folder name. Added lookahead `(?=/|$)` to match `HUMAN` only as a complete path segment
- Hyphen Counting in Phase Parsing: Replaced buggy `lengths(gregexpr())` with `stringr::str_count()` in `check_viallabel_dmaqc()`. The previous approach returned 1 for strings with no hyphens (e.g., plain `HUMAN`), causing incorrect branch selection
- LAB_CONV Assay Classification: Fixed `assay_codes.csv` to classify LAB_CONV as "Clinical Chem,Clinical Chemistry" instead of "Metabolomics,Metabolome"
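The phase-parsing fixes above can be illustrated with a short base R sketch. It is not the package's implementation: the patterns are reconstructed from the descriptions in these notes, the example paths are hypothetical, and base R stands in for `stringr::str_count()` so the snippet runs without extra packages.

```r
# Illustrative sketch only; patterns, helper names, and example values are
# assumptions, not code copied from the package.

phases <- c("HUMAN", "HUMAN-PRECOVID", "HUMAN-MAIN-TR01")

# Buggy count: gregexpr() returns -1 (a length-1 vector) when nothing matches,
# so a string with no hyphens is reported as having one.
buggy_count <- lengths(gregexpr("-", phases, fixed = TRUE))

# Correct count in base R: regmatches() drops the -1 placeholder.
# stringr::str_count(phases, "-") is the packaged equivalent.
hyphen_count <- lengths(regmatches(phases, gregexpr("-", phases, fixed = TRUE)))

# Strict tranche format: hyphens only, upper case, exactly two digits.
is_main_tranche <- function(phase) grepl("^HUMAN-MAIN-TR[0-9]{2}$", phase)

# Lookahead: accept HUMAN only when followed by "/" or the end of the string.
# Lookaheads require perl = TRUE in base R.
has_human_folder <- function(path) grepl("HUMAN(?=/|$)", path, perl = TRUE)

print(buggy_count)
print(hyphen_count)
print(is_main_tranche(c("HUMAN-MAIN-TR01", "HUMAN-MAIN_TR01", "human-main-tr01")))
print(has_human_folder(c("bucket/HUMAN/T02", "bucket/HUMAN-MAIN-TR01/T02")))
```

The buggy and corrected counts differ only for the hyphen-free string, which is exactly the case that sent plain `HUMAN` down the wrong branch.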
- Uppercase Enforcement: `set_phase()` now requires the phase string in `metadata_phase.txt` to be UPPER CASE, stopping with a clear error if lowercase is detected
- Phase Error Messages: Improved `validate_phase()` error messages to list all expected phase formats and append contextual guidance when a HUMAN folder is detected
- HUMAN Phase Branch Handling: `check_viallabel_dmaqc()` now explicitly handles all HUMAN hyphen variants: 0 hyphens (plain `HUMAN`, tranche "00"), 1 hyphen (must be exactly `HUMAN-PRECOVID`), 2 hyphens (must match `HUMAN-MAIN-TR##`), and 3+ hyphens (warning)
- Release Folder Logic: Updated all QC writers (lab, metabolomics, olink, proteomics) to set `phase_folder_release` to `human-main` only when `phase_details` matches `^human-main-tr\\d{2}$`
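A rough sketch of the branch handling and release-folder test described above. The helper names, the returned labels, and the error text are hypothetical and chosen for illustration; only the phase formats and the regular expression come from these notes.

```r
# Hypothetical helpers; names, return labels, and error behaviour are
# assumptions, not the package's actual code.

# Reject phase strings that are not fully upper case.
check_upper <- function(phase) {
  if (phase != toupper(phase)) {
    stop("Phase must be UPPER CASE, found: ", phase)
  }
  invisible(phase)
}

# Branch on the number of hyphens in a single HUMAN phase string.
classify_human_phase <- function(phase) {
  n_hyphens <- lengths(regmatches(phase, gregexpr("-", phase, fixed = TRUE)))
  if (n_hyphens == 0) {
    if (phase == "HUMAN") "plain (tranche 00)" else "invalid"
  } else if (n_hyphens == 1) {
    if (phase == "HUMAN-PRECOVID") "precovid" else "invalid"
  } else if (n_hyphens == 2) {
    if (grepl("^HUMAN-MAIN-TR[0-9]{2}$", phase)) "main tranche" else "invalid"
  } else {
    "warning: too many hyphens"
  }
}

# TRUE only for lower-case details of the form human-main-tr##.
uses_human_main_folder <- function(phase_details) {
  grepl("^human-main-tr\\d{2}$", phase_details)
}

print(classify_human_phase("HUMAN-MAIN-TR04"))
print(uses_human_main_folder(c("human-main-tr04", "human-precovid")))
```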
- PROT_AC Legacy Batch Support: `validate_batch()` now accepts both `BATCH#_YYYYMMDD` and legacy `BATCH_YYYYMMDD` (no batch number) formats for PROT_AC paths
- Vignette Overhaul: Revamped vignette styling with modular CSS, table of contents, Prism code highlighting, and improved print support
- Added comprehensive unit tests for HUMAN phase validation covering valid formats, invalid formats (underscore, lowercase, wrong tranche, extra hyphens), folder-phase parsing, uppercase enforcement, and `set_phase()` metadata content validation
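As a hedged illustration of the two batch folder formats, the pattern below treats the batch number as an optional run of digits. Both the pattern and the helper name `is_prot_ac_batch()` are assumptions; the check inside `validate_batch()` may differ.

```r
# Hypothetical pattern: assumes the batch number, when present, is digits only
# and that the date is eight digits (YYYYMMDD). Not the package's own regex.
is_prot_ac_batch <- function(folder) grepl("^BATCH[0-9]*_[0-9]{8}$", folder)

batch_ok <- is_prot_ac_batch(c("BATCH1_20240115", "BATCH_20240115", "BATCH1-20240115"))
print(batch_ok)
```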
- PDF Plot Generation: Fixed critical issue where PDF plots were not being created when running `validate_lab` from scripts. Changed from `invisible(gridExtra::grid.arrange())` to `gridExtra::arrangeGrob()` + `grid::grid.draw()` pattern, which properly renders to PDF device without printing `TableGrob` output to console
- DMAQC Extra Samples: Fixed bug where extra samples (submitted but not expected in DMAQC) were reported but not counted as issues. Now properly sets `ic = "FAIL"` when extra samples are detected
- Release Writer: Fixed bug in `write_lab_releases` where results data was accessed as `lab_df$r` instead of the correct `lab_df$r_o` (matching the key returned by `load_lab_batch`)
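The `arrangeGrob()` + `grid.draw()` pattern mentioned above looks roughly like this. It is a sketch under assumptions: `save_panels_pdf()` is a hypothetical helper, and placeholder grobs stand in for the real QC plots.

```r
# Sketch only: save_panels_pdf() is a hypothetical helper, not a package
# function, and the grobs below are placeholders for real plots.
save_panels_pdf <- function(grobs, path, ncol = 2) {
  grDevices::pdf(path, width = 8, height = 4)
  on.exit(grDevices::dev.off(), add = TRUE)
  # arrangeGrob() builds the layout without drawing it;
  # grid.draw() then renders it on the open PDF device.
  layout <- gridExtra::arrangeGrob(grobs = grobs, ncol = ncol)
  grid::grid.draw(layout)
  invisible(path)
}

# Run the example only when gridExtra is installed.
if (requireNamespace("gridExtra", quietly = TRUE)) {
  panels <- list(grid::rectGrob(), grid::textGrob("QC panel"))
  out <- save_panels_pdf(panels, tempfile(fileext = ".pdf"))
  print(file.exists(out))
}
```

Because the layout object is drawn explicitly, nothing depends on auto-printing, which is what differs between interactive sessions and scripts.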
- DMAQC Validation Summary: Added informative summary message showing expected, submitted, and matched sample counts (e.g., "DMAQC summary: 761 expected, 967 submitted, 761 matched")
- DMAQC Extra Samples Message: Improved message to show count of extra samples (e.g., "CAS SITE IS PROVIDING 206 SAMPLE IDS THAT ARE NOT IN DMAQC")
- Code Optimization: Simplified multiple `grepl()` calls with alternation pattern (e.g., `grepl("PH|AC|UB|OX", assay)` instead of chained `|` operators)
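For reference, the two forms are equivalent, as this small check shows. The `assay` values are illustrative only.

```r
# Illustrative assay codes; the vector is an assumption for this example.
assay <- c("PROT_PH", "PROT_AC", "PROT_UB", "PROT_OX", "PROT_PR")

# Chained form: one grepl() call per fragment, combined with |.
chained <- grepl("PH", assay) | grepl("AC", assay) |
  grepl("UB", assay) | grepl("OX", assay)

# Alternation form: a single call with | inside the pattern.
single <- grepl("PH|AC|UB|OX", assay)

print(identical(chained, single))
```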
- Human Study Support: Enhanced handling of HUMAN phase formats, including main and pre-COVID tranches, for more accurate validation of human study submissions
- DMAQC Validation: Fixed a critical bug where DMAQC validation failures were not being counted in the total issues. The `check_viallabel_dmaqc` function returns a string status (`"OK"`, `"FAIL"`), which was incorrectly checked with `is.numeric()`. Now properly increments the issue count when validation fails
- Proteomics: Fixed missing sample labels when reference (Ref) channels are absent in proteomics datasets
- Metabolomics: Fixed incorrect variable assignment in `write_metabolomics_releases` where cleaned sample metadata was incorrectly assigned to `m_s_u` instead of `m_s_n`
- LAB QC Plots: Fixed plot layout issues in `plot_basic_lab_qc` that caused plots to print to console instead of saving to PDF. Refactored to use `gridExtra::grid.arrange()` properly
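The DMAQC counting bug above comes down to testing a character status with `is.numeric()`. A minimal sketch, with hypothetical variable names standing in for the package's internals:

```r
# Hypothetical stand-in for the status returned by check_viallabel_dmaqc().
ic <- "FAIL"
total_issues <- 0

# Old-style check: a character status is never numeric, so nothing is added.
if (is.numeric(ic)) total_issues <- total_issues + ic
issues_old <- total_issues

# Comparing against the status string counts the failure.
if (identical(ic, "FAIL")) total_issues <- total_issues + 1

print(c(old = issues_old, fixed = total_issues))
```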
- QC Date Format: Standardized QC date format to `YYYYMMDD` across all validation modules (`validate_lab`, `validate_metabolomics`, `validate_olink`, `validate_proteomics`), ensuring consistency in output filenames and reports
- Phase Parsing: Improved phase parsing in `check_viallabel_dmaqc` to handle complex HUMAN phase formats (e.g., `HUMAN-MAIN-TR04`) correctly
- Release Writer: Added handling for `human-main` phase in release folder logic for proper directory structure generation
- PDF Output: Improved plot output flow: suppresses console noise when saving to PDF and logs saved file locations
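In R, the `YYYYMMDD` form corresponds to the `"%Y%m%d"` format string. The file name below is a made-up example, not one the package writes.

```r
# Format a date as YYYYMMDD; the output file name is hypothetical.
qc_date <- format(as.Date("2024-03-05"), "%Y%m%d")
out_name <- paste0("qc_report_", qc_date, ".pdf")
print(out_name)
```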
- Updated `.Rbuildignore` to use explicit regex patterns for a more reliable build process
- Various code refactoring for better maintainability
- Assay Codes: Added new assay code for Whole Genome Sequencing (WGS) to `inst/extdata/assay_codes.csv`
- Assay Codes Updates: Populated `submission_code` values for several existing transcriptomics and epigenomics entries; corrected typo for 'Phosphoproteomics'
- File Location Enforcement: Modified `set_phase` function to enforce that `metadata_phase.txt` must be located directly within the batch folder (changed `recursive` parameter from `TRUE` to `FALSE`)
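The effect of the `recursive` change can be seen with `list.files()`. This sketch assumes `set_phase` locates the file with a `list.files()`-style search; the folder names are hypothetical.

```r
# Assumption: the lookup behaves like list.files(); folder names are made up.
batch <- file.path(tempdir(), "BATCH1_20240115")
dir.create(file.path(batch, "nested"), recursive = TRUE, showWarnings = FALSE)
writeLines("HUMAN-MAIN-TR01", file.path(batch, "nested", "metadata_phase.txt"))

# recursive = TRUE also finds copies buried in subfolders.
found_recursive <- list.files(batch, pattern = "^metadata_phase\\.txt$", recursive = TRUE)

# recursive = FALSE only sees files placed directly in the batch folder.
found_direct <- list.files(batch, pattern = "^metadata_phase\\.txt$", recursive = FALSE)

print(length(found_recursive))
print(length(found_direct))
```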
- Updated the data object `assay_codes`:
  - Updated `omics_text` values to omics technologies
  - Added `ome_text` for the "omes": the complete set of a given biological entity (all genes, all proteins, etc.)
  - Added `assay` as a copy of `assay_code`
- Updated the data object `assay_codes` with 2 new variables:
  - `assay_short_text`: Abbreviated name specific to the assay for use in graphs and tables
  - `ome_text`: Omic class measured by the given assay
- Updated style of the vignettes
- Fixed minor issue with `dl_read_gcp`
- Updated `dl_read_gcp`: now supports `gcloud` in addition to `gsutil`
- Clinical Chemistry Support: Added QC support for clinical chemistry assays: glucagon, insulin, cortisol, and creatine kinase
- Conventional metabolites (previously `metab-t-conv`) now expected as a new assay within this category (`lab-conv`)
- Numerous bug fixes and enhancements
- Download and read file from GCP function can create recursive folders (@christopherjin)
- Adjustments in metabolomics metadata sample files QC to enable processing of old submissions (before batch related variables were required)
- Proteomics: Added QC support for TMT-18
- Enhanced and improved `dl_read_gcp`:
  - Check if `gsutil` path is correct and report back to the user if it is not
  - Handle spaces in folder names (although not recommended)
  - Improved error source detection
  - Improved verbosity and feedback to the user
- Critical Update: Fixed `validate_refmetname` to ensure checking the refmet standardized name; updated refmet tests
- Updated `get_and_validate_mdd()`:
  - Updated REST service URL
  - Updated documentation
- Removed dependency on data.table
- Enhanced: only one `metadata_phase` file allowed
- Enhanced `dl_read_gcp`: replaced data.table by `read_delim`
- Enhanced `open_file`: accepts only tab-delimited files
- Critical Update: Resolved an issue where the validation of refmet names was compromised due to updates to the Metabolomics Workbench REST service. This version introduces adjustments to ensure accurate validation of refmet names.
- New assay: `PROT_OX`
- Fixed package conflicts
- OLINK: write release adjustments
- Added support for OLINK datasets (check `olink_qc` vignette to find out more)
- Adjusted function to download data from GCP (`dl_read_gcp`): automatically detects the operating system (arguments `ignore_std_err` and `ignore_std_out` deprecated)
- Multiple fixes and enhancements
- Fixed bug preventing the processing of BICRESULTS folders (proteomics)
- Made clear that the `metadata_phase.txt` file is required
- Other enhancements
- Added 24-hour time support for the `acquisition_date` (`MM/DD/YYYY HH:MM:SS`)
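A value in this format can be parsed in base R with `%H` for the 24-hour clock. The timestamp and the choice of UTC are assumptions made for the example, not details taken from the package.

```r
# Example timestamp and time zone are assumptions for illustration.
acquisition_date <- "03/15/2023 17:45:10"
parsed <- as.POSIXct(acquisition_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
print(format(parsed, "%Y-%m-%d %H:%M:%S"))
```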
- Added QC for the new required batching variables
- Replaced deprecated `ggplot` function
- Fixed issues with `dl_read_gcp`
- Other adjustments
- Minor adjustments
- New tissue codes, abbreviations, and colors available for lateral Gastrocnemius and vena cava
- Added new `dl_read_gcp`
- Replaced dplyr `summarise` function (deprecated) by `reframe`
- Fixed bug affecting `proteomics_plots`
- Fixed typo
- Fixed bug affecting IMM assays
- New metabolomics targeted assays (IMM_GLC, IMM_INS, IMM_CTR)
- Removed exception of non-unique raw files allowed for CONV assays, and added to IMM assays
- Improved metabolomics documentation
- Added exception: unique raw files are now not required for metabolomics CONV assay
- Updated `assay_codes`: immunoassay/IMMUNO added. The table now also includes assay hex colours and assay abbreviation
- Bug fix
- Better handling of large proteomics datasets:
- Proteomics RII plots are skipped if the dataset is too large
- Larger PDF size for proteomics ratio plots
- Several improvements and enhancements
- Improved DMAQC validation
- Updated `write_proteomics` according to latest updates on data/file structure
- Fixed bugs affecting `metabolomics_qc` and checks on `file_manifest`
- Metabolomics plots: check if enough compounds to generate plots
- Updated `assay_codes`: conventional assays code added (CONV)
- Adjustments to generate data releases (deal with pass1a/1c)
- Version number will be added to upcoming releases
- Updated a package's dependency
Updates affecting the proteomics validation:
- `validate_proteomics`: renamed argument `run_by_bic` to `check_only_results`. Default is still `FALSE` (does not affect CAS)
- Adjusted size of PDF output depending on the number of samples
- Support `BICRESULTS_YYYYMMDD` folder validation (similar to the currently supported `RESULTS_YYYYMMDD` and `PROCESSED_YYYYMMDD` folders). This folder is the output of the proteomics pipeline run by the BIC
- Updated MoTrPAC color abbreviations
- Metabolomics: new density plots
- Code optimizations
- Bug fixes affecting file manifest checks
- Refactored the DMAQC validation. A new file will be required when:
  - Two phases are combined in the same batch (e.g., `PASS1A-06|PASS1C-06`)
  - The phase content is different from the input folder name (e.g., `PASS1C-06` might be submitted but the input folder name is `PASS1A-06`)
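A combined phase string such as the one in the example above can be split on the pipe character. This is only a sketch: it assumes the phases are written exactly as shown, separated by `|`, and says nothing about how the package reads the file.

```r
# Assumes the combined phases use "|" as separator, as in the example above.
phase_line <- "PASS1A-06|PASS1C-06"

# fixed = TRUE treats "|" literally instead of as regex alternation.
phases <- strsplit(phase_line, "|", fixed = TRUE)[[1]]
print(phases)
```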
- DMAQC validation: print out missing vial labels
- Metabolomics QC: added mz/rt density plots
- Improved reporting and handling of required files
- Fixed minor bugs
- Metabolomics: fixed manifest checks
- Metabolomics: adjusted metabolomic plots to deal with a large number of samples
- Metabolomics new plot: sum of intensity/concentration
- Metabolomics: detects negative values
- Metabolomics: updated vignette
- Proteomics: support for TMT-16
- Metabolomics: improved verbosity for wrong tissue code
- Fixed bug affecting the validation of refmet_names
- Updated data objects (immunoassay added)
- Refactored the validation of refmet_name. It now checks one at a time using the RefMet API. It also validates multipeak isoforms
- The function `get_and_validate_mdd()` downloads the entire RefMet database (warning: >15MB)
- The `metabolomics_data_dictionary` data object will be deprecated soon
- Support DMAQC validation of human submissions
- Adjustments for PASS1C-06
- Bug fixes affecting HUMAN phase processing
- New Metabolomics QC plots: number and proportion of named vs unnamed features identified
- New proteomics QC plots for protein coverage
- Bug fix
- Added human tissue codes
- Improved version for checking the file manifest from metabolomics submissions
- Enabled DMAQC validation for submissions combining multiple phases (e.g. PASS1A-06 + PASS1C-06)
- Metabolomics QC: new metabolomics QC plots, including number of IDs per sample, intensity distribution, and percentage of NA values
- Markdown: replaced `prettydoc` by `rmdformats`
- New assay code: CONV (Targeted Conventional metabolites or clinical analytes, provided by Duke)
- New Phase: HUMAN (name of the new project folder for the human studies)
- Proteomics QC: new proteomics QC plot, number of unique IDs per sample
- Proteomics QC: improved QC plots
- New metabolomics `sample_type`: `QC-ReCAS`, Global reference biological material prepared at CAS
- Updated data dictionary (GTech's KEGG revision)
- Bug fixes
- Updated README
- Proteomics QC: enabled option to check data processed at the BIC
- Bug fixes (color code)
- Proteomics QC: updated warnings affecting `gene_symbol` and `entrez_id` when missing IDs
- Fixed issue affecting Windows machines (Pierre J-B)
- Several color code fixes for assay (Nicole G)
- Bug fixes
- New QC checks for the new proteomics requirements
- Updated metabolomics data dictionary, including:
- Broad Metabolomics revision (39 new KEGG IDs + minor corrections)
- Targeted refmet_name
- Fixed text in warning in case of missing manifest file
- Added and improved tests
- Fixed and improved check for new required manifest file for untargeted metabolomics datasets
- Updated `tissue_cols` data object (Nicole Gay)
- Fixed bug in `validate_metabolomics` affecting unnamed sample check
- Improved QC for the required manifest file (proteomics and metabolomics)
- Metabolomics data dictionary available as a data object: `metabolomics_data_dictionary`
- The function `get_and_validate_mdd()` still works, but does not pull the data from Metabolomics Workbench. Just returns `metabolomics_data_dictionary`
- Bug fix: the manifest issue count now properly displays the number of issues detected
- Bug fix: restored previous version of `bic_animal_tissue_code`
- Added new assays
- Bug fix on DMAQC to deal with missing data in DMAQC table
- Proteomics: addressed 130C missing channel issue affecting the Broad
- Proteomics: added additional QC check point to validate that all values in vial_label column are unique
- Proteomics: `write_proteomics` updated (only required columns selected)
- Fixed bugs and typos
- Added check point when expected files are not available in the manifest file
- Fixed bugs affecting manifest files
- Adjustments when files don't meet requirements
- Check new required manifest file in both proteomics and metabolomics submissions
- Raw file manifest now optional
- Proteomics write proteomics
- Proteomics load metabolomics
- Proteomics QC support
- New assays
- Bug fixes