---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r knitr-opts, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(MotrpacHumanPreSuspensionAnalysis)
```
# MotrpacHumanPreSuspensionAnalysis
<!-- badges: start -->
<!-- badges: end -->
## Overview
This R package provides the public release of summary statistics, differential
analysis results, and downstream modeling outputs from the **Molecular
Transducers of Physical Activity Consortium (MoTrPAC)** human pre-COVID
suspension cohort.
The first human cohort of MoTrPAC enrolled sedentary adults prior to study
suspension during the COVID-19 pandemic (N=175), randomized to endurance
exercise (EE), resistance exercise (RE), or non-exercise control (CON).
**This package focuses on the acute exercise bout** from that cohort.
Participants were randomized in an approximate 8:8:3 ratio to EE, RE, or CON
groups and also to temporal profiles of biospecimen collection. A non-exercising
group was deemed critical to control for the molecular effects of circadian
rhythm, fasting, tissue sampling, and any other non-exercise intervention
stimulus. The majority of participants were female (72%). Mean age was
41 ± 15 years, the average BMI was 26.9 ± 4.0 kg/m², and average VO2peak was
24 ± 7.0 ml/kg/min. See the MoTrPAC manuscripts for full cohort details.
A larger cohort, recruited after the COVID suspension, is being analyzed by the
MoTrPAC Consortium. That analysis will cover many more details about subgroup
differences, including response to longitudinal training and heterogeneity.
### What is included
- **Differential analysis** results (all tissues and omic platforms)
- **Group-level summary statistics** (n, mean, SD per feature/group/tissue)
- **Enrichment results** (CAMERA-PR pathway analysis)
- **Fuzzy c-means clustering** and cluster-level enrichment
- **Feature-to-gene mapping** (Ensembl v105 / GENCODE 39)
- Visualization functions for heatmaps, PCA, enrichment, and single-feature plots
- **Splicing analysis** significant results (FDR < 0.05)
### What is NOT included
To protect participant privacy and comply with data-use governance policies,
individual-level (subject-level) molecular or phenotypic data are **not**
included. Such data are available only through formal data access requests to the
MoTrPAC consortium.
### Versioning note
The functions in version 0.2.0 were those used to generate the initial bioRxiv
pre-print. Modifications may occur over the course of reviews and additional
analysis, so refer to the release history if you need to recreate the pre-print
figures exactly. We aim to provide version information for major milestones in the GitHub “Releases” section.
---
# Installation
## Requirements
**R >= 4.4 is required.** Several dependencies (e.g. TMSig) require R 4.4 or
later. The package will not install on older R versions.
## Important: Bioconductor Dependencies
This package relies on several Bioconductor packages (e.g. ComplexHeatmap,
Mfuzz, Biobase, TMSig). You must set the correct Bioconductor version for your
R installation before installing. Using the wrong Bioconductor version will
cause dependency failures.
| R version | Bioconductor version |
|-----------|---------------------|
| R 4.4.x | 3.20 |
| R 4.5.x | 3.22 |
**For R 4.4:**
```{r install-r44, eval = FALSE}
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(version = "3.20")
devtools::install_github("MoTrPAC/MotrpacHumanPreSuspensionAnalysis",
                         build_vignettes = TRUE)
```
**For R 4.5:**
```{r install-r45, eval = FALSE}
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(version = "3.22")
devtools::install_github("MoTrPAC/MotrpacHumanPreSuspensionAnalysis",
                         build_vignettes = TRUE)
```
You can check your R version with `R.version.string` and your Bioconductor
version with `BiocManager::version()`.
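The table above can also be applied programmatically. The following is a minimal sketch that assumes only the two R series listed in this README; `pick_bioc_version()` is a hypothetical helper for illustration and is not exported by this package.

```r
# Hypothetical helper (not part of this package): look up the Bioconductor
# release that this README pairs with a given R version.
pick_bioc_version <- function(r_version = getRversion()) {
  # Reduce e.g. "4.4.1" to its series, "4.4"
  series <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", as.character(r_version))
  lookup <- c("4.4" = "3.20", "4.5" = "3.22")
  if (!series %in% names(lookup)) {
    stop("This README lists no Bioconductor version for R ", series)
  }
  unname(lookup[series])
}

# With no argument, the helper inspects the running R session.
# An explicit version string can be passed as well:
bioc_version <- pick_bioc_version("4.4.1")
print(bioc_version)
```

The result could then be passed on as `BiocManager::install(version = bioc_version)`.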
We recommend building the vignettes when installing
(use `vignette(package = "MotrpacHumanPreSuspensionAnalysis")` to browse them).
The vignette `package_overview` describes how to use the package functions in
detail, but the most common functions for those just looking to view the results
are also described here in this README.
## Troubleshooting
- **macOS (especially Apple Silicon):** If you see compilation errors or
  missing tools, install Xcode Command Line Tools first:
  ```bash
  xcode-select --install
  ```
- **Bioconductor errors** (e.g. version mismatch warnings or failed
  binaries): Re-run `BiocManager::install(version = "X.YZ")` with the
  correct version for your R installation (see table above) before the
  GitHub install.
After installation, load the package:
```{r load-library, eval = FALSE}
library(MotrpacHumanPreSuspensionAnalysis)
```
## Consortium-only optional features
Most functionality in this package is fully public and works without additional
access. Some advanced workflows rely on the private package
`MotrpacHumanPreSuspensionData` that is available only to MoTrPAC consortium
members.
At the moment, the primary functions with this optional dependency are:
- `run_SCION()`
- `plot_precovid_cca()`
If the private package is not installed, these functions will return a clear
error message with access guidance.
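If you want to know in advance whether the consortium-only functions will work, you can test for the private package yourself. This is a minimal sketch using base R's `requireNamespace()`; the messages are illustrative and are not the package's own error text.

```r
# Check for the consortium-only data package without attaching it.
# requireNamespace() returns FALSE (rather than erroring) when it is missing.
has_private_data <- requireNamespace("MotrpacHumanPreSuspensionData",
                                     quietly = TRUE)

if (has_private_data) {
  message("Private data package found: consortium-only functions should be usable.")
} else {
  message("Private data package not installed: skip run_SCION() and plot_precovid_cca().")
}
```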
## Getting help
For questions, bug reporting, and data requests for this package, please
[submit a new issue](https://github.com/MoTrPAC/MotrpacHumanPreSuspensionAnalysis/issues){target="_blank"}
and include as many details as possible.
---
# Usage
## Omic modeling summary statistics (Differential Analysis)
See `?load_differential_analysis` or the vignette for more documentation.
Note that the epigenetic files are released publicly through an AWS CDN and are considerably larger than those of the other omes, so setting `epigen = TRUE` can be very slow.
```{r differential analysis}
differential_analysis = load_differential_analysis(
  selected_omes = "all",
  selected_tissues = "all",
  single_matrix = FALSE,
  epigen = FALSE,
  combine_with_featgene = FALSE,
  verbose = TRUE
)
names(differential_analysis)
names(differential_analysis[["blood"]])
```
By default, `load_differential_analysis()` returns a nested list, organized first by tissue and then by ome. Select the tissues and omes you want with the `selected_tissues` and `selected_omes` arguments; `tissue_available_list()` and `ome_available_list()` return the available options. If you supply an invalid tissue or ome, a warning or error will point you to the problem.
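To make the tissue-then-ome layout concrete, here is a small sketch on a toy list. The ome names (`ome_a`, `ome_b`), the column names, and all values are placeholders chosen for illustration; the real object's names and columns should be inspected with `names()` and `colnames()`.

```r
# Toy stand-in for the nested list; "ome_a" and "ome_b" are placeholder names.
toy_da <- list(
  blood = list(
    ome_a = data.frame(feature_id = c("f1", "f2"), logFC = c(0.4, -1.1)),
    ome_b = data.frame(feature_id = "f3", logFC = 0.2)
  ),
  adipose = list(
    ome_a = data.frame(feature_id = c("f1", "f4"), logFC = c(-0.3, 0.9))
  )
)

names(toy_da)                 # tissues at the top level
lapply(toy_da, names)         # omes available within each tissue
toy_da[["blood"]][["ome_a"]]  # one tissue/ome table

# Count features per tissue/ome combination
feature_counts <- lapply(toy_da, function(omes) vapply(omes, nrow, integer(1)))
feature_counts
```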
If you would rather work with a single stacked table, set the `single_matrix` argument to `TRUE`. This unlists the nested structure and combines everything into one `data.frame`.
```{r da-single-matrix}
single_matrix = load_differential_analysis(single_matrix = TRUE)
colnames(single_matrix)
```
For a quick explanation of each column, see `?load_differential_analysis`.
Importantly, this loads the differential analysis for each of the comparisons described in the methods: the endurance or resistance group relative to controls matched for time, fasting, biopsy, etc.; the endurance and resistance groups compared directly; and within-group comparisons without a matched control.
Most of the analysis uses the exercise groups relative to the controls (`"exercise_with_controls"`). Make sure you filter to the contrast type you want before continuing with analysis.
```{r contrast-types}
single_matrix %>% dplyr::pull(contrast_type) %>% unique()
```
If you'd like to work in terms of the specific groups being compared instead, use the `contrast_category` column. For example, EE-CON and RE-CON are subsets of the `exercise_with_controls` contrast type above.
```{r contrast-categories}
single_matrix %>% dplyr::pull(contrast_category) %>% unique()
```
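The filtering step can be sketched on a toy table as follows. Only the `contrast_type` and `contrast_category` column names and the values `exercise_with_controls`, `EE-CON`, and `RE-CON` come from this README; the remaining columns and values are hypothetical placeholders.

```r
# Toy stand-in for the stacked results. Values other than
# "exercise_with_controls", "EE-CON" and "RE-CON" are made up.
toy_results <- data.frame(
  feature_id        = c("f1", "f1", "f2", "f2"),
  contrast_type     = c("exercise_with_controls", "exercise_with_controls",
                        "other_type", "exercise_with_controls"),
  contrast_category = c("EE-CON", "RE-CON", "other_category", "EE-CON"),
  stringsAsFactors  = FALSE
)

# Keep the exercise-vs-control contrasts, then narrow to one comparison
with_controls <- toy_results[toy_results$contrast_type == "exercise_with_controls", ]
ee_vs_con     <- with_controls[with_controls$contrast_category == "EE-CON", ]

table(with_controls$contrast_category)
```

Base R subsetting is used here so the sketch runs without extra packages; `dplyr::filter()` would work equally well.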
The splicing data were processed in a separate analysis effort, but significant results (FDR < 0.05) are available in this R package as well. The full set is available on the MoTrPAC Data Hub but is not included here because of file size limitations. See "Exercise modulation of the alternative splicing landscape in human tissues" for more information.
```{r splicing-da}
names(SPLICING_DA)
head(SPLICING_DA$adipose$`AS-rMATS`, 3)
```
## Omic modeling summary statistics
See `?load_summary_stats` for more documentation.
```{r summary stats}
summary_stats = load_summary_stats(
  selected_omes = "all",
  selected_tissues = "all",
  single_matrix = FALSE,
  verbose = TRUE
)
names(summary_stats)
names(summary_stats[["blood"]])
```
By default, `load_summary_stats()` loads group- and timepoint-level summary statistics for normalized expression data in a nested list structure, organized identically to the differential-analysis datasets. The top level corresponds to tissues, and the second level corresponds to molecular assays or platforms.
You may subset the data using `selected_tissues` and `selected_omes`. Available options can be queried via `tissue_available_list()` and `ome_available_list()`. If an invalid tissue or assay is supplied, informative warnings or errors are raised to guide correction.
If a stacked representation is preferred, set `single_matrix = TRUE`. This unlists the nested structure and returns a single `data.frame` with all selected tissues and assays.
Again, note that sample-level data are available to researchers upon request through the MoTrPAC Consortium.
```{r sumstats-single-matrix}
single_matrix = load_summary_stats(single_matrix = TRUE)
colnames(single_matrix)
```
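Conceptually, the stacked form is the nested list with every table tagged by its tissue and assay and then row-bound. The sketch below illustrates that idea on toy data; it is not the package's implementation, and the assay names, column names, and values are placeholders.

```r
# Toy nested list: tissue -> ome -> data.frame. Ome names, feature IDs and
# all values are placeholders.
toy_stats <- list(
  blood = list(
    ome_a = data.frame(feature_id = c("f1", "f2"), n = c(10L, 12L),
                       mean = c(5.1, 3.3), sd = c(0.4, 0.7))
  ),
  adipose = list(
    ome_a = data.frame(feature_id = "f1", n = 8L, mean = 4.8, sd = 0.5),
    ome_b = data.frame(feature_id = "f9", n = 9L, mean = 1.2, sd = 0.2)
  )
)

# Tag every table with its tissue and ome, then row-bind them all
stacked <- do.call(rbind, lapply(names(toy_stats), function(tissue) {
  do.call(rbind, lapply(names(toy_stats[[tissue]]), function(ome) {
    cbind(tissue = tissue, ome = ome, toy_stats[[tissue]][[ome]])
  }))
}))
rownames(stacked) <- NULL
stacked
```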
Summary statistics were filtered to the features that qualified for differential analysis. For proteomics/phosphoproteomics, this means a paired n >= 3 was required for inclusion. See the methods in the manuscript for more information.
For metabolomics assays, summary statistics are computed after filtering redundant metabolites. Details of this filtering procedure are described in the Methods section of the manuscript.
For epigenetic assays (ATAC, methyl), only significant features are included, due to file size limitations.
## Enrichment Results
```{r camera-results-cols}
colnames(CAMERA_RESULTS)
```
Quick summary: CAMERA-PR is a competitive enrichment method that uses all features. It compares the test statistics of features inside a pathway with those of features outside it to assess whether the in-pathway statistics differ significantly.
The structure of this object mirrors the single-matrix differential analysis results at the enrichment level: it covers the same comparisons, with all tissues and assays included.
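The in-pathway versus out-of-pathway comparison can be illustrated with a deliberately simplified sketch on simulated statistics. This is not CAMERA-PR itself: CAMERA-type tests also adjust the variance for correlation between features, which the plain two-sample t-test below ignores. All data and names here are made up.

```r
set.seed(1)

# Simulated test statistics for 1000 features (hypothetical data)
stats <- rnorm(1000)
names(stats) <- paste0("feature_", seq_along(stats))

# A made-up pathway of 40 features whose statistics are shifted upwards
pathway <- names(stats)[1:40]
stats[pathway] <- stats[pathway] + 2

in_set <- names(stats) %in% pathway

# Competitive comparison: in-pathway vs out-of-pathway statistics.
# CAMERA-type tests additionally inflate the variance to account for
# inter-feature correlation; this plain t-test does not.
naive_test <- t.test(stats[in_set], stats[!in_set], var.equal = TRUE)
naive_test$p.value
```

Because the simulated pathway was shifted on purpose, the p-value is small; with real data the correlation adjustment matters and the package's precomputed `CAMERA_RESULTS` should be used instead.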
## Feature to gene file
```{r feature-to-gene}
head(HUMAN_FEATURE_TO_GENE)
```
The feature-to-gene map links each feature tested in differential analysis to a gene, using Ensembl version 105 (mapped to GENCODE 39) as the gene identifier source. Proteomics feature IDs (UniProt IDs) were mapped to gene symbols and Entrez IDs using UniProt’s mapping files. Epigenomics features were mapped to the nearest gene using the `ChIPseeker::annotatePeak()` function with Homo sapiens Ensembl release 105 gene annotations. Gene symbols, Entrez IDs, and Ensembl IDs were assigned to features using biomaRt version 2.58.2 (Bioconductor 3.18). This file covers every feature included in any ome/tissue in our analysis. Use it to see how features from different omes (e.g. ATAC, RNAseq) link up through the genes they map to.
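A typical use of such a map is to annotate results with gene identifiers through a join. The sketch below uses toy tables with hypothetical column names (`feature_id`, `gene_symbol`); check `colnames(HUMAN_FEATURE_TO_GENE)` for the real ones before adapting it.

```r
# Toy tables; column names and IDs are hypothetical placeholders.
toy_da <- data.frame(
  feature_id = c("f1", "f2", "f3"),
  logFC      = c(0.8, -0.5, 1.2)
)
toy_map <- data.frame(
  feature_id  = c("f1", "f2"),
  gene_symbol = c("GENE1", "GENE2")
)

# Left join: keep every tested feature, even those without a mapped gene
annotated <- merge(toy_da, toy_map, by = "feature_id", all.x = TRUE)
annotated
```

The `combine_with_featgene` argument of `load_differential_analysis()` may perform an equivalent join for you; see its documentation.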
# For Developers
## Testing installation with Docker
A `Dockerfile` is included to verify that the package and all its dependencies
install correctly in a clean Linux environment. This is useful for catching
missing system libraries or silent dependency failures before release.
```bash
# Build the image (installs all Imports, Suggests, and Remotes)
docker build -t motrpac-presuspension-test .
# Run the container to confirm the package loads
docker run --rm motrpac-presuspension-test
```
To render vignettes inside the container:
```bash
docker run --rm -v "$(pwd)/vignette_output:/output" motrpac-presuspension-test bash -c "
apt-get update && apt-get install -y --no-install-recommends pandoc &&
R -e \"
rmarkdown::render('vignettes/package_overview.Rmd', output_dir='/output');
rmarkdown::render('vignettes/differential_analysis.Rmd', output_dir='/output')
\"
"
```
## Acknowledgements
MoTrPAC is supported by the National Institutes of Health (NIH) Common
Fund through cooperative agreements managed by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Arthritis and Musculoskeletal Diseases (NIAMS), and National Institute on Aging (NIA).
Specifically, the MoTrPAC Study is supported by NIH grants U24OD026629 (Bioinformatics Center), U24DK112349, U24DK112342, U24DK112340, U24DK112341, U24DK112326, U24DK112331, U24DK112348 (Chemical Analysis Sites), U01AR071133, U01AR071130, U01AR071124, U01AR071128, U01AR071150, U01AR071160, U01AR071158 (Clinical Centers), U24AR071113 (Consortium Coordinating Center), U01AG055133, U01AG055137 and U01AG055135 (PASS/Animal Sites).
## Data Use Agreement
Recipients and their Agents agree that in publications using **any** data from MoTrPAC public-use data sets they will acknowledge MoTrPAC as the source of data, including the version number of the data sets used, e.g.:
* Data used in the preparation of this article were obtained from the Molecular Transducers of Physical Activity Consortium (MoTrPAC) database, which is available for public access at [motrpac-data.org](https://motrpac-data.org).
* Data used in the preparation of this article were obtained from the Molecular Transducers of Physical Activity Consortium (MoTrPAC) Pre-CovidSuspension Data release version 1.3.0.