- Inputs: `input_dim`, `hidden_dim`
- Define a multihead attention block (`nn.MultiheadAttention`) with:
  - `embed_dim = input_dim`
  - `num_heads = 8`
- Add a fully connected layer to project attention output to `hidden_dim`.
- Forward Pass:
  - Compute attention output using the input `x`.
  - Pass the attention output through the fully connected layer.
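
The attention block described above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: it assumes `x` arrives as a 2D tensor of shape `(batch, input_dim)` and is treated as a sequence of length 1, and it assumes `batch_first=True`, neither of which the outline specifies.

```python
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        # embed_dim must be divisible by num_heads (8 here).
        self.attention = nn.MultiheadAttention(
            embed_dim=input_dim, num_heads=8, batch_first=True
        )
        self.fc = nn.Linear(input_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: x is (batch, input_dim); add a length-1 sequence axis.
        seq = x.unsqueeze(1)                         # (batch, 1, input_dim)
        attn_out, _ = self.attention(seq, seq, seq)  # self-attention on x
        return self.fc(attn_out.squeeze(1))          # (batch, hidden_dim)


block = AttentionBlock(input_dim=64, hidden_dim=32)
out = block(torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 32])
```

Note that `input_dim` has to be a multiple of 8, otherwise `nn.MultiheadAttention` raises an error at construction time.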
- Inputs: `input_dim`, `hidden_dim`, `attention_dim`
- Add a linear layer to project `input_dim` to `2048`.
- Use the `AttentionBlock` to apply attention to the output.
- Add another fully connected layer to project `hidden_dim` to `512`.
- Add a dropout layer with probability `0.1`.
- Forward Pass:
  - Pass input through the first linear layer and apply ReLU.
  - Apply attention on the result.
  - Pass through the second linear layer with ReLU.
  - Apply dropout.
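
One possible reading of the encoder steps above is sketched below. A compact `AttentionBlock` is included only so the snippet runs on its own. The outline does not say how `attention_dim` is used, so this sketch assumes the attention block maps `2048` to `hidden_dim` and leaves `attention_dim` unused; treat that wiring as a guess.

```python
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Compact stand-in so this snippet is self-contained."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.attention = nn.MultiheadAttention(input_dim, num_heads=8, batch_first=True)
        self.fc = nn.Linear(input_dim, hidden_dim)

    def forward(self, x):
        seq = x.unsqueeze(1)                    # assumed length-1 sequence
        out, _ = self.attention(seq, seq, seq)
        return self.fc(out.squeeze(1))


class Encoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, attention_dim=None):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 2048)
        # Assumption: attention maps 2048 -> hidden_dim; attention_dim is unused.
        self.attention = AttentionBlock(2048, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 512)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))   # first linear layer + ReLU
        x = self.attention(x)         # attention on the result
        x = torch.relu(self.fc2(x))   # second linear layer + ReLU
        return self.dropout(x)        # dropout (inactive in eval mode)


encoder = Encoder(input_dim=100, hidden_dim=256, attention_dim=256)
encoder.eval()
encoded = encoder(torch.randn(4, 100))
print(encoded.shape)  # torch.Size([4, 512])
```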
- Inputs: `input_dim`, `hidden_dim`, `output_dim`, `signature_matrix`
- Add a linear layer to project `512` to `1024`.
- Add another linear layer to project to `output_dim` (cell fractions output).
- Define `gep_matrix` as a learnable parameter (`nn.Parameter`).
- Store the `signature_matrix`.
- Forward Pass:
  - Pass input through the linear layer and ReLU.
  - Compute `cell_fractions`:
    - Normalize values between `0-1` using min-max normalization.
    - Normalize rows to sum to 1.
  - Compute `reconstructed_pseudobulk`:
    - Use matrix multiplication between `cell_fractions` and `signature_matrix`.
  - Return `cell_fractions`, `reconstructed_pseudobulk`, and `gep_matrix`.
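
The decoder steps above might look like the sketch below. Several details are assumptions rather than facts from the outline: min-max normalization is applied per row, `signature_matrix` has shape `(n_cell_types, n_genes)`, `gep_matrix` is initialized randomly with the same shape, and a small epsilon guards the divisions.

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, signature_matrix):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)    # e.g. 512 -> 1024
        self.fc2 = nn.Linear(hidden_dim, output_dim)   # -> cell fractions
        # Assumption: gep_matrix mirrors the signature matrix shape.
        self.gep_matrix = nn.Parameter(torch.randn_like(signature_matrix))
        # Stored as a buffer: moves with the module but is not trained.
        self.register_buffer("signature_matrix", signature_matrix)

    def forward(self, x):
        raw = self.fc2(torch.relu(self.fc1(x)))
        # Per-row min-max scaling into [0, 1] (assumed axis).
        lo = raw.min(dim=1, keepdim=True).values
        hi = raw.max(dim=1, keepdim=True).values
        scaled = (raw - lo) / (hi - lo + 1e-8)
        # Normalize each row so the fractions sum to 1.
        cell_fractions = scaled / (scaled.sum(dim=1, keepdim=True) + 1e-8)
        reconstructed_pseudobulk = cell_fractions @ self.signature_matrix
        return cell_fractions, reconstructed_pseudobulk, self.gep_matrix


torch.manual_seed(0)
signature = torch.rand(5, 30)  # toy matrix: 5 cell types x 30 genes
decoder = Decoder(512, 1024, output_dim=5, signature_matrix=signature)
fractions, recon, gep = decoder(torch.randn(4, 512))
print(fractions.shape, recon.shape, gep.shape)
```

With per-row min-max scaling, the smallest entry in every row becomes exactly 0, so at least one cell type always receives a zero fraction. That is a property of this normalization choice, not something the outline states.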
- Combine `Encoder` and `Decoder` components.
- Forward Pass:
  - Pass input through the encoder.
  - Pass encoded output through the decoder.
- Inputs: `data`, `sig`, `freq`, `org`, `normalized`
- Split the data into `train` and `validation` sets using `train_test_split`.
- Convert inputs into PyTorch tensors (`X_train`, `Y_train`, etc.).
- Initialize the `DeconvolutionModel1` with required dimensions and `sig_matrix`.
- Define:
  - Loss function: `MSELoss`
  - Optimizer: `Adam`
- Set hyperparameters:
  - Learning rate
  - Loss weights (`l1`, `l2`, `l3`, `l4`)
- Training Loop:
  - For each epoch:
    - Forward pass through the model to get predictions.
    - Compute losses:
      - Cell fraction loss (`MSELoss`)
      - Pseudobulk reconstruction loss
      - Pseudo-GEP loss
      - GEP-signature loss
      - KL divergence loss
    - Combine weighted losses into total loss.
    - Backpropagate and update model parameters.
    - Evaluate on validation data and calculate:
      - Validation loss
      - CCC and RMSE metrics.
    - Print progress every 10 epochs.
- Return the trained model and tensors for further predictions.
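
The shape of the training loop above can be illustrated with a reduced sketch. It is deliberately incomplete: only the cell fraction and pseudobulk reconstruction losses are shown, because the outline does not define the pseudo-GEP, GEP-signature, or KL terms. The toy data, the stand-in model, the weight values, and the plain slicing used in place of `train_test_split` are all assumptions made so the snippet runs on its own.

```python
import torch
import torch.nn as nn


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient over all elements."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t = y_true.var(unbiased=False)
    var_p = y_pred.var(unbiased=False)
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)


def rmse(y_true, y_pred):
    return torch.sqrt(torch.mean((y_true - y_pred) ** 2))


torch.manual_seed(0)
n_genes, n_types = 20, 4
sig = torch.rand(n_types, n_genes)                     # toy signature matrix
Y = torch.softmax(torch.randn(64, n_types), dim=1)     # toy cell fractions
X = Y @ sig                                            # toy pseudobulk
X_train, Y_train, X_val, Y_val = X[:48], Y[:48], X[48:], Y[48:]

# Stand-in for DeconvolutionModel1 (hypothetical, much simpler).
model = nn.Sequential(
    nn.Linear(n_genes, 16), nn.ReLU(), nn.Linear(16, n_types), nn.Softmax(dim=1)
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
l1, l2 = 1.0, 0.1                                      # illustrative loss weights

for epoch in range(1, 31):
    model.train()
    optimizer.zero_grad()
    pred_frac = model(X_train)
    frac_loss = criterion(pred_frac, Y_train)          # cell fraction loss
    recon_loss = criterion(pred_frac @ sig, X_train)   # reconstruction loss
    loss = l1 * frac_loss + l2 * recon_loss            # weighted total
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_pred = model(X_val)
        val_loss = criterion(val_pred, Y_val)
    if epoch % 10 == 0:
        print(
            f"epoch {epoch}: train={loss.item():.4f} val={val_loss.item():.4f} "
            f"CCC={ccc(Y_val, val_pred).item():.3f} "
            f"RMSE={rmse(Y_val, val_pred).item():.3f}"
        )
```

CCC is computed here over all validation entries pooled together. Computing it per cell type and averaging is an equally plausible reading of the outline.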
- Inputs: `data`, `sig`, `freq`, `org`, `normalized`
- Load the data.
- Train the model using `deconvolution_train`.
- Evaluate the model on test data (`X_test_tensor`) to get:
  - `test_cell_fractions`
- Initialize a second model (`model2`) with adjusted parameters.
- Training Loop (Prediction):
  - Similar to the training loop, with test data.
- Generate predictions:
  - `gep_predictions1` → Gene expression profile.
  - `test_cell_fractions` → Cell fractions.
- Return results as Pandas DataFrames.
- Load the pre-trained model (`lr_model.joblib`) and store:
  - Model's feature names
  - Class labels for predictions.
- Input: Path to CSV data.
- Load the data using `pandas`.
- Ensure the columns match the model's features.
- Perform transformations:
  - Rank data (`rankdata`)
  - Apply `log2` transformation.
  - Standardize (z-score normalization).
- Return processed data.
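
A sketch of the preprocessing steps above follows. The function name `preprocess`, the choice to rank and standardize within each sample (row-wise), and the use of the population standard deviation are assumptions, since the outline does not state the axis. The function takes a DataFrame directly so the example runs on its own; in practice the frame would come from `pd.read_csv(path)`.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata


def preprocess(df: pd.DataFrame, feature_names: list) -> pd.DataFrame:
    # Ensure every model feature is present, then keep them in model order.
    missing = [f for f in feature_names if f not in df.columns]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    values = df[feature_names].to_numpy(dtype=float)

    ranks = rankdata(values, axis=1)   # ranks start at 1, so log2 is defined
    logged = np.log2(ranks)
    # Row-wise z-score (assumed axis); a constant row would give a zero std.
    mean = logged.mean(axis=1, keepdims=True)
    std = logged.std(axis=1, keepdims=True)
    z = (logged - mean) / std
    return pd.DataFrame(z, index=df.index, columns=feature_names)


raw = pd.DataFrame(
    {"g1": [5.0, 1.0], "g2": [3.0, 9.0], "g3": [8.0, 2.0], "extra": [0.0, 0.0]}
)
processed = preprocess(raw, ["g1", "g2", "g3"])
print(processed.round(3))
```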
- Predict the class probabilities or labels:
  - If `prob=True`, return probabilities as a DataFrame.
  - If `prob=False`, return predicted class labels.
- Allow calling the class object as a function to run predictions.
- Train the model using the `deconvolution_train` function.
- Use the `Deconvoluter` function to test and validate predictions.
- Apply the `Predictor` class for downstream predictions with preprocessed data.