acscend2/pseudocode.md at main · SML-CompBio/acscend2

Pseudocode

1. Model Definitions

AttentionBlock

Inputs: input_dim, hidden_dim
Define a multihead attention block (nn.MultiheadAttention) with:
- embed_dim = input_dim
- num_heads = 8 (input_dim must therefore be divisible by 8)
Add a fully connected layer to project the attention output to hidden_dim.
Forward Pass:
- Compute self-attention on the input x (x is used as query, key, and value).
- Pass the attention output through the fully connected layer.
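A minimal PyTorch sketch of the AttentionBlock. It assumes inputs shaped (batch, input_dim), which are treated as length-1 sequences; that reshaping and `batch_first=True` are assumptions, not details stated in the pseudocode.

```python
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Multi-head self-attention followed by a projection to hidden_dim."""

    def __init__(self, input_dim: int, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # nn.MultiheadAttention requires embed_dim % num_heads == 0.
        self.attention = nn.MultiheadAttention(
            embed_dim=input_dim, num_heads=num_heads, batch_first=True
        )
        self.fc = nn.Linear(input_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: a 2-D input (batch, features) is one token per sample.
        squeeze = x.dim() == 2
        if squeeze:
            x = x.unsqueeze(1)  # -> (batch, 1, input_dim)
        # Self-attention: x serves as query, key and value.
        attn_out, _ = self.attention(x, x, x)
        out = self.fc(attn_out)  # -> (batch, seq, hidden_dim)
        return out.squeeze(1) if squeeze else out
```

With a length-1 sequence, the softmax weight over the single key is always 1, so the block reduces to the value and output projections. Attention across positions only matters if the input carries a real sequence dimension; the sketch also accepts (batch, seq, input_dim).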

Encoder

Inputs: input_dim, hidden_dim, attention_dim
Add a linear layer to project input_dim to 2048.
Apply an AttentionBlock (input_dim = 2048) to this output; it projects the result to hidden_dim.
Add another fully connected layer to project hidden_dim to 512.
Add a dropout layer with probability 0.1.
Forward Pass:
- Pass input through the first linear layer and apply ReLU.
- Apply the AttentionBlock to the result.
- Pass through the second linear layer and apply ReLU.
- Apply dropout.
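The Encoder steps can be sketched in PyTorch as follows. To keep the block self-contained, the attention step is inlined (an `nn.MultiheadAttention` over the 2048-dim features followed by a projection to hidden_dim) instead of importing AttentionBlock. The pseudocode lists an attention_dim input without saying where it is used, so this sketch omits it; the length-1 sequence reshaping is also an assumption.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 2048)
        # Inlined attention block: self-attention on 2048 features, then -> hidden_dim.
        self.attention = nn.MultiheadAttention(
            embed_dim=2048, num_heads=num_heads, batch_first=True
        )
        self.attn_proj = nn.Linear(2048, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 512)
        self.dropout = nn.Dropout(p=0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.fc1(x))               # (batch, 2048)
        seq = h.unsqueeze(1)                      # (batch, 1, 2048), assumed layout
        attn_out, _ = self.attention(seq, seq, seq)
        h = self.attn_proj(attn_out.squeeze(1))   # (batch, hidden_dim)
        h = torch.relu(self.fc2(h))               # (batch, 512)
        return self.dropout(h)                    # identity in eval() mode
```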

Decoder

Inputs: input_dim, hidden_dim, output_dim, signature_matrix
Add a linear layer to project 512 to 1024.
Add a second linear layer to project 1024 to output_dim (the cell-fraction output).
Define gep_matrix as a learnable parameter (nn.Parameter).
Store the signature_matrix.
Forward Pass:
- Pass input through the first linear layer and apply ReLU, then through the second linear layer.
- Compute cell_fractions:
  - Rescale the outputs to the range 0-1 using min-max normalization.
  - Divide each row by its sum so that every sample's fractions sum to 1.
- Compute reconstructed_pseudobulk:
  - Multiply cell_fractions by signature_matrix (matrix product).
- Return cell_fractions, reconstructed_pseudobulk, and gep_matrix.
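A sketch of the Decoder in PyTorch. It assumes min-max scaling is applied per sample (row) and that signature_matrix is shaped (cell_types, genes) with cell_types equal to output_dim; if the matrix is stored as (genes, cell_types), use its transpose. The gep_matrix initialisation (random, same shape as the signature) is hypothetical, and the input_dim/hidden_dim arguments are omitted because the listed layers have fixed sizes.

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):
    def __init__(self, output_dim: int, signature_matrix: torch.Tensor):
        super().__init__()
        self.fc1 = nn.Linear(512, 1024)
        self.fc2 = nn.Linear(1024, output_dim)
        # Assumed shape: (cell_types, genes), with cell_types == output_dim.
        self.register_buffer("signature_matrix", signature_matrix)
        # Hypothetical initialisation: learnable matrix shaped like the signature.
        self.gep_matrix = nn.Parameter(torch.rand_like(signature_matrix))

    def forward(self, x: torch.Tensor, eps: float = 1e-8):
        scores = self.fc2(torch.relu(self.fc1(x)))
        # Per-sample min-max scaling to [0, 1].
        lo = scores.min(dim=1, keepdim=True).values
        hi = scores.max(dim=1, keepdim=True).values
        scaled = (scores - lo) / (hi - lo + eps)
        # Rescale each row so the fractions sum to 1.
        cell_fractions = scaled / (scaled.sum(dim=1, keepdim=True) + eps)
        reconstructed_pseudobulk = cell_fractions @ self.signature_matrix
        return cell_fractions, reconstructed_pseudobulk, self.gep_matrix
```

One consequence of per-sample min-max scaling is that the lowest-scoring cell type in every sample gets a fraction of exactly 0.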

DeconvolutionModel1

Combine Encoder and Decoder components.
Forward Pass:
- Pass input through the encoder.
- Pass encoded output through the decoder.
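The composition itself is a plain module chain. In this sketch the encoder and decoder are passed in as arguments instead of being constructed inside DeconvolutionModel1; that is a hypothetical design choice that makes it easy to swap in stand-ins for testing.

```python
import torch
import torch.nn as nn


class DeconvolutionModel1(nn.Module):
    """Chains an encoder and a decoder and returns the decoder's output."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x: torch.Tensor):
        encoded = self.encoder(x)      # e.g. (batch, 512)
        return self.decoder(encoded)   # e.g. (cell_fractions, reconstructed_pseudobulk, gep_matrix)
```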

2. Training Function

deconvolution_train

Inputs: data, sig, freq, org, normalized
Split the data into train and validation sets using train_test_split.
Convert inputs into PyTorch tensors (X_train, Y_train, etc.).
Initialize the DeconvolutionModel1 with required dimensions and sig_matrix.
Define:
- Loss function: MSELoss
- Optimizer: Adam
Set hyperparameters:
- Learning rate
- Loss weights (l1, l2, l3, l4)
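The setup steps before the training loop might look like the function below. `prepare_training`, the 20% validation split, the seed, and the learning rate of 1e-3 are placeholders for illustration, not values from the project. X is assumed to be (samples × genes) pseudobulk expression and Y the matching (samples × cell types) fractions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split


def prepare_training(X, Y, model: nn.Module, lr=1e-3, val_size=0.2, seed=42):
    """Split arrays, convert them to float32 tensors, and build loss + optimizer."""
    X_train, X_val, Y_train, Y_val = train_test_split(
        X, Y, test_size=val_size, random_state=seed
    )
    tensors = {
        name: torch.as_tensor(np.asarray(arr), dtype=torch.float32)
        for name, arr in [("X_train", X_train), ("X_val", X_val),
                          ("Y_train", Y_train), ("Y_val", Y_val)]
    }
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    return tensors, criterion, optimizer
```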
Training Loop:
- For each epoch:
  1. Forward pass through the model to get predictions.
  2. Compute losses:
    - Cell fraction loss (MSELoss)
    - Pseudobulk reconstruction loss
    - Pseudo-GEP loss
    - GEP-signature loss
    - KL divergence loss
  3. Combine weighted losses into total loss.
  4. Backpropagate and update model parameters.
  5. Evaluate on validation data and calculate:
    - Validation loss
    - CCC (concordance correlation coefficient) and RMSE metrics
  6. Print progress every 10 epochs.
Return the trained model and tensors for further predictions.
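The loss combination and validation metrics could be implemented along these lines. Only terms whose form follows from their names are sketched. The KL direction (true fractions relative to predicted) is an assumption. The pseudo-GEP and GEP-signature losses are not defined in the pseudocode and are left out. Because the pseudocode does not say which of l1 to l4 weights which term, `total_loss` takes weights by name. All helper names are hypothetical. `ccc` computes Lin's concordance correlation coefficient over all entries, which is also an assumption; it could instead be computed per cell type.

```python
import torch
import torch.nn.functional as F


def kl_fraction_loss(pred, target, eps=1e-8):
    # KL(target || pred) per sample, averaged over the batch.
    return (target * (torch.log(target + eps) - torch.log(pred + eps))).sum(dim=1).mean()


def total_loss(terms, weights):
    """Weighted sum of named loss terms; unlisted terms get weight 1."""
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


def rmse(pred, target):
    return torch.sqrt(F.mse_loss(pred, target))


def ccc(pred, target):
    """Lin's concordance correlation coefficient over all entries."""
    p, t = pred.flatten(), target.flatten()
    p_mean, t_mean = p.mean(), t.mean()
    cov = ((p - p_mean) * (t - t_mean)).mean()
    p_var = ((p - p_mean) ** 2).mean()
    t_var = ((t - t_mean) ** 2).mean()
    return 2 * cov / (p_var + t_var + (p_mean - t_mean) ** 2)
```

In a training step these pieces could combine as `loss = total_loss({"fractions": criterion(pred, Y_train), "kl": kl_fraction_loss(pred, Y_train)}, {"fractions": l1, "kl": l2})`, followed by `optimizer.zero_grad()`, `loss.backward()` and `optimizer.step()`. The mapping of l1 and l2 here is only for illustration.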

3. Prediction Function

Deconvoluter

Inputs: data, sig, freq, org, normalized
Load the data.
Train the model using deconvolution_train.
Evaluate the model on test data (X_test_tensor) to get:
- test_cell_fractions
Initialize a second model (model2) with adjusted parameters.
Training Loop (Prediction):
- Same procedure as the training loop, applied to the test data.
Generate predictions:
- gep_predictions1 → Gene expression profile.
- test_cell_fractions → Cell fractions.
Return results as Pandas DataFrames.
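Returning the outputs as labelled DataFrames could be done as follows. `to_dataframes` and its label arguments are hypothetical, and gep is assumed to be shaped (cell types × genes).

```python
import pandas as pd
import torch


def to_dataframes(cell_fractions: torch.Tensor, gep: torch.Tensor,
                  sample_names, cell_types, genes):
    """Wrap model outputs (samples x cell types, cell types x genes) as DataFrames."""
    fractions_df = pd.DataFrame(
        cell_fractions.detach().cpu().numpy(), index=sample_names, columns=cell_types
    )
    gep_df = pd.DataFrame(gep.detach().cpu().numpy(), index=cell_types, columns=genes)
    return fractions_df, gep_df
```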

4. Stem Cell Class Predictor

Initialization

Load the pre-trained model (lr_model.joblib) and store:
- Model's feature names
- Class labels for predictions.

Data Preprocessing

Input: path to a CSV file.
Load the data using pandas.
Ensure the columns match the model's features.
Perform transformations:
- Rank data (rankdata)
- Apply log2 transformation.
- Standardize (z-score normalization).
Return processed data.
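A sketch of the preprocessing chain. It assumes the CSV has samples as rows, genes as columns, and sample IDs in the first column. It also assumes ranking and z-scoring happen within each sample; the pseudocode does not state the axis, so check this against how the model was trained. Missing model features raise a KeyError here rather than being filled in.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata


def preprocess(csv_path, feature_names):
    """Rank -> log2 -> z-score, applied within each sample (row)."""
    df = pd.read_csv(csv_path, index_col=0)
    # Keep the model's features in the model's order (KeyError if any are missing).
    df = df.loc[:, feature_names]
    # Rank genes within each sample; ties get the average rank, ranks start at 1.
    ranks = rankdata(df.to_numpy(), axis=1)
    logged = np.log2(ranks)
    # z-score each sample's log-ranks (guard against a zero standard deviation).
    mean = logged.mean(axis=1, keepdims=True)
    std = logged.std(axis=1, keepdims=True)
    z = (logged - mean) / np.where(std == 0, 1.0, std)
    return pd.DataFrame(z, index=df.index, columns=df.columns)
```

Ranks are at least 1, so the log2 step never sees zero or negative values.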

Prediction

Predict the class probabilities or labels:
- If prob=True, return probabilities as a DataFrame.
- If prob=False, return predicted class labels.

Call Method

Define __call__ so that an instance of the class can be called directly, like a function, to run predictions.
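The prediction and call behaviour can be sketched as a thin wrapper class. It assumes an sklearn-style classifier with `predict`, `predict_proba` and `classes_`, and a DataFrame input whose index labels the rows of the probability table. The `Predictor` name is hypothetical.

```python
import pandas as pd


class Predictor:
    """Wraps a fitted sklearn-style classifier (assumed interface)."""

    def __init__(self, model):
        self.model = model
        self.class_labels = list(model.classes_)

    def predict(self, X: pd.DataFrame, prob: bool = False):
        if prob:
            # One column per class, in the order of model.classes_.
            return pd.DataFrame(
                self.model.predict_proba(X), index=X.index, columns=self.class_labels
            )
        return self.model.predict(X)

    def __call__(self, X: pd.DataFrame, prob: bool = False):
        # Calling the instance directly runs prediction.
        return self.predict(X, prob=prob)
```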

5. Execution Flow

Train the model using the deconvolution_train function.
Use the Deconvoluter function to test and validate predictions.
Apply the Predictor class for downstream predictions with preprocessed data.
