Setting a batch analysis for single alignment-wide ω

Hi,
I have identified orthologues with Orthofinder, and I am interested in identifying genes under selection with hyphy and if possible doing the analysis in parallel. 

I installed HyPhy with conda, and I am trying to understand how to set the parameters for the analysis.
Since I have duplications and potential paralogues I will restrict the analysis to alignments with single-copy orthologues.

I was reading the tutorial on the website and it says: "This tutorial assumes you are specifically using the HyPhy executable HYPHYMP. If you have installed a different executable (e.g. HYPHYMPI), you may need to alter some commands."
Since I installed with conda, I'd appreciate some help with the code.

The website indicates to use:
```
# Path to files of interest
FILEPATH=/absolute/path/to/directory/with/all/your/files/

# Run BUSTED on all files $FILEPATH/*.fas
for FILE in `ls $FILEPATH/*.fas`; do
    echo "Running BUSTED on $FILE"
    echo `(echo "10"; echo "4"; echo "1"; echo $FILE; echo "Y"; echo "1"; echo "d") | HYPHYMP`
done
```

I don't have HYPHYMP. First, I want to do a quick single alignment-wide ω  to find genes with signal of selection and then do BUSTED with one or two species of interest, and since I have access to a cluster I would like to submit several jobs with HYPHYMPI.

The code I thought would be:
```
#!/bin/bash

#PBS -l nodes=1:ppn=32
#PBS -l walltime=03:00:00
#PBS -l mem=100gb
#PBS -t 1-100
#PBS -k oe
#PBS -N single-msa-w
#PBS -o ${PBS_JOBNAME}_${PBS_JOBID}.log
#PBS -e ${PBS_JOBNAME}_${PBS_JOBID}.err
#PBS -l naccesspolicy=singlejob

# Load bash environment
source ~/.bashrc

# Init conda
source /data/common/MMA/mamba.init.sh
conda activate NGS-forge

cd /data/common/MMA/data/genomes/

# Save starting time
start_time=$(date +%s)
echo "Start ${PBS_JOBNAME}: Job started: $(date)"

# Get the sample name for this array job
base=$(sed -n "${PBS_ARRAYID}p" dn.dn.sps.txt)

#Calculate single alignment-wide ω

echo "Starting single alignment-wide ω on ${base} at $(date)"

HYPHYMPI CPU=32  ...
```

I can't find the code for running the single alignment-wide ω.
Could you please indicate the code to run the single alignment-wide ω?

Then, to run busted on one branch out of 11 (species), so I assume it would be: 

```
HYPHYMPI CPU=32  busted-ph --alignment path/to/${base}.fasta --tree my_tree.nwk --branches Hydlep --output ${base}.json
```

As mentioned before, I ran Orthofinder to identify orthologues and orthogroups. The issue/ perhaps advantage, is that it adds the file name as prefix to the protein fasta headeres, so the multiple sequence alignments look like:
Ailmel_rna-XM_034655643.1
Calurs_rna-XM_073887989.1
Canlup_rna-XM_038532347.1
Hydlep_GNX-026372-mRNA
Lepwed_LOC_00017029-mRNA-10
Mirang_rna-XM_064596364.1
Mirleo_rna-XM_035014847.1
Musnig_rna-XM_059416383.1
Neosch_rna-XM_044916755.1
Ursarc_rna-XM_026498632.3

In my case, the tree is:
((((Calurs,(Halgry,((((Hydlep,Lepwed),(Mirang,Mirleo))),Neosch))),Musnig),(Ursarc,Ailmel)),Canlup);

Can I reconcile the alignment and the tree with _trim-tree_, next either parse or save the reconciled files and then run the analysis?

Orthofinder uses protein data. I looked at the alignments and many seem to need trimming due to issues like gaps. Is there any program that you recommend for doing protein-alignment trimming? I am planning to use trimAI but I wanted to ask you first.

Lastly, after trimming the aa, can HyPhy run the single alignment-wide ω/ busted analysis with the protein alignment (set input data) or must I backtranslate to codon sequences?

Thank you.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Setting a batch analysis for single alignment-wide ω #1955

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Setting a batch analysis for single alignment-wide ω #1955

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions