Lotka-Volterra Competition Model in DNA Sequence Evolution

Install the BarcoSim package

Install the package using the following commands

# for now you can install the developemental version of ltc
# first you need to install the devtools package 
# in case you have not already installed
install.packages("devtools") 
# and load it
library(devtools)

# then you can install the dev version of the ltc
devtools::install_github("loukesio/BarcoSim")
# and load it
library(BarcoSim)

1. Use the `gpseq` command to generate the parent sequences.

Parameters:

num_sequences: An integer specifying the number of DNA sequences to generate.
seq_length: An integer representing the length of each DNA sequence.
range_start: An integer indicating the start position of the barcoded sequence.
range_end: An integer indicating the end position of the barcoded sequence.

library(Biostrings) # Provides tools for working with biological sequences, such as DNA, RNA, and protein sequences
library(BarcoSim)   # BarcoSim: A package for simulating barcoded sequencing data
library(dplyr)      # A powerful package for data manipulation and transformation,


set.seed(123)       # sets the random seed to ensure the reproducibility of a random processes (generation of sequences)

# This function creates 5 parent sequences, each with 10 base pairs and a single barcode area spanning from base 3 to base 6.
df1.1 <- gpseq(num_sequences=5, seq_length=10, range_start=3, range_end=6)

df1.1 %>% 
  DNAStringSet()
  
#> DNAStringSet object of length 10:
#>      width seq
#> [1]    10 CGCAGCGTAA
#> [2]    10 CGTGTTGTAA
#> [3]    10 CGGCAAGTAA
#> [4]    10 CGTCGGGTAA
#> [5]    10 CGATGCGTAA

# Create five parent sequences, each consisting of 10 base pairs, with multiple barcoded regions spanning from base 2 to base 3 and 
# from base 6 to base 8.
df1.2 <- gpseq(5,10,range_start=c(2,6), range_end=c(3,8))

df1.2 %>% 
DNAStringSet()

DNAStringSet object of length 5:
    width seq
[1]    10 GCTTAGGACG
[2]    10 GTGTATGGCG
[3]    10 GCGTACTCCG
[4]    10 GGGTATGTCG
[5]    10 GATTAGCTCG

^{Created on 2023-04-15 with reprex v2.0.2}

The outcome of the gpseq contains the conserved sequences from 1-2 and 7-10, and the barcode sequences from 3-6 (see Figure1). In addition with the help of the function calcSeqSim we can quantify the similarity among sequences at each base pair.

2. Use the `calcSeqSim` function to plot sequence similarity across the parent sequences

parameters:

dna_seq: A character vector of DNA sequences obtained as the output of the gpseq function.

Plotting sequence similarity in a sequence with a single barcoded area:

library(BarcoSim)
library(ggplot2)
library(dplyr)

df1.1 = c("CGCAGCGTAA", "CGTGTTGTAA", "CGGCAAGTAA", "CGTCGGGTAA", "CGATGCGTAA")
df1.1
#> [1] "CGCAGCGTAA" "CGTGTTGTAA" "CGGCAAGTAA" "CGTCGGGTAA" "CGATGCGTAA"

#______________________________________________
# Find sequence similarity at each position
#______________________________________________
calSeqSim(df1.1)
#> [1] 100 100  40  40  60  40 100 100 100 100

#___________________________
# plotting example :)
#____________________________
# Create the tibble
df1.1_data <- calSeqSim(df1.1)  %>% 
  tibble() %>% 
  dplyr::rename(similarity=1)

# Create the geom area plot
ggplot(df1.1_data, aes(x = 1:nrow(df1.1_data), y = similarity, fill = similarity)) +
  geom_area(color = "#333333", fill = "#edae49") +
  xlab("Base pair (bp)") +
  ylab("Percentage of Similarity") +
  ggtitle("Sequnce similarity plot per bp") +
  scale_x_continuous(breaks=seq(1:10)) +
  theme(panel.grid = element_blank(),
        panel.background = element_rect(fill = "white"), 
        plot.title = element_text(hjust=0.5), 
        axis.text = element_text(size=12), 
        axis.ticks.length = unit(.2, "cm")) +
  geom_vline(xintercept =c(3,6), linetype="dashed")

# similarly you can plot the df1.2 data

3. Use the `r_gpseq` command to replicate parent sequences and make a barcode data set.

Parameters:

dna_seq: A character vector of DNA sequences, obtained as the output of the gpseq function.
num_replicates: An integer specifying the number of times each parent sequence should be replicated.
error_rate: A numeric value representing the probability error rate during the replication process.

library(BarcoSim)

print(df1.1)
#> [1] "CGCAGCGTAA" "CGTGTTGTAA" "CGGCAAGTAA" "CGTCGGGTAA" "CGATGCGTAA"

#With the current parameters of the r_gpseq function, you can replicate each parent DNA sequence in dna_seq twice with a 0.1 probability error rate.

r_gpseq(dna_seq=df1.1,num_replicates=2,error_rate=0.1)
#>    parent parent_seq  offspring
#> 1       1 CGCAGCGTAA CGGTGCGTAA
#> 2       1 CGCAGCGTAA CGCAGCGTAA
#> 3       2 CGTGTTGTAA CGTGTTGTAC
#> 4       2 CGTGTTGTAA CGTGTTGTAA
#> 5       3 CGGCAAGTAA CGGCAAGTAA
#> 6       3 CGGCAAGTAA CCGCAAGTAA
#> 7       4 CGTCGGGTAA CGTCGGGTAC
#> 8       4 CGTCGGGTAA  GTCGGGTAA
#> 9       5 CGATGCGTAA  GATGCGTAA
#> 10      5 CGATGCGTAA CCATGCGTAA

^{Created on 2023-06-24 with reprex v2.0.2}

4. `r_gpseq_csub` replicates parent sequences with a specified error rate while allowing the specification of substitution rates for each base. It is an extension of the r_gpseq function.

dna_seq: A character vector of DNA sequences, obtained as the output of the gpseq function.
num_replicates: An integer specifying the number of times each parent sequence should be replicated.
error_rate A numeric value between 0 and 1 representing the probability error rate during the replication process.
substitution_probs (list of length 5): Includes substitution probabilities for each base (A, C, G, T, and empty string).

library(BarcoSim)

df1.1 = c("GCTTAGGACG", "GTGTATGGCG", "GCGTACTCCG", "GGGTATGTCG", "GATTAGCTCG")
substitution_probs <- list("A" = 0.1, "C" = 0.2, "G" = 0.3, "T" = 0.4, " " = 0.1)

r_gpseq_csub(dna_seq=df1.1,num_replicates=2,error_rate = 0.1, substitution_probs)
#>    parent parent_seq  offspring
#> 1       1 GCTTAGGACG GCTT AGACG
#> 2       1 GCTTAGGACG GCTTACGACG
#> 3       2 GTGTATGGCG GTGTAGGGCG
#> 4       2 GTGTATGGCG GTGTATAGCG
#> 5       3 GCGTACTCCG GCGTGCTCCG
#> 6       3 GCGTACTCCG GCGTACGCCG
#> 7       4 GGGTATGTCG GGGTATGTCG
#> 8       4 GGGTATGTCG GGGTATGTCG
#> 9       5 GATTAGCTCG GGTTGGCTCG
#> 10      5 GATTAGCTCG GAT GGC CG

^{Created on 2023-06-24 with reprex v2.0.2}

Lotka-Volterra Competition Model in DNA Sequence Evolution

Classical Lotka-Volterra Competition Equations

The classical Lotka-Volterra competition model describes how multiple species compete for limited resources:

For Species i:

dNᵢ/dt = rᵢNᵢ(1 - (Nᵢ + Σⱼ≠ᵢ αᵢⱼNⱼ)/Kᵢ)

Where:

Nᵢ = Population size of species i
rᵢ = Intrinsic growth rate of species i
Kᵢ = Carrying capacity for species i
αᵢⱼ = Competition coefficient (effect of species j on species i)
t = Time

Adaptation to DNA Sequence Evolution

In our DNA sequence simulation, each unique sequence type represents a "species" competing for resources:

Modified Growth Equation:

dNᵢ/dt = (bᵢ - dᵢ - cᵢ)Nᵢ

Where:

Nᵢ = Number of individuals with sequence type i
bᵢ = Birth rate (reproduction rate) for sequence i
dᵢ = Death rate for sequence i
cᵢ = Competition-induced mortality for sequence i

Implementation Components

1. Birth Rate (bᵢ)

bᵢ = base_reproduction_rate × fitness_multiplier

Parameters:

num_replicates: Base number of offspring per parent
fitness: Calculated from sequence properties (GC content, length)
fitness_variance: Random variation in fitness

In the code:

# Fitness calculation
calculate_fitness <- function(sequence, base_fitness = 1.0) {
    gc_content <- (str_count(sequence, "G") + str_count(sequence, "C")) / nchar(sequence)
    length_factor <- nchar(sequence) / mean(nchar(dna_seq))
    
    gc_fitness <- 1 - abs(gc_content - 0.5)  # Optimal GC ≈ 50%
    length_fitness <- exp(-0.1 * abs(length_factor - 1))
    
    fitness <- base_fitness × gc_fitness × length_fitness × rnorm(1, 1, fitness_variance)
    return(max(0.1, fitness))
}

2. Base Death Rate (dᵢ)

dᵢ = baseline_death_rate

Parameters:

death_rate: Either fixed value for all sequences or sequence-specific rates
Can be constant or vary by sequence type

In the code:

# Fixed death rate
death_rate = 0.1  # 10% baseline mortality

# Or sequence-specific
death_rate = c("AAGA" = 0.1, "AATC" = 0.3, "GGCC" = 0.05)

3. Competition-Induced Mortality (cᵢ)

This is where the Lotka-Volterra competition comes in:

Resource Competition Model:

cᵢ = α × max(0, (Σⱼ Nⱼ - K)/K)

Similarity-Based Competition Model:

cᵢ = α × Σⱼ≠ᵢ (sᵢⱼ × Nⱼ)

Parameters:

α = competition_strength (0-1): How strongly competition affects mortality
K = carrying_capacity: Maximum sustainable population
sᵢⱼ = Sequence similarity between types i and j
Nⱼ = Population of competing sequence type j

In the code:

# Resource competition
if (resource_competition) {
    pressure <- competition_strength × (total_pop - carrying_capacity) / carrying_capacity
}

# Similarity-based competition  
else {
    for (j in seq_len(nrow(generation_data))) {
        if (i != j) {
            similarity <- sequence_similarity(seq_i, seq_j)
            competitor_effect <- similarity × pop_j × competition_strength
            pressure <- pressure + competitor_effect
        }
    }
}

4. Total Death Probability

total_death_probᵢ = min(0.95, dᵢ + cᵢ)

The survival for each sequence type follows:

survivors = Binomial(Nᵢ, 1 - total_death_probᵢ)

Discrete Time Implementation

Since we simulate discrete generations, the continuous differential equation becomes:

Nᵢ(t+1) = Binomial(offspring_produced, 1 - total_death_probᵢ)

Where:

offspring_produced = Nᵢ(t) × reproduction_rate × fitness_multiplier
total_death_probᵢ = death_rate + competition_pressure

Parameter Summary

Parameter	Symbol	Code Variable	Description
Population size	Nᵢ	`population_size`	Number of individuals per sequence
Growth rate	rᵢ	`num_replicates × fitness`	Reproduction rate × fitness
Death rate	dᵢ	`death_rate`	Baseline mortality
Competition strength	α	`competition_strength`	Intensity of competitive effects
Carrying capacity	K	`carrying_capacity`	Maximum sustainable population
Sequence similarity	sᵢⱼ	`sequence_similarity()`	Genetic similarity between sequences
Fitness variance	σ²	`fitness_variance`	Random variation in fitness

Key Differences from Classical Model

Discrete generations instead of continuous time
Stochastic processes (mutations, binomial survival) rather than deterministic
Sequence similarity as basis for competition coefficients
Dynamic fitness based on sequence properties
Genealogical tracking of lineages over time

Competition Scenarios

High Competition (α = 0.5, K = 100):

Strong selection pressure
Rapid extinction of unfit lineages
Winner-takes-all dynamics

Moderate Competition (α = 0.1, K = 500):

Gradual selection
Multiple lineages can coexist
Balanced population dynamics

No Competition (α = 0, K = ∞):

Pure drift and mutation
All lineages persist longer
Population grows exponentially until death rate limits

This implementation captures the essence of Lotka-Volterra competition while adding the complexity of genetic evolution, mutation, and stochastic processes that make it realistic for studying sequence evolution!

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
R		R
docs		docs
logo		logo
man		man
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
BarcoSim.Rproj		BarcoSim.Rproj
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Install the BarcoSim package

1. Use the `gpseq` command to generate the parent sequences.

2. Use the `calcSeqSim` function to plot sequence similarity across the parent sequences

3. Use the `r_gpseq` command to replicate parent sequences and make a barcode data set.

4. `r_gpseq_csub` replicates parent sequences with a specified error rate while allowing the specification of substitution rates for each base. It is an extension of the r_gpseq function.

Lotka-Volterra Competition Model in DNA Sequence Evolution

Classical Lotka-Volterra Competition Equations

For Species i:

Adaptation to DNA Sequence Evolution

Modified Growth Equation:

Implementation Components

1. Birth Rate (bᵢ)

2. Base Death Rate (dᵢ)

3. Competition-Induced Mortality (cᵢ)

Resource Competition Model:

Similarity-Based Competition Model:

4. Total Death Probability

Discrete Time Implementation

Parameter Summary

Key Differences from Classical Model

Competition Scenarios

High Competition (α = 0.5, K = 100):

Moderate Competition (α = 0.1, K = 500):

No Competition (α = 0, K = ∞):

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Install the BarcoSim package

1. Use the gpseq command to generate the parent sequences.

2. Use the calcSeqSim function to plot sequence similarity across the parent sequences

3. Use the r_gpseq command to replicate parent sequences and make a barcode data set.

4. r_gpseq_csub replicates parent sequences with a specified error rate while allowing the specification of substitution rates for each base. It is an extension of the r_gpseq function.

Lotka-Volterra Competition Model in DNA Sequence Evolution

Classical Lotka-Volterra Competition Equations

For Species i:

Adaptation to DNA Sequence Evolution

Modified Growth Equation:

Implementation Components

1. Birth Rate (bᵢ)

2. Base Death Rate (dᵢ)

3. Competition-Induced Mortality (cᵢ)

Resource Competition Model:

Similarity-Based Competition Model:

4. Total Death Probability

Discrete Time Implementation

Parameter Summary

Key Differences from Classical Model

Competition Scenarios

High Competition (α = 0.5, K = 100):

Moderate Competition (α = 0.1, K = 500):

No Competition (α = 0, K = ∞):

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

1. Use the `gpseq` command to generate the parent sequences.

2. Use the `calcSeqSim` function to plot sequence similarity across the parent sequences

3. Use the `r_gpseq` command to replicate parent sequences and make a barcode data set.

4. `r_gpseq_csub` replicates parent sequences with a specified error rate while allowing the specification of substitution rates for each base. It is an extension of the r_gpseq function.

Packages