Ideas for adding read alignment to kbo.
Design
- Implement a separate `align` command, keep `call` and `find` assembly-only.
- Output from `align` should probably be a .sam file.
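
To make the .sam output idea concrete, here is a minimal sketch of writing one unpaired alignment record with the 11 mandatory tab-separated SAM columns. The `SamRecord` type, its methods, and `sam_header` are hypothetical names for illustration only; nothing here is existing kbo code.

```rust
/// Hypothetical record type (not part of kbo); fields follow the
/// mandatory columns of the SAM format for an unpaired read.
struct SamRecord {
    qname: String, // read name
    flag: u16,     // bitwise flag, 0 = mapped to the forward strand
    rname: String, // reference sequence name
    pos: u32,      // 1-based leftmost mapping position
    mapq: u8,      // mapping quality
    cigar: String, // CIGAR string, e.g. "4M"
    seq: String,   // read sequence
    qual: String,  // base qualities, "*" if not stored
}

impl SamRecord {
    /// Format the record as one SAM line. RNEXT, PNEXT and TLEN are
    /// written as "*", 0 and 0 because the read is unpaired.
    fn to_line(&self) -> String {
        format!(
            "{}\t{}\t{}\t{}\t{}\t{}\t*\t0\t0\t{}\t{}",
            self.qname, self.flag, self.rname, self.pos,
            self.mapq, self.cigar, self.seq, self.qual
        )
    }
}

/// Minimal header for a single reference sequence.
fn sam_header(ref_name: &str, ref_len: usize) -> String {
    format!("@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:{}\tLN:{}", ref_name, ref_len)
}
```

A single-reference header like this fits the case where `align` maps against one reference only; multiple references would need one `@SQ` line each.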
Questions
- Should `align` take prebuilt indexes?
- Long reads only or also short? The former is much like mapping assemblies, so the theory should translate well.
- Should `map` allow reads as input?
- Design kbo read alignment mainly for cases where we're mapping against a single reference only? Multi-reference cases could be handed to themisto.
- Pangenome indexes and kbo? This would work well considering we can also identify the genome jumps via the 'RR' output from kbo.
Problems/considerations
- The index loses information about which sequence a k-mer belongs to ("colors")
  - As a result, multi-reference alignment requires indexing everything separately.
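
The loss of color information can be illustrated with a toy example. The sketch below uses plain hash-based containers as a stand-in for the SBWT (an assumption made purely for illustration; the function names are hypothetical and `k` must be greater than zero). The plain index answers only "is this k-mer present?", while the colored variant also records which references contain it.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Toy uncolored index: one k-mer set built over all references.
/// Membership is all it can report; the origin of a k-mer is lost.
fn build_plain_index(refs: &[&str], k: usize) -> HashSet<String> {
    let mut index = HashSet::new();
    for r in refs {
        for w in r.as_bytes().windows(k) {
            index.insert(String::from_utf8_lossy(w).into_owned());
        }
    }
    index
}

/// Toy colored index: each k-mer maps to the set of reference ids
/// ("colors") it occurs in.
fn build_colored_index(refs: &[&str], k: usize) -> HashMap<String, BTreeSet<usize>> {
    let mut index: HashMap<String, BTreeSet<usize>> = HashMap::new();
    for (color, r) in refs.iter().enumerate() {
        for w in r.as_bytes().windows(k) {
            index
                .entry(String::from_utf8_lossy(w).into_owned())
                .or_default()
                .insert(color);
        }
    }
    index
}
```

Without colors, the only way to attribute a hit to a reference is to build one plain index per reference and query each in turn.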
SBWT and colors
- We don't have an efficient algorithm for extracting matching statistics on colored SBWTs.
- The SBWT Rust crate does not support colors, although these could be ported over from the C++ code.
Algorithms
- `map` relies on indexing the query and streaming the reference; this may not scale to (short) reads.
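
The "index the query, stream the reference" pattern can be sketched as follows. This is a simplified stand-in that uses a hash set of k-mers instead of an SBWT with matching statistics, and the function names are hypothetical. It only shows the shape of the computation: one index build per query, then one full pass over the reference.

```rust
use std::collections::HashSet;

/// Build a toy k-mer index from the query (k must be greater than zero).
fn index_query(query: &[u8], k: usize) -> HashSet<&[u8]> {
    query.windows(k).collect()
}

/// Stream the reference k-mer by k-mer and report, for each start
/// position, whether that k-mer is present in the query index.
fn stream_reference(reference: &[u8], index: &HashSet<&[u8]>, k: usize) -> Vec<bool> {
    reference.windows(k).map(|w| index.contains(w)).collect()
}
```

With reads as queries, this pattern would mean one index build and one pass over the whole reference per read, which is the scaling concern noted above.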