Invalid character results in wrong error message ("All sequences must have the same length") #19

I have one sequence (`hCoV_19_Norway_1539_2020_EPI_ISL_417487`) that `tn93` reports as one character shorter than it actually is (or at least seems to be). I have attached a minimal working example below:

[example.txt](https://github.com/veg/tn93/files/4399270/example.txt)

I tried to run `tn93` as follows:

```bash
cat example.aln | tn93 -l 1 -t 1
```

But I get the following error message:

```
All sequences must have the same length (29811), but sequence 'hCoV_19_Norway_1539_2020_EPI_ISL_417487' had length 29810
```

However, when I checked the file in Python, the lengths all matched (`lines[3]` is the problematic sequence):

```python
# Read the whole file; the sequences are on lines 1, 3, and 5
with open('example.txt') as f:
    lines = f.readlines()

len(lines[1])  # prints 29812 (includes the newline at the end)
lines[1][:10]  # 'CTTCCCAGGT'
lines[1][-10:] # 'AATTTTAGT\n'
set(lines[1])  # {'\n', 'R', 'G', 'A', 'C', 'T', 'M'}

len(lines[3])  # prints 29812 (includes the newline at the end)
lines[3][:10]  # 'CTTCCCAGGT'
lines[3][-10:] # 'AATTTTAGT\n'
set(lines[3])  # {'V', 'S', '\n', 'R', 'G', 'I', 'A', 'C', 'Y', 'T'}

len(lines[5])  # prints 29812 (includes the newline at the end)
lines[5][:10]  # '----------'
lines[5][-10:] # 'AATTTTAGT\n'
set(lines[5])  # {'\n', 'G', 'A', '-', 'C', 'T'}
```

Excluding the newline character after every line (which is included in the lengths printed by the above code), each sequence has exactly 29811 characters.

The only unusual character I see in the problematic sequence is `I`, which is not a standard [IUPAC nucleotide code](https://www.bioinformatics.org/sms/iupac.html). Thoughts?
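
To make the suspicion about `I` easier to check, here is a minimal sketch that lists every character in a sequence falling outside the IUPAC nucleotide alphabet, together with its position. The helper name `find_non_iupac` is hypothetical, and treating `-` as the only gap character is an assumption; neither reflects how `tn93` itself parses its input.

```python
# IUPAC nucleotide codes (including U for RNA)
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
# Assumption: '-' is the only gap character used in the alignment
GAP_CHARS = set("-")


def find_non_iupac(sequence):
    """Return (index, character) pairs for characters that are neither
    IUPAC nucleotide codes nor gaps. Trailing newlines are ignored and
    the comparison is case-insensitive."""
    allowed = IUPAC_NUCLEOTIDES | GAP_CHARS
    stripped = sequence.rstrip("\r\n")
    return [(i, ch) for i, ch in enumerate(stripped) if ch.upper() not in allowed]


if __name__ == "__main__":
    # Toy input, not taken from the attached file
    print(find_non_iupac("ACGIT-N\n"))
```

Running `find_non_iupac(lines[3])` on the attached file would show how many such characters the problematic sequence contains and where they are. If the reported length is short by exactly that count, it would suggest the unrecognized characters are being dropped rather than rejected, though I have not verified this against the `tn93` source.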
