timsconvert mzML files unreadable in DIA-NN #70

@sjust-seerbio

We have observed an error with DIA-NN 1.8.1 reading certain mzML files created with timsconvert:

E:\EXP24054_2024ms0528X1_A_GA1_1_27896.mzML(1) : parseOffset() 2: Syntax error parsing XML.

Running DIA-NN under strace suggests that the crash occurs while it attempts to read the <indexList> element, which is not located at the byte offset recorded in <indexListOffset>.

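The issue with the mzML can be verified using tail. The first command shows the recorded offset; the second shows that the bytes at that offset begin partway through the closing </mzML> tag rather than at <indexList>: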

$ tail -c 200 EXP24054_2024ms0528X1_A_GA1_1_27896.mzML
 idRef="scan=51335">846541370</offset>
    </index>
  </indexList>
  <indexListOffset>846547400</indexListOffset>
  <fileChecksum>2a757d5a71d7aca0788a302e8b881e42d4045446</fileChecksum>

$ tail -c +846547400 EXP21063_2022bruker038bX25_A_BA4_1_757.mzML  | head
ML>
  <indexList count="1">
    <index name="spectrum">
      <offset idRef="scan=1">3832</offset>
      <offset idRef="scan=2">143201</offset>
      <offset idRef="scan=3">149715</offset>
      <offset idRef="scan=4">156293</offset>
      <offset idRef="scan=5">161496</offset>
      <offset idRef="scan=6">167723</offset>
      <offset idRef="scan=7">174251</offset>
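The same check can be scripted for batches of files. The following is a minimal sketch, not part of timsconvert or DIA-NN: the function name `check_index_list_offset` and its `tail_bytes` and `window` parameters are hypothetical, and it assumes that `<indexListOffset>` sits within the last few kilobytes of the file and that the real `<indexList>` tag lies within `window` bytes of the recorded offset.

```python
import re

# "<indexList" must be followed by whitespace or ">" so that
# "<indexListOffset>" is not mistaken for the index itself.
INDEX_LIST_TAG = re.compile(rb"<indexList[\s>]")
OFFSET_TAG = re.compile(rb"<indexListOffset>(\d+)</indexListOffset>")


def check_index_list_offset(path, tail_bytes=4096, window=1024):
    """Return (recorded, actual) byte offsets of <indexList>.

    `actual` is None when no <indexList> tag is found near `recorded`.
    The file is healthy when both values are equal.
    """
    with open(path, "rb") as fh:
        # Read only the end of the file to find the recorded offset.
        fh.seek(0, 2)
        size = fh.tell()
        fh.seek(max(0, size - tail_bytes))
        match = OFFSET_TAG.search(fh.read())
        if match is None:
            raise ValueError("no <indexListOffset> element near end of file")
        recorded = int(match.group(1))

        # Look for the real tag in a small window around that offset.
        start = max(0, recorded - window)
        fh.seek(start)
        chunk = fh.read(2 * window)

    hit = INDEX_LIST_TAG.search(chunk)
    actual = start + hit.start() if hit else None
    return recorded, actual
```

Calling `check_index_list_offset("sample.mzML")` (a placeholder path) on an affected file should return two different numbers, and their difference is the number of bytes by which the index was shifted.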

The root cause of this error appears to be the update_spectra_count() function, which is not guaranteed to preserve byte offsets in the file.
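To illustrate the failure mode, here is a toy sketch under stated assumptions. It does not reproduce timsconvert's actual `update_spectra_count()`; the helper `patch_count` is a hypothetical stand-in that assumes the count is rewritten as text after the index has already been written. Replacing `count="0"` with a longer number pushes every later byte forward, so the recorded offset goes stale.

```python
import re


def patch_count(doc: bytes, new_count: int) -> bytes:
    """Hypothetical stand-in: rewrite the spectrumList count as text."""
    return re.sub(
        rb'(<spectrumList count=")\d+(")',
        lambda m: m.group(1) + str(new_count).encode() + m.group(2),
        doc,
        count=1,
    )


def recorded_offset(doc: bytes) -> int:
    """Read the value stored in <indexListOffset>."""
    m = re.search(rb"<indexListOffset>(\d+)</indexListOffset>", doc)
    return int(m.group(1))


def repair_index_list_offset(doc: bytes) -> bytes:
    """Rewrite <indexListOffset> to point at the real <indexList> tag."""
    actual = re.search(rb"<indexList[\s>]", doc).start()
    return re.sub(
        rb"(<indexListOffset>)\d+(</indexListOffset>)",
        lambda m: m.group(1) + str(actual).encode() + m.group(2),
        doc,
    )


# Build a tiny indexed document whose offset is correct at write time.
doc = (
    b'<indexedmzML>\n<mzML>\n<spectrumList count="0">\n'
    b"</spectrumList>\n</mzML>\n"
    b'<indexList count="1">\n</indexList>\n'
)
doc += b"<indexListOffset>%d</indexListOffset>\n</indexedmzML>\n" % doc.index(
    b"<indexList "
)

patched = patch_count(doc, 51335)
stale = recorded_offset(patched)
actual = patched.index(b"<indexList ")
print(actual - stale)  # 4: "51335" is four bytes longer than "0"

repaired = repair_index_list_offset(patched)
```

In this toy document only `<indexListOffset>` is repaired. In a real file, any per-spectrum `<offset>` values that follow the edited attribute would be shifted by the same amount, and a stored `<fileChecksum>` would no longer match, so a full fix would need to handle those as well.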

We will provide a PR patching this issue.
