This repository was archived by the owner on Oct 15, 2020. It is now read-only.

Compare cbgen performance to bgen-reader-py #24

@eric-czech

This is primarily useful for understanding whether using dask for parallelism on top of a thread-safe reader library has any obvious disadvantages compared with putting the parallelism in the reader library itself. Comparing across different libraries isn't ideal, but Carl had these numbers handy for bgen-reader-py, so we should make sure cbgen is comparable once #20 is done:

I added multithreading to the NumPy-inspired reader. Using this API, on my 6-processor machine reading from an SSD, I was able to read 109 variants/second (53 million distributions/second). This was on the file ‘merged_487400x220000.bgen’, which is meant to resemble the UK Biobank data.
(Single-threaded performance is 31 variants/second and 15 million distributions/second. We also verified that the cbgen interface is thread-safe.)
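
For reference, here is a rough sketch of how the same measurement could be repeated for cbgen, along with a consistency check of the figures quoted above. It rests on two assumptions: the sample count of 487,400 is inferred from the file name rather than stated anywhere, and `read_variant` is a hypothetical callable standing in for whichever reader is under test (it is not an API of cbgen or bgen-reader-py). The thread pool is plain `concurrent.futures`, not dask, so this only approximates the setup discussed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: the file name merged_487400x220000.bgen encodes 487,400 samples.
N_SAMPLES = 487_400


def distributions_per_second(variants_per_second, n_samples=N_SAMPLES):
    """Convert variant throughput to per-sample distributions per second."""
    return variants_per_second * n_samples


def measure_variants_per_second(read_variant, variant_indices, n_threads=1):
    """Time read_variant over variant_indices with a thread pool.

    read_variant is a hypothetical callable (variant index -> data) that
    wraps whichever thread-safe reader is being benchmarked.
    """
    indices = list(variant_indices)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces every read to finish before the clock stops.
        results = list(pool.map(read_variant, indices))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


# Consistency check of the quoted numbers under the sample-count assumption.
multi_threaded = distributions_per_second(109)   # 53,126,600, about 53 million
single_threaded = distributions_per_second(31)   # 15,109,400, about 15 million
speedup = 109 / 31                               # roughly 3.5x on 6 processors
```

Under that assumption the quoted variants/second and distributions/second figures agree with each other, and the multithreaded speedup works out to roughly 3.5x.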
