Package a public dataset as an annotator and explore adding categorical value checkboxes #372

@jasminebro

👋 Welcome to OpenCRAVAT!
This issue is perfect for contributors who want to learn how to build annotators, work with public genomic datasets, and help improve the OpenCRAVAT module ecosystem.


📝 Summary

Package a small, static public dataset (such as a subset of dbSNP or an educational gene dataset) into a minimal OpenCRAVAT annotator.
Optionally, help identify columns that could benefit from categorical filters (checkboxes) in the UI.

This will improve usability and provide an example for others learning to create new annotators.


💡 Description

OpenCRAVAT’s power lies in its extensible annotator framework, where each annotator wraps a dataset or algorithm to provide variant-level information.
We want to provide new contributors with a reproducible, lightweight example that demonstrates:

  1. How to package and register a small dataset as an annotator.
  2. How to add categorical filters (checkboxes) to an existing annotator’s configuration.
  3. How to ensure build scripts and data update logic are clear and accessible for maintainers.

The Karchin Lab team will identify one or two annotators that could benefit from categorical filters and make their build scripts available for contributors to explore.


🧭 Steps to Complete

Part 1 – Create a Minimal Annotator

  1. Select or receive a small, public dataset (e.g., dbSNP subset, 1000 Genomes region, or an educational dataset).
  2. Use the OpenCRAVAT CLI to create a new annotator:
    oc new annotator example_dataset
  3. Place the dataset (as .csv, .tsv, or .sqlite) into the annotator’s data/ directory.

  4. Edit info.yml to include:
    title: Example Dataset Annotator
    version: 1.0.0
    description: "Annotator wrapping a small public dataset for demonstration."
  5. Implement a simple lookup in init.py that retrieves data based on chromosome and position.
  6. Test locally using:
    oc run example_input.tsv -a example_dataset
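
The dataset-loading and lookup steps in Part 1 could be sketched roughly as below. This is a standalone illustration using only Python's standard library, not the OpenCRAVAT annotator API. The table name `example_dataset`, its columns (`chrom`, `pos`, `rsid`, `gene`), the sample rows, and the helper names `load_tsv` and `lookup` are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; a real dataset would come from the file in data/.
SAMPLE_TSV = (
    "chrom\tpos\trsid\tgene\n"
    "chr1\t12345\trs0000001\tGENE_A\n"
    "chr2\t67890\trs0000002\tGENE_B\n"
)


def load_tsv(conn, tsv_text):
    """Create the (assumed) example_dataset table and fill it from TSV text."""
    conn.execute(
        "CREATE TABLE example_dataset ("
        "chrom TEXT, pos INTEGER, rsid TEXT, gene TEXT)"
    )
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(r["chrom"], int(r["pos"]), r["rsid"], r["gene"]) for r in reader]
    conn.executemany("INSERT INTO example_dataset VALUES (?, ?, ?, ?)", rows)
    # Index the lookup key so queries stay fast as the table grows.
    conn.execute("CREATE INDEX idx_chrom_pos ON example_dataset (chrom, pos)")
    conn.commit()


def lookup(conn, chrom, pos):
    """Return a dict of annotation values for one position, or None if absent."""
    row = conn.execute(
        "SELECT rsid, gene FROM example_dataset WHERE chrom = ? AND pos = ?",
        (chrom, pos),
    ).fetchone()
    if row is None:
        return None
    return {"rsid": row[0], "gene": row[1]}


conn = sqlite3.connect(":memory:")
load_tsv(conn, SAMPLE_TSV)
print(lookup(conn, "chr1", 12345))  # a hit returns the annotation dict
print(lookup(conn, "chr1", 1))      # a miss returns None
```

In an actual annotator, the query would presumably live in the module file generated by `oc new annotator` and read from the SQLite file in data/ rather than an in-memory database.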

Part 2 – Add Categorical Value Checkboxes

  1. Review one of the Karchin Lab–suggested annotators that may benefit from checkbox filters (e.g., dbnsfp, gnomad, or a population frequency dataset).
  2. Inspect its schema and identify columns that contain categorical values (e.g., population name, variant consequence, functional class).
  3. Suggest 1–2 columns where checkbox filters could improve UI usability.
  4. If confident, edit the info.yml to define checkbox options in the output_columns metadata:
    output_columns:
      - name: consequence
        type: string
        categories: [missense, nonsense, synonymous]
  5. Rebuild and test the annotator locally to confirm the new filters appear in the web viewer.
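
One possible way to shortlist candidate columns in Part 2 is to count distinct values per text column in the annotator's SQLite data. The sketch below assumes the data lives in a single SQLite table. The function name `find_categorical_columns`, the `max_distinct` threshold, and the `demo` table are hypothetical and not part of OpenCRAVAT.

```python
import sqlite3


def find_categorical_columns(conn, table, max_distinct=10):
    """Map each low-cardinality text column to its sorted distinct values."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated into SQL, so pass only trusted names.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    candidates = {}
    for _cid, name, col_type, *_rest in columns:
        declared = col_type.upper()
        if "TEXT" not in declared and "CHAR" not in declared:
            continue  # skip numeric columns such as positions or scores
        values = [
            row[0]
            for row in conn.execute(
                f'SELECT DISTINCT "{name}" FROM {table} '
                f'WHERE "{name}" IS NOT NULL'
            )
        ]
        if 0 < len(values) <= max_distinct:
            candidates[name] = sorted(values)
    return candidates


# Hypothetical demo table standing in for an annotator's data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (chrom TEXT, pos INTEGER, consequence TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        ("chr1", 100, "missense"),
        ("chr1", 200, "synonymous"),
        ("chr2", 300, "missense"),
        ("chr2", 400, "nonsense"),
    ],
)
result = find_categorical_columns(conn, "demo", max_distinct=3)
print(result)
```

Low cardinality is only a hint: in this demo `chrom` is flagged alongside `consequence`, so a person still has to decide which columns make sensible checkbox filters.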

Part 3 – Ensure Build Scripts are Available

Confirm that the build scripts for existing or related annotators are:

  • Present in the repository (scripts/ or build/ directory).
  • Properly documented in the README or comments.

If any are missing, coordinate with the OC team to make them available for future contributors.
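
The presence check in Part 3 could be automated with a small helper along these lines. It is a sketch only: the function name `find_build_dirs` and the temporary demo layout are hypothetical, and the directory names simply mirror the scripts/ and build/ locations mentioned above.

```python
import tempfile
from pathlib import Path


def find_build_dirs(module_dir, names=("scripts", "build")):
    """Return the candidate build-script directories that exist and are non-empty."""
    root = Path(module_dir)
    found = []
    for name in names:
        candidate = root / name
        if candidate.is_dir() and any(candidate.iterdir()):
            found.append(name)
    return found


# Demo on a throwaway layout: scripts/ holds a file, build/ exists but is empty.
with tempfile.TemporaryDirectory() as tmp:
    module_dir = Path(tmp) / "example_dataset"
    (module_dir / "scripts").mkdir(parents=True)
    (module_dir / "scripts" / "build_data.py").write_text("# placeholder\n")
    (module_dir / "build").mkdir()
    demo_result = find_build_dirs(module_dir)
    print(demo_result)
```

A check like this only confirms that files exist. Whether the scripts are adequately documented still needs a manual read of the README and comments.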

✅ Acceptance Criteria

A minimal example annotator is created, installed, and runs successfully.

The annotator’s metadata and schema follow OpenCRAVAT conventions.

Optional: Categorical filter suggestions are documented or implemented.

The build script(s) for one or more existing annotators are confirmed to be available.

Pull Request includes documentation updates (e.g., in module README).

All CI checks and tests pass.

⚙️ Difficulty Level

🟡 Low–Medium — suitable for contributors with Python basics and familiarity with tables or datasets.


📚 Helpful Resources

Example Annotators Repository


🧑‍🤝‍🧑 Maintainer Checklist

Verify that Karchin Lab–selected annotators for checkbox addition are identified.

Ensure associated build scripts are available and documented.

Add labels: good first issue, help wanted, modules, data, python.

Link this issue to the “Annotator Development and Data Updates” milestone.

🎉 Thank you for helping expand the OpenCRAVAT module library and improving the user experience!
Your contribution strengthens our open-source variant annotation community and helps new developers learn by example.
