👋 Welcome to OpenCRAVAT!
This issue is perfect for contributors who want to learn how to build annotators, work with public genomic datasets, and help improve the OpenCRAVAT module ecosystem.
📝 Summary
Package a small, static public dataset (such as a subset of dbSNP or an educational gene dataset) into a minimal OpenCRAVAT annotator.
Optionally, help identify columns that could benefit from categorical filters (checkboxes) in the UI.
This will improve usability and provide an example for others learning to create new annotators.
💡 Description
OpenCRAVAT’s power lies in its extensible annotator framework, where each annotator wraps a dataset or algorithm to provide variant-level information.
We want to provide new contributors with a reproducible, lightweight example that demonstrates:
- How to package and register a small dataset as an annotator.
- How to add categorical filters (checkboxes) to an existing annotator’s configuration.
- How to ensure build scripts and data update logic are clear and accessible for maintainers.
The Karchin Lab team will identify one or two annotators that could benefit from categorical filters and make their build scripts available for contributors to explore.
🧭 Steps to Complete
Part 1 – Create a Minimal Annotator
- Select or receive a small, public dataset (e.g., dbSNP subset, 1000 Genomes region, or an educational dataset).
- Use the OpenCRAVAT CLI to create a new annotator:
    oc new annotator example_dataset
- Place the dataset (as .csv, .tsv, or .sqlite) into the annotator’s data/ directory.
- Edit info.yml to include:
    title: Example Dataset Annotator
    version: 1.0.0
    description: "Annotator wrapping a small public dataset for demonstration."
- Implement a simple lookup in __init__.py that retrieves data based on chromosome and position.
- Test locally using:
    oc run example_input.tsv -a example_dataset
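The data-packaging and lookup steps above could be prototyped outside OpenCRAVAT first. The following is a minimal sketch, assuming a tab-separated dataset keyed on chromosome and position; the table name `variants`, the columns `rsid` and `gene`, and the sample rows are all hypothetical. How the lookup is wired into the annotator class depends on the template that the CLI generates, which is not shown here.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; a real module would read a file from its data/ directory.
SAMPLE_TSV = (
    "chrom\tpos\trsid\tgene\n"
    "chr1\t12345\trs0001\tGENE_A\n"
    "chr2\t67890\trs0002\tGENE_B\n"
)


def build_db(tsv_text, conn):
    """Load tab-separated rows into an indexed SQLite table."""
    conn.execute(
        "CREATE TABLE variants (chrom TEXT, pos INTEGER, rsid TEXT, gene TEXT)"
    )
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(r["chrom"], int(r["pos"]), r["rsid"], r["gene"]) for r in reader]
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", rows)
    # The index keeps chromosome/position lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_chrom_pos ON variants (chrom, pos)")
    conn.commit()


def lookup(conn, chrom, pos):
    """Return a dict of annotation values for the position, or None if absent."""
    row = conn.execute(
        "SELECT rsid, gene FROM variants WHERE chrom = ? AND pos = ?",
        (chrom, pos),
    ).fetchone()
    if row is None:
        return None
    return {"rsid": row[0], "gene": row[1]}


conn = sqlite3.connect(":memory:")
build_db(SAMPLE_TSV, conn)
print(lookup(conn, "chr1", 12345))  # a known position
print(lookup(conn, "chr3", 1))      # a position not in the dataset
```

Returning None for positions that are not in the dataset keeps the "no annotation" case explicit, which makes the function easy to test before it is moved into the module.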
Part 2 – Add Categorical Value Checkboxes
- Review one of the Karchin Lab–suggested annotators that may benefit from checkbox filters (e.g., dbnsfp, gnomad, or a population frequency dataset).
- Inspect its schema and identify columns that contain categorical values (e.g., population name, variant consequence, functional class).
- Suggest 1–2 columns where checkbox filters could improve UI usability.
- If confident, edit the info.yml to define checkbox options in the output_columns metadata:
    output_columns:
    - name: consequence
      type: string
      categories: [missense, nonsense, synonymous]
- Rebuild and test the annotator locally to confirm the new filters appear in the web viewer.
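One way to shortlist checkbox candidates is to count distinct values per column in an export of the annotator's data. The sketch below assumes a tab-separated export. The column names, sample rows, `skip` list, and `max_distinct` threshold are all illustrative choices, not OpenCRAVAT conventions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of an annotator's data table.
SAMPLE_TSV = (
    "chrom\tpos\tconsequence\tscore\n"
    "chr1\t100\tmissense\t0.91\n"
    "chr1\t200\tsynonymous\t0.10\n"
    "chr2\t300\tmissense\t0.75\n"
    "chr2\t400\tnonsense\t0.99\n"
    "chr3\t500\tsynonymous\t0.05\n"
    "chr3\t600\tmissense\t0.42\n"
)


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def categorical_candidates(tsv_text, max_distinct=3, skip=("chrom", "pos")):
    """Map each non-numeric, low-cardinality column to its sorted distinct values."""
    distinct = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for column, value in row.items():
            if column not in skip:
                distinct[column].add(value)
    return {
        column: sorted(values)
        for column, values in distinct.items()
        if len(values) <= max_distinct and not all(is_number(v) for v in values)
    }


candidates = categorical_candidates(SAMPLE_TSV)
print(candidates)
```

On real data the threshold would need to be higher, and the resulting list should still be reviewed by hand, since a column with few distinct values is not always meaningful as a filter.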
Part 3 – Ensure Build Scripts are Available
- Confirm that the build scripts for existing or related annotators are:
  - Present in the repository (scripts/ or build/ directory).
  - Properly documented in the README or comments.
- If missing, coordinate with the OC team to make them available for future contributors.
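The presence check could be scripted so it is repeatable across many modules. This is a rough sketch that assumes one directory per annotator under a common root, with build scripts in a scripts/ or build/ subdirectory as described above. The actual repository layout may differ, and the module names in the demo are made up.

```python
import tempfile
from pathlib import Path

# Directory names taken from the checklist above.
BUILD_DIR_NAMES = ("scripts", "build")


def modules_missing_build_scripts(annotators_root):
    """Return names of module directories lacking a non-empty scripts/ or build/ directory."""
    missing = []
    for module_dir in sorted(Path(annotators_root).iterdir()):
        if not module_dir.is_dir():
            continue
        has_scripts = any(
            (module_dir / name).is_dir() and any((module_dir / name).iterdir())
            for name in BUILD_DIR_NAMES
        )
        if not has_scripts:
            missing.append(module_dir.name)
    return missing


# Demo on a throwaway directory tree with hypothetical module names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "annotator_a" / "scripts").mkdir(parents=True)
    (root / "annotator_a" / "scripts" / "build_db.py").write_text("# build steps\n")
    (root / "annotator_b" / "data").mkdir(parents=True)
    result = modules_missing_build_scripts(root)
    print(result)
```

This only confirms that the files exist. Whether the scripts are documented in the README or in comments still needs a manual read.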
✅ Acceptance Criteria
- A minimal example annotator is created, installed, and runs successfully.
- The annotator’s metadata and schema follow OpenCRAVAT conventions.
- Optional: Categorical filter suggestions are documented or implemented.
- The build script(s) for one or more existing annotators are confirmed to be available.
- Pull Request includes documentation updates (e.g., in module README).
- All CI checks and tests pass.
⚙️ Difficulty Level
🟡 Low–Medium — suitable for contributors with Python basics and familiarity with tables or datasets.
📚 Helpful Resources
🧑🤝🧑 Maintainer Checklist
- Verify that Karchin Lab–selected annotators for checkbox addition are identified.
- Ensure associated build scripts are available and documented.
- Add labels: good first issue, help wanted, modules, data, python.
- Link this issue to the “Annotator Development and Data Updates” milestone.
🎉 Thank you for helping expand the OpenCRAVAT module library and improving the user experience!
Your contribution strengthens our open-source variant annotation community and helps new developers learn by example.