Package a public dataset as an annotator and explore adding categorical value checkboxes #372

👋 **Welcome to OpenCRAVAT!**  
This issue is perfect for contributors who want to learn how to build annotators, work with public genomic datasets, and help improve the OpenCRAVAT module ecosystem.

---

## 📝 Summary
Package a small, static **public dataset** (such as a subset of dbSNP or an educational gene dataset) into a minimal OpenCRAVAT annotator.  
Optionally, help identify columns that could benefit from **categorical filters** (checkboxes) in the UI.

Checkbox filters make results easier to explore in the web viewer, and the packaged annotator gives others a worked example to follow when creating new annotators.

---

## 💡 Description
OpenCRAVAT’s power lies in its extensible **annotator framework**, where each annotator wraps a dataset or algorithm to provide variant-level information.  
We want to provide new contributors with a reproducible, lightweight example that demonstrates:

1. How to **package and register a small dataset** as an annotator.  
2. How to add **categorical filters** (checkboxes) to an existing annotator’s configuration.  
3. How to ensure **build scripts and data update logic** are clear and accessible for maintainers.

The **Karchin Lab team** will identify one or two annotators that could benefit from categorical filters and make their build scripts available for contributors to explore.

---

## 🧭 Steps to Complete

### Part 1 – Create a Minimal Annotator
1. Select or receive a small, public dataset (e.g., a dbSNP subset, a 1000 Genomes region, or an educational dataset).  
2. Use the OpenCRAVAT CLI to create a new annotator:
   ```bash
   oc new annotator example_dataset
   ```
3. Place the dataset (as `.csv`, `.tsv`, or `.sqlite`) in the annotator's `data/` directory.
4. Edit the module's YAML configuration (the generated template names it after the module, e.g. `example_dataset.yml`) to include:
   ```yaml
   title: Example Dataset Annotator
   version: 1.0.0
   description: "Annotator wrapping a small public dataset for demonstration."
   ```
5. Implement a simple lookup in the module's Python file (e.g. `example_dataset.py`) that retrieves data by chromosome and position.
6. Test locally (`-l` sets the genome assembly of the input; match it to your data):
   ```bash
   oc run example_input.tsv -a example_dataset -l hg38
   ```
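
The chromosome-and-position lookup can be sketched as a plain SQLite query. This is a minimal, standalone sketch: in an actual module the logic would live in the `annotate` method of the annotator class generated by `oc new annotator` (which subclasses OpenCRAVAT's `BaseAnnotator`); that class is left out here so the example runs without OpenCRAVAT installed. The table name `example`, its columns, the `rsid`/`consequence` output keys, and the toy row are hypothetical. The `input_data` keys (`chrom`, `pos`, `ref_base`, `alt_base`) are assumed to match what the template receives and should be checked against the generated file. The query also matches the reference and alternate alleles so that different alternates at the same position are not confused; drop those conditions to key on chromosome and position alone.

```python
import sqlite3
from typing import Optional

# Hypothetical layout of data/example_dataset.sqlite (an assumption, not an OC requirement).
SCHEMA = """
CREATE TABLE IF NOT EXISTS example (
    chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
    rsid TEXT, consequence TEXT
);
CREATE INDEX IF NOT EXISTS idx_example_pos ON example (chrom, pos);
"""


class ExampleLookup:
    """Stand-in for the annotate() logic of a hypothetical OpenCRAVAT annotator."""

    def __init__(self, conn: sqlite3.Connection):
        self.cursor = conn.cursor()

    def annotate(self, input_data: dict) -> Optional[dict]:
        # Assumed input keys: chrom, pos, ref_base, alt_base.
        self.cursor.execute(
            "SELECT rsid, consequence FROM example "
            "WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
            (input_data["chrom"], input_data["pos"],
             input_data["ref_base"], input_data["alt_base"]),
        )
        row = self.cursor.fetchone()
        if row is None:
            return None  # no match: the output columns stay empty
        rsid, consequence = row
        return {"rsid": rsid, "consequence": consequence}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy row for illustration only.
    conn.execute("INSERT INTO example VALUES (?, ?, ?, ?, ?, ?)",
                 ("chr1", 12345, "A", "G", "rsEXAMPLE1", "missense"))
    lookup = ExampleLookup(conn)
    print(lookup.annotate({"chrom": "chr1", "pos": 12345,
                           "ref_base": "A", "alt_base": "G"}))
```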
---
### Part 2 – Add Categorical Value Checkboxes
1. Review one of the Karchin Lab–suggested annotators that may benefit from checkbox filters (e.g., `dbnsfp`, `gnomad`, or a population frequency dataset).
2. Inspect its schema and identify columns that contain categorical values (e.g., population name, variant consequence, functional class).
3. Suggest 1–2 columns where checkbox filters would make the UI easier to use.
4. If confident, edit the annotator's YAML configuration to define checkbox options in the `output_columns` metadata:
   ```yaml
   output_columns:
     - name: consequence
       type: string
       # Assumed: `category: single` marks the column for checkbox filtering;
       # compare with an existing annotator's YAML before relying on it.
       category: single
       categories: [missense, nonsense, synonymous]
   ```
5. Rebuild and test the annotator locally to confirm the new filters appear in the web viewer.
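
Candidate checkbox columns can be screened by counting distinct values per column in any tab-separated export of the annotator's data. This is a heuristic sketch, not an OpenCRAVAT tool: the `max_distinct` threshold, the rule that all-numeric columns are skipped (they suit range filters better), and the toy rows are assumptions. Its output still needs human judgment; on the toy rows below, `chrom` is flagged alongside `consequence` and `population`.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def _looks_numeric(values: Iterable[str]) -> bool:
    """True if every value parses as a number."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return False
    return True


def categorical_candidates(rows: Iterable[Dict[Optional[str], str]],
                           max_distinct: int = 10) -> Dict[str, List[str]]:
    """Map each non-numeric column with at most `max_distinct` distinct
    non-empty values to its sorted list of values."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    too_many: Set[str] = set()
    for row in rows:
        for column, value in row.items():
            # Skip overflow fields (key None), empty cells, and columns already ruled out.
            if column is None or column in too_many or not value:
                continue
            seen[column].add(value)
            if len(seen[column]) > max_distinct:
                too_many.add(column)
    return {
        column: sorted(values)
        for column, values in seen.items()
        if column not in too_many and not _looks_numeric(values)
    }


if __name__ == "__main__":
    # Toy rows for illustration only; point this at a TSV export of real data.
    sample = io.StringIO(
        "chrom\tpos\tconsequence\tpopulation\tscore\n"
        "chr1\t100\tmissense\tAFR\t0.12\n"
        "chr1\t200\tsynonymous\tEUR\t0.50\n"
        "chr2\t300\tmissense\tEAS\t0.91\n"
    )
    reader = csv.DictReader(sample, delimiter="\t")
    for column, values in categorical_candidates(reader, max_distinct=3).items():
        print(column, values)
```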
---
### Part 3 – Ensure Build Scripts are Available
1. Confirm that the build scripts for existing or related annotators are:
   - Present in the repository (in a `scripts/` or `build/` directory).
   - Properly documented in the README or in code comments.
2. If they are missing, coordinate with the OC team to make them available for future contributors.
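
A build script is easiest to maintain when it is a single, documented, re-runnable command. A minimal sketch, assuming the raw subset is a TSV with the hypothetical columns listed in `COLUMNS` and that the annotator reads a SQLite file such as `data/example_dataset.sqlite`; the script name, table name, and columns are illustrative, not an OpenCRAVAT convention. Rebuilding from scratch on every run keeps the output reproducible when the source data is updated.

```python
"""Hypothetical build script, e.g. scripts/build_example_dataset.py.

Converts a tab-separated source file into the SQLite file the annotator reads.
Usage: python build_example_dataset.py SOURCE.tsv data/example_dataset.sqlite
"""
import csv
import sqlite3
import sys
from pathlib import Path

# Assumed source columns; adjust to the dataset actually being packaged.
COLUMNS = ["chrom", "pos", "ref", "alt", "rsid", "consequence"]


def build_sqlite(tsv_path: Path, db_path: Path) -> int:
    """Rebuild db_path from tsv_path and return the number of rows loaded."""
    if db_path.exists():
        db_path.unlink()  # always start fresh so reruns are reproducible
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE example (chrom TEXT, pos INTEGER, ref TEXT, "
            "alt TEXT, rsid TEXT, consequence TEXT)"
        )
        with open(tsv_path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            missing = set(COLUMNS) - set(reader.fieldnames or [])
            if missing:
                raise ValueError(f"source is missing columns: {sorted(missing)}")
            rows = [tuple(int(r[c]) if c == "pos" else r[c] for c in COLUMNS)
                    for r in reader]
        conn.executemany("INSERT INTO example VALUES (?, ?, ?, ?, ?, ?)", rows)
        # Index the (chrom, pos) key the annotator looks up.
        conn.execute("CREATE INDEX idx_example_pos ON example (chrom, pos)")
        conn.commit()
    finally:
        conn.close()
    return len(rows)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: build_example_dataset.py SOURCE.tsv OUTPUT.sqlite")
    count = build_sqlite(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"loaded {count} rows into {sys.argv[2]}")
```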
---
## ✅ Acceptance Criteria

- [ ] A minimal example annotator is created, installed, and runs successfully.
- [ ] The annotator's metadata and schema follow OpenCRAVAT conventions.
- [ ] Optional: categorical filter suggestions are documented or implemented.
- [ ] The build script(s) for one or more existing annotators are confirmed to be available.
- [ ] The pull request includes documentation updates (e.g., in the module README).
- [ ] All CI checks and tests pass.

---
## ⚙️ Difficulty Level

🟡 Low–Medium: suitable for contributors with basic Python and some familiarity with tabular datasets.

---
## 📚 Helpful Resources

- [Example Annotators Repository](https://github.com/KarchinLab/open-cravat-modules)

---
## 🧑‍🤝‍🧑 Maintainer Checklist

- [ ] Verify that the Karchin Lab–selected annotators for checkbox filters are identified.
- [ ] Ensure the associated build scripts are available and documented.
- [ ] Add labels: `good first issue`, `help wanted`, `modules`, `data`, `python`.
- [ ] Link this issue to the “Annotator Development and Data Updates” milestone.

🎉 Thank you for helping expand the OpenCRAVAT module library and improving the user experience!
Your contribution strengthens our open-source variant annotation community and helps new developers learn by example.

