Automatically generate phrase grounding annotations from image captions using spaCy and Rex-Omni.
- Extract phrases from captions using spaCy's noun phrase chunking
- Ground phrases using Rex-Omni's detection capabilities
- Generate annotations with character-level spans and bounding boxes
- Save to JSONL format for training grounding models
Top: Grounding visualization results | Bottom: Input caption
```bash
# Install spaCy
pip install spacy
python -m spacy download en_core_web_sm
```

Run the data engine on an image and its caption:

```bash
CUDA_VISIBLE_DEVICES=0 python applications/_2_automatic_grounding_data_engine/gd_data_engine.py \
--images tutorials/detection_example/test_images/cafe.jpg \
--captions "The image shows a modern, vibrant office space with people interacting in a collaborative environment. There are a mix of individuals sitting, working, and socializing around a stylish lounge area with large yellow flowers on a coffee table. The room features a sleek kitchen area with bright yellow pendant lights, and there's a creative neon artwork of a robot on the wall. The space is filled with natural light, and the people seem to be in a relaxed, casual atmosphere, likely enjoying a break or having informal discussions. The design is contemporary, with a cozy and productive vibe." \
--output-jsonl my_grounding_data.jsonl \
--output-dir applications/_2_automatic_grounding_data_engine
```

Each line in the output JSONL file contains:

```json
{
  "image": "path/to/image.jpg",
  "caption": "A cat sitting on a table with a laptop",
  "width": 1920,
  "height": 1080,
  "phrases": [
    {
      "phrase": "a cat",
      "start_char": 0,
      "end_char": 5,
      "boxes": [
        {"bbox": [100, 150, 300, 400]}
      ]
    },
    {
      "phrase": "a table",
      "start_char": 17,
      "end_char": 24,
      "boxes": [
        {"bbox": [50, 500, 800, 900]}
      ]
    }
  ]
}
```
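Because each line is an independent JSON object, annotations can be streamed one record at a time. A minimal reader sketch (`load_annotations` is our own illustrative helper, not part of the engine):

```python
import json

def load_annotations(jsonl_path):
    """Yield one annotation dict per line of a grounding JSONL file."""
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

for ann in load_annotations("my_grounding_data.jsonl"):
    for phrase in ann["phrases"]:
        # Recover the phrase text from its character span in the caption
        span = ann["caption"][phrase["start_char"]:phrase["end_char"]]
        boxes = [b["bbox"] for b in phrase["boxes"]]
        print(f"{span!r} -> {boxes}")
```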
|---|---|---|
| `--images` | List of image paths | None |
| `--captions` | List of captions (must match images) | None |
| `--input-json` | Input JSON file with image-caption pairs | None |
| `--rex-model` | Rex-Omni model path | `IDEA-Research/Rex-Omni` |
| `--backend` | Rex-Omni backend: `transformers`, `vllm` | `transformers` |
| `--output-jsonl` | Output JSONL file path | `grounding_annotations.jsonl` |
| `--output-dir` | Output directory for visualizations | `grounding_visualizations` |
| `--no-visualize` | Don't save visualizations | False |
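The exact schema expected by `--input-json` is defined in `gd_data_engine.py`; a plausible shape, assuming a list of image-caption objects (hypothetical, verify against the script before use):

```json
[
  {"image": "path/to/image1.jpg", "caption": "A cat on a table"},
  {"image": "path/to/image2.jpg", "caption": "A dog in the park"}
]
```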
The engine can also be called directly from Python:

```python
from applications._2_automatic_grounding_data_engine.gd_data_engine import batch_process_images

# Prepare your data
image_caption_pairs = [
    ("image1.jpg", "A cat on a table"),
    ("image2.jpg", "A dog in the park"),
]

# Generate grounding annotations
batch_process_images(
    image_caption_pairs=image_caption_pairs,
    rex_model_path="IDEA-Research/Rex-Omni",
    output_jsonl="my_annotations.jsonl",
    output_dir="my_visualizations",
    visualize=True,
)
```

Phrase extraction relies on spaCy's noun-phrase chunking, which yields each phrase together with its character span in the caption:

```python
import spacy
nlp = spacy.load("en_core_web_sm")
caption = "A cat sitting on a table"
doc = nlp(caption)
# Extract noun phrases
phrases = [(chunk.text, chunk.start_char, chunk.end_char)
           for chunk in doc.noun_chunks]
# Result: [("A cat", 0, 5), ("a table", 17, 24)]
```

Each extracted phrase is then passed to Rex-Omni as a detection category, which returns the matching bounding boxes:

```python
from PIL import Image

from rex_omni import RexOmniWrapper
rex_model = RexOmniWrapper(model_path="IDEA-Research/Rex-Omni")
image = Image.open("path/to/image.jpg")  # the image the caption describes
results = rex_model.inference(
    images=image,
    task="detection",
    categories=["a cat", "a table"],
)
```

The engine combines the character spans from spaCy with the bounding boxes from Rex-Omni to create the structured annotations shown above.
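A minimal sketch of that combination step (simplified; `build_annotation` is our own illustrative helper, and `boxes_by_phrase` stands in for whatever box lookup the engine actually builds from Rex-Omni's detection output):

```python
def build_annotation(image_path, caption, phrase_spans, boxes_by_phrase,
                     width, height):
    """Merge spaCy character spans with Rex-Omni boxes into one JSONL record.

    phrase_spans:    [(text, start_char, end_char), ...] from noun chunking.
    boxes_by_phrase: {phrase_text: [[x0, y0, x1, y1], ...]}, assumed to be
                     parsed out of Rex-Omni's detection results.
    """
    return {
        "image": image_path,
        "caption": caption,
        "width": width,
        "height": height,
        "phrases": [
            {
                "phrase": text,
                "start_char": start,
                "end_char": end,
                "boxes": [{"bbox": box} for box in boxes_by_phrase.get(text, [])],
            }
            for text, start, end in phrase_spans
        ],
    }
```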
