gem-cap-chan is a utility for batch-captioning images with natural language using OpenAI-compatible multimodal models such as Gemma3. It is designed for creating high-quality datasets for Stable Diffusion and LoRA training.
- API Flexibility: Works with any OpenAI-compatible endpoint (local or cloud-based)
- Batch Processing: Recursively process entire directories of training images
- Optimized Captions: Default prompt tuned for Stable Diffusion/LoRA training
- Smart Image Handling: Automatic resizing and format conversion
- Progress Tracking: Real-time progress with ETA and performance metrics
- Failure Recovery: Automatic retries with error skipping
- Security: Token authentication for remote endpoints
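The "Smart Image Handling" step above can be sketched as follows. This is a minimal illustration using Pillow; the `max_size` default and the RGB conversion mirror the documented behavior, but the exact logic (resampling filter, JPEG quality) is an assumption, not the script's actual code:

```python
from io import BytesIO

from PIL import Image


def prepare_image(img: Image.Image, max_size: int = 1024) -> bytes:
    """Downscale so the longest side is at most max_size, then re-encode as JPEG."""
    if max(img.size) > max_size:
        # thumbnail() resizes in place and preserves the aspect ratio
        img.thumbnail((max_size, max_size), Image.LANCZOS)
    if img.mode != "RGB":
        img = img.convert("RGB")  # JPEG cannot store an alpha channel
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=90)
    return buf.getvalue()
```

Sending a downscaled JPEG instead of the original file keeps request payloads small without noticeably hurting caption quality at typical training resolutions.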
- Python 3.7+
- Pillow
- Requests
- OpenAI-compatible multimodal endpoint (e.g., llama.cpp with mmproj support)
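For reference, OpenAI-compatible multimodal endpoints accept base64-encoded images inside a chat completion request. A sketch of the payload such a request carries (the prompt text, model name, and `max_tokens` value here are placeholders, not the script's actual settings):

```python
import base64


def build_caption_request(image_bytes: bytes, prompt: str, model: str = "gemma3") -> dict:
    """Build an OpenAI-style /v1/chat/completions payload with an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # Images travel as data: URLs in the standard chat format
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }
```

The payload would then be POSTed to `{api_base}/v1/chat/completions` (with an `Authorization: Bearer <token>` header when an API token is configured).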
- Clone the repository:
git clone https://github.com/2dameneko/gem-cap-chan
- Install dependencies (if your system does not have these components installed by default):
pip install Pillow requests
- Start your multimodal API server (example for llama.cpp):
llama-server --model "gemma3-27b.Q4_K_M.gguf" \
  --mmproj "gemma3-27b-mmproj.gguf" \
  --host 0.0.0.0 --port 5000
- Run captioning:
python gem-cap-chan.py /path/to/training_images
- Captions will be saved as `.txt` files in the output directory
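Each caption file takes its image's filename with a `.txt` extension, as is conventional for Stable Diffusion/LoRA training data. A sketch of the mapping (the helper name is hypothetical; the convention itself is what matters):

```python
from pathlib import Path
from typing import Optional


def caption_path(image_path: str, output_dir: Optional[str] = None) -> Path:
    """Map an image path to its .txt caption file, optionally in another directory."""
    img = Path(image_path)
    # When no output directory is given, captions sit next to their images
    out_dir = Path(output_dir) if output_dir else img.parent
    return out_dir / (img.stem + ".txt")
```

Trainers such as kohya_ss pick up these sidecar `.txt` files automatically when they share a basename with the image.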
Only the input directory is required; all other options fall back to sensible defaults. Available CLI options (`python gem-cap-chan.py -h`):
| Argument | Description |
|---|---|
| `input_dir` | Directory containing images to caption (required) |
| `--api_base` | API base URL (default: `http://localhost:5000`) |
| `--api_token` | Authentication token for secure/remote endpoints |
| `--output_dir` | Output directory for caption files (default: same as `input_dir`) |
| `--max_size` | Max image dimension for resizing (pixels, default: 1024) |
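The options above map naturally onto a standard argparse interface. A minimal sketch mirroring the table's names and defaults (this is an illustration, not the script's actual parser code):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Batch-caption images for Stable Diffusion/LoRA training"
    )
    parser.add_argument("input_dir", help="Directory containing images to caption")
    parser.add_argument("--api_base", default="http://localhost:5000",
                        help="API base URL")
    parser.add_argument("--api_token", default=None,
                        help="Authentication token for secure/remote endpoints")
    parser.add_argument("--output_dir", default=None,
                        help="Output directory for caption files (default: input_dir)")
    parser.add_argument("--max_size", type=int, default=1024,
                        help="Max image dimension for resizing, in pixels")
    return parser
```

Because `input_dir` is positional it is required, while every `--` option falls back to its default when omitted.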
Modify the `DEFAULT_PROMPT` variable in the script for different caption styles.
`.jpg`, `.jpeg`, `.png`, `.webp`, `.bmp`
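Recursive discovery of these formats can be sketched with pathlib. The extension set comes from the list above; case-insensitive matching and sorted ordering are assumptions for illustration:

```python
from pathlib import Path
from typing import List

# Supported extensions, per the list of formats above
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def find_images(root: str) -> List[Path]:
    """Recursively collect supported image files, sorted for stable ordering."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```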
- 0.1: Initial release with local endpoint support
This project is a proof of concept and is not production-ready.
- OpenAI API specification: OpenAI
- llama.cpp: ggerganov/llama.cpp
- Gemma3: Google DeepMind
- Pillow: Python Imaging Library
Model Implementation Credits
Gemma3 27b · Gemma3 27b DPO Abliterated
Thank you for your interest in gem-cap-chan!