marin-community/data_browser

Marin Data Browser

A web UI for browsing Marin pipeline outputs — JSONL, Parquet, JSON, and text files on local disk, GCS, or S3.

Live instance: https://marin.community/data-browser/

Prerequisites

  • Python 3.12+
  • Node.js 20+
  • uv

Installation

uv sync
npm install

Configuration

Configuration files in conf/ specify which paths the browser can access:

  • conf/local.conf — local files (e.g. ../local_store), port 5050
  • conf/gcp.conf — GCP Storage buckets (requires Google Cloud credentials)
  • conf/docker.conf — Docker container paths

Example (conf/local.conf):

root_paths:
  - ../local_store
port: 5050
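The root_paths list acts as an allow-list for what the browser may read. The server's actual check is not shown in this README, so the following is only a minimal sketch of the idea for local paths. The function name `is_allowed` and the `base_dir` parameter are hypothetical, and GCS or S3 paths would need separate handling.

```python
from pathlib import Path


def is_allowed(requested: str, root_paths: list[str], base_dir: str = ".") -> bool:
    """Return True if `requested` is one of the roots or lies beneath one.

    Hypothetical helper: not taken from the Marin code base. Relative
    paths are interpreted against `base_dir`.
    """
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments, so "root/../other" cannot sneak
    # outside a configured root.
    target = (base / requested).resolve()
    for root in root_paths:
        root_resolved = (base / root).resolve()
        if target == root_resolved or root_resolved in target.parents:
            return True
    return False


if __name__ == "__main__":
    roots = ["../local_store"]  # same value as the conf/local.conf example
    print(is_allowed("../local_store/data/a.jsonl", roots))
    print(is_allowed("../local_store/../secrets.txt", roots))
```

Resolving both sides before comparing is the key design choice here: a plain string-prefix test would accept paths such as ../local_store/../secrets.txt.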

Development

One-command dev loop (recommended)

uv run python run-dev.py --config conf/local.conf

This starts the Flask backend on the configured port and the React dev server on port 3000. The React server proxies /api/... calls to the backend. Browse at http://localhost:3000.

Press Ctrl+C to stop both. Pass --backend-only to skip the frontend.

Manual control

Terminal 1 (backend):

DEV=true uv run python server.py --config conf/local.conf

Terminal 2 (frontend):

npm start

API-only testing

DEV=true uv run python server.py --config conf/local.conf

With the backend listening on port 5050 (the port set in conf/local.conf), query the API directly: http://localhost:5050/api/view?path=local_store

API Endpoints

Endpoint                                    Description
GET /api/config                             Server configuration (root paths, limits)
GET /api/view?path=PATH&offset=0&count=5    Browse files and directories
GET /api/download?path=PATH                 Download a file
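The endpoints above can be called from a script as well as from the UI. The sketch below builds the request URLs with the Python standard library. It assumes the backend is running at http://localhost:5050 as in conf/local.conf. The helper names are illustrative, and the README does not confirm that the responses are JSON, so `fetch_json` rests on that assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: local backend started with conf/local.conf (port 5050).
BASE_URL = "http://localhost:5050"


def build_view_url(path: str, offset: int = 0, count: int = 5,
                   base_url: str = BASE_URL) -> str:
    """URL for GET /api/view with a URL-encoded path, offset, and count."""
    query = urlencode({"path": path, "offset": offset, "count": count})
    return f"{base_url}/api/view?{query}"


def build_download_url(path: str, base_url: str = BASE_URL) -> str:
    """URL for GET /api/download with a URL-encoded path."""
    return f"{base_url}/api/download?{urlencode({'path': path})}"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON.

    Assumption: the endpoint returns JSON; the README does not say so.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires a running backend; prints whatever the server returns.
    print(fetch_json(f"{BASE_URL}/api/config"))
    print(fetch_json(build_view_url("local_store", offset=0, count=5)))
```

Using `urlencode` rather than string concatenation keeps paths containing slashes, spaces, or a scheme prefix such as gs:// safely escaped in the query string.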

Docker

Development

docker compose up

Production build

docker build -f Dockerfile.prod -t marin-data-browser .

Deployment (Google Cloud Run)

Create the service account key:

gcloud iam service-accounts keys create gcs-key.json \
  --iam-account=marin-data-browser@hai-gcp-models.iam.gserviceaccount.com

Deploy:

./deploy.sh

Check logs:

gcloud run services logs read marin-data-browser \
  --project=hai-gcp-models --platform=managed --region=us-central1

Delete the service:

gcloud run services delete marin-data-browser \
  --project=hai-gcp-models --region=us-central1

License

Apache 2.0 — see LICENSE.
