Architecture Overview

Waldo is a small constellation of services that share a Postgres database, Redis broker, and MinIO object store. Every service runs in its own container.

      ┌─────────────────┐
      │     Browser     │
      └────────┬────────┘
               │ HTTPS
      ┌────────▼────────┐
      │   FastAPI app   │ ── REST + WebSocket
      │ (uvicorn :8000) │
      └─┬──────┬──────┬─┘
 writes │      │      │ enqueues
  ┌─────▼──┐   │      │
  │Postgres│   │   ┌──▼────┐
  └────────┘   │   │ Redis │
               │   └──┬────┘
         reads │      │ Celery tasks
  ┌────────────▼┐  ┌──▼────────────┐
  │ MinIO       │◄─┤ labeler /     │
  │ (S3 store)  │  │ trainer       │
  └─────────────┘  │ workers       │
                   └───────────────┘

Services

| Service | Image | Purpose |
|---|---|---|
| app | python:3.11-slim + uv | FastAPI HTTP/WebSocket API. Stateless. |
| labeler | nvidia/cuda (GPU) or python:3.11-slim (Apple) | Celery worker running SAM 3 / SAM 3.1 inference. |
| trainer | nvidia/cuda (GPU) or local | Celery worker running YOLO26 training. |
| postgres | postgres:16-alpine | Primary store for users, projects, jobs, annotations, models. |
| redis | redis:7-alpine | Celery broker + WebSocket pubsub + ephemeral cache. |
| minio | minio/minio | S3-compatible blob storage for videos, frames, and exported datasets. |
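
Because the redis service doubles as the WebSocket pubsub layer, the labeler and the app need to agree on a channel name and a message shape. The sketch below shows one possible arrangement using only the standard library. The `DetectionEvent` fields, the `channel_for_job` naming scheme, and the JSON encoding are all assumptions made for illustration; the source does not specify the actual payload.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DetectionEvent:
    """Hypothetical payload the labeler publishes for one detection."""
    job_id: str
    frame_index: int
    class_name: str
    confidence: float
    bbox: tuple  # (x_min, y_min, x_max, y_max) in pixels


def channel_for_job(job_id: str) -> str:
    """Return a per-job pubsub channel name (naming scheme is an assumption)."""
    return f"labeling:{job_id}:detections"


def encode_event(event: DetectionEvent) -> str:
    """Serialize an event to compact JSON for publishing."""
    return json.dumps(asdict(event), separators=(",", ":"))


def decode_event(raw: str) -> DetectionEvent:
    """Parse a published message back into an event."""
    data = json.loads(raw)
    data["bbox"] = tuple(data["bbox"])  # JSON arrays decode as lists
    return DetectionEvent(**data)
```

With a Redis client such as redis-py, the labeler would pass `channel_for_job(job_id)` and `encode_event(event)` to `publish`, and the app would call `decode_event` on each message before forwarding it over the WebSocket.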

Request flow: auto-labeling

  1. Upload: the browser POSTs a video to /api/v1/upload. The app stores it in MinIO and inserts a Video row.
  2. Frame extraction: the app dispatches a Celery task to the labeler, which uses FFmpeg to extract frames at a configurable FPS and writes them back to MinIO.
  3. Labeling job: POST /api/v1/label creates a LabelingJob row and enqueues SAM 3 inference per frame batch.
  4. Streaming: the labeler publishes detections to a Redis pubsub channel as it goes. The app forwards them over WebSocket so the UI updates live.
  5. Review: the user opens /review/<job>. Annotations are loaded from Postgres, and edits are sent back to the API as PATCH requests.
  6. Export: clicking "Export" writes a YOLO-format dataset (images + label txt files) to MinIO. The download endpoint streams it back to the browser.
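
Two pieces of this flow are easy to sketch in isolation: splitting extracted frames into batches for inference (step 3) and writing one line of a YOLO label file (step 6). The code below is a minimal illustration, not Waldo's implementation. The `Box` type, the function names, and the frame keys are hypothetical, and it assumes detection-style box labels, where each line is `class x_center y_center width height` normalized by the image size. Segmentation exports would use polygon coordinates instead.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass(frozen=True)
class Box:
    """Pixel-space bounding box; hypothetical shape of a stored annotation."""
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def frame_batches(frame_keys: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of frame keys, one batch per inference task."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(frame_keys), batch_size):
        yield list(frame_keys[start:start + batch_size])


def yolo_label_line(box: Box, image_width: int, image_height: int) -> str:
    """Format one box as a YOLO detection label line.

    Coordinates are the box center and size, each divided by the
    image dimension so that every value falls in [0, 1].
    """
    x_center = (box.x_min + box.x_max) / 2 / image_width
    y_center = (box.y_min + box.y_max) / 2 / image_height
    width = (box.x_max - box.x_min) / image_width
    height = (box.y_max - box.y_min) / image_height
    return f"{box.class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, `yolo_label_line(Box(0, 100, 50, 300, 250), 640, 480)` returns `0 0.312500 0.312500 0.312500 0.416667`, and five frame keys with `batch_size=2` produce batches of two, two, and one.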

Tech choices

| Concern | Choice | Why |
|---|---|---|
| API | FastAPI | Async, type-checked, generates OpenAPI for free |
| ORM | SQLAlchemy 2.x | Mature, async-friendly, Alembic migrations |
| Task queue | Celery + Redis | Battle-tested for long-running ML jobs |
| Object store | MinIO | S3-compatible, runs anywhere, no vendor lock-in |
| Detection model | YOLO26 (Ultralytics) | Fast, accurate, easy to fine-tune |
| Segmentation model | SAM 3 / SAM 3.1 | Best-in-class video segmentation; MLX path on Apple Silicon |
| UI | React + Vite + Tailwind | Fast HMR, modern hooks, zero config |
| Auth | JWT bearer + API keys | Stateless, multi-tenant via workspaces |
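
To illustrate what "stateless" means for the auth row: a JWT carries its own claims and signature, so any app container can validate a request without a session lookup. The sketch below signs and verifies an HS256 token with the standard library only. The choice of HS256, the `workspace_id` claim name, and the function names are assumptions for illustration; the source does not say which algorithm or claims Waldo uses, and a maintained library such as PyJWT would be the usual choice in practice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    return claims
```

A request handler could then read a claim such as `workspace_id` from the returned dictionary to scope queries to one tenant, with no shared session state between app containers.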

See Data Model for the schema and Security for the trust model.