
Data Model

The schema is defined in lib/db.py. Every table uses UUID primary keys.

Multi-tenancy

Workspace ──┬── WorkspaceMember ── User
            └── Project ── Video ── Frame ── Annotation
                └── LabelingJob ─────┘
                └── TrainingRun ── ModelRegistry

A Workspace is the unit of isolation. Every project belongs to exactly one workspace, and every video, frame, and annotation inherits that workspace through its parent project. Membership is granted by WorkspaceMember rows, each carrying a role (admin, editor, annotator, or viewer).

Core tables

users

Stores the email, a bcrypt password hash, and an optional display_name. The JWT sub claim is the user's UUID.

api_keys

Long-lived credentials prefixed with wld_. Each key is stored as a (key_prefix, key_hash) pair: the prefix keeps lookups fast, and because only a hash is kept, the raw key cannot be recovered from the database.
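A minimal sketch of the prefix-plus-hash pattern, for illustration only. The hash algorithm (SHA-256), the stored prefix length, and the function names are assumptions; the actual implementation in lib/db.py may differ.

```python
import hashlib
import hmac
import secrets

KEY_PREFIX = "wld_"   # prefix named in the text
PREFIX_LEN = 12       # assumption: number of leading characters kept in key_prefix


def generate_api_key() -> tuple[str, str, str]:
    """Return (raw_key, key_prefix, key_hash). Only the last two would be stored."""
    raw_key = KEY_PREFIX + secrets.token_urlsafe(32)
    key_prefix = raw_key[:PREFIX_LEN]
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()  # assumption: SHA-256
    return raw_key, key_prefix, key_hash


def verify_api_key(presented: str, rows: dict[str, str]) -> bool:
    """Check a presented key against stored rows.

    `rows` maps key_prefix -> key_hash and stands in for an indexed table lookup.
    """
    stored_hash = rows.get(presented[:PREFIX_LEN])
    if stored_hash is None:
        return False
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored_hash)
```

The raw key is shown to the caller once at creation time; afterwards only the prefix and hash exist, so a lost key has to be rotated rather than recovered.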

workspaces / workspace_members

The tenant boundary and the RBAC layer. Every resource-access check resolves to one question: is this user a member of the parent workspace?
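The membership check can be sketched roughly as follows. The role names come from the Multi-tenancy section; the strict ordering of roles, the `WorkspaceMember` dataclass shape, and the `can_access` helper are assumptions made for illustration.

```python
from dataclasses import dataclass

# Roles named in the text; ranking them in this order is an assumption.
ROLE_RANK = {"viewer": 0, "annotator": 1, "editor": 2, "admin": 3}


@dataclass(frozen=True)
class WorkspaceMember:
    workspace_id: str
    user_id: str
    role: str


def can_access(members: list[WorkspaceMember], user_id: str,
               workspace_id: str, min_role: str = "viewer") -> bool:
    """True if the user belongs to the workspace with at least `min_role`."""
    for m in members:
        if m.user_id == user_id and m.workspace_id == workspace_id:
            return ROLE_RANK[m.role] >= ROLE_RANK[min_role]
    return False  # no membership row means no access


members = [WorkspaceMember("ws1", "u1", "annotator")]
print(can_access(members, "u1", "ws1"))                     # member, default role
print(can_access(members, "u1", "ws1", min_role="editor"))  # role too low
print(can_access(members, "u2", "ws1"))                     # not a member
```

For a nested resource such as a frame, the caller would first walk up to the parent project to find the workspace_id, then run the same check.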

projects

A bucket of related videos. Belongs to a workspace.

videos

A single uploaded video file. Tracks MinIO key, codec, fps, duration, and the project it was uploaded to.

frames

Extracted still images. Indexed by (video_id, frame_number).

labeling_jobs

A run of SAM 3 against a video (or a set of frames) with a particular prompt and threshold. Status flows pending → running → done, or ends in failed.
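The status flow can be modelled as a small state machine. This is a sketch: the `JobStatus` enum and `advance` helper are hypothetical names, and treating failed as reachable only from running is an assumption the text does not confirm.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Assumption: a job can fail only while running; done and failed are terminal.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.DONE, JobStatus.FAILED},
    JobStatus.DONE: set(),
    JobStatus.FAILED: set(),
}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

Rejecting illegal transitions in one place keeps a retried or duplicated worker message from moving a finished job back to running.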

annotations

The output of labeling jobs and human edits. One row per object instance per frame:

  • frame_id, job_id
  • class_name, confidence
  • bbox (xyxy normalized)
  • mask (optional, RLE-encoded)
  • track_id (for multi-frame instance tracking)
  • accepted_by_user_id (set when a human reviews and confirms)
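As an illustration of the row shape listed above, here is a sketch of an annotation record plus a helper that converts pixel coordinates to normalized xyxy. The field types (string IDs, integer track_id) and the `normalize_xyxy` helper are assumptions, not the definitions in lib/db.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    frame_id: str
    job_id: str
    class_name: str
    confidence: float
    # (x_min, y_min, x_max, y_max), each divided by the frame size so it lies in [0, 1]
    bbox: tuple[float, float, float, float]
    mask: Optional[str] = None                 # RLE-encoded mask, if present
    track_id: Optional[int] = None             # same value across frames for one instance
    accepted_by_user_id: Optional[str] = None  # set once a human confirms the row


def normalize_xyxy(x1: float, y1: float, x2: float, y2: float,
                   width: int, height: int) -> tuple[float, float, float, float]:
    """Convert a pixel-space box to xyxy coordinates normalized by frame size."""
    return (x1 / width, y1 / height, x2 / width, y2 / height)


ann = Annotation(
    frame_id="frame-1",
    job_id="job-1",
    class_name="forklift",
    confidence=0.91,
    bbox=normalize_xyxy(192, 108, 960, 540, width=1920, height=1080),
)
print(ann.bbox)
```

Normalized coordinates stay valid if a frame is later resized or re-encoded, which is one common reason to store boxes this way.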

training_runs

A YOLO26 fine-tune. References the dataset slice and produces a ModelRegistry row on completion.

model_registry

Versioned model artifacts. Each row points to a MinIO key for the weights and tracks an alias (e.g. production, staging), mAP50, and training metadata.
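One way to picture alias lookup is the sketch below. The `ModelVersion` fields, the integer version number, and the rule that the highest version wins when several rows share an alias are all assumptions; the text does not say how aliases are resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelVersion:
    version: int                 # assumption: versions are monotonically increasing integers
    weights_key: str             # MinIO object key for the weights file
    map50: float
    alias: Optional[str] = None  # e.g. "production" or "staging"


def resolve_alias(rows: list[ModelVersion], alias: str) -> Optional[ModelVersion]:
    """Return the newest row carrying `alias`, or None if no row has it."""
    matches = [r for r in rows if r.alias == alias]
    return max(matches, key=lambda r: r.version) if matches else None


registry = [
    ModelVersion(1, "models/v1/weights.pt", 0.71, alias="production"),
    ModelVersion(2, "models/v2/weights.pt", 0.78, alias="staging"),
]
print(resolve_alias(registry, "production").weights_key)
```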

deployment_targets / edge_devices

Optional resources for pushing models to remote inference endpoints or edge hardware. Currently exposed only via API; UI integration is in progress.

Indexes

The schema ships with single-column indexes on the obvious foreign keys. The performance audit recommends adding these composites:

CREATE INDEX idx_labeling_job_project_status
ON labeling_jobs(project_id, status);

CREATE INDEX idx_annotation_job_frame
ON annotations(job_id, frame_id);

Add them via Alembic migration when you hit the scan threshold.
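Before writing the migration, it can help to confirm that the planner actually picks up a composite index for the query you care about. The sketch below uses an in-memory SQLite database purely as a stand-in (the production database and its planner will differ), with a simplified labeling_jobs table whose column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labeling_jobs (
    id TEXT PRIMARY KEY,          -- UUID stored as text in this stand-in
    project_id TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE INDEX idx_labeling_job_project_status
ON labeling_jobs(project_id, status);
""")

# Ask the planner how it would run a "jobs in this project with this status" query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM labeling_jobs WHERE project_id = ? AND status = ?",
    ("p1", "pending"),
).fetchall()

# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the index name appears in the plan, the composite covers both filter columns; the same check on the real database would use that engine's own EXPLAIN output.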