The HTTP Service and Where It Fits
horde-model-reference is three things in one package: a set of JSON reference files, a Python
library, and a FastAPI web service. This page explains what the HTTP service is for, who
calls it, and how it relates to the library and the wider AI-Horde
ecosystem. If you just want to start calling endpoints, jump to the
Using the HTTP API tutorial.
The public deployment lives at https://models.aihorde.net/api with interactive docs at
/api/docs.
What the service does
The service exposes the curated model-reference data - download URLs, checksums, baselines, NSFW flags, capabilities - over HTTP, plus search, statistics, and a moderated workflow for proposing changes. Reads are open to everyone; writes are gated (PRIMARY deployments only, authenticated, and routed through a review queue - see below).
Where it fits in the ecosystem
graph TD
subgraph Authoritative
PRIMARY["AI-Horde backend
(PRIMARY mode)
models.aihorde.net/api"]
end
subgraph Sources
GH["GitHub legacy repos
(Haidra-Org/AI-Horde-*-model-reference)"]
end
subgraph Consumers
WORKER["Workers"]
CLIENT["Clients / UIs"]
TOOLS["Scripts / dashboards"]
end
GH -. seeds / fallback .-> PRIMARY
PRIMARY -->|HTTP reads| WORKER
PRIMARY -->|HTTP reads| CLIENT
PRIMARY -->|HTTP reads| TOOLS
GH -. direct fallback .-> WORKER
WORKER -. uses library .-> LIB["horde-model-reference
(REPLICA mode)"]
CLIENT -. uses library .-> LIB
LIB -->|HTTPBackend| PRIMARY
There are three consumer roles, and the same service serves all of them:
- The AI-Horde backend runs this service in PRIMARY mode as the canonical source of truth.
It owns the data, accepts moderated writes, and serves everyone else. This is the deployment at
models.aihorde.net. - Workers (GPU nodes that fulfil generation requests) need to know which models exist, their download URLs, and their checksums. A worker typically consumes the data through the Python library in REPLICA mode, which fetches from the PRIMARY API and falls back to raw GitHub if the PRIMARY is unreachable. A worker can also call the HTTP API directly.
- Clients, UIs, and tooling (model browsers, admin dashboards, automation) call the HTTP API directly to list, search, and rank models, or to propose changes.
The service and the Python library serve the same data
The FastAPI service and the ModelReferenceManager Python API are two front-ends over the same
backend. When you run the library in REPLICA mode with a primary_api_url, its HTTPBackend
calls the same endpoints documented here:
Library call (REPLICA HTTPBackend) |
HTTP endpoint it requests |
|---|---|
| fetch a category (v2) | GET {primary_api_url}/model_references/v2/{category} |
| fetch a category (legacy fallback) | GET {primary_api_url}/model_references/v1/{category} |
So "consume the API from a worker" and "use the library in REPLICA mode" are the same operation at different layers. See Consume the HTTP API for both styles, and Offline & Resilient Reads for the fallback chain.
Two API versions: v1 (legacy) and v2 (current)
| Version | Path prefix | Format | Use it for |
|---|---|---|---|
| v2 | /api/model_references/v2 |
Current normalized format, with search, per-model retrieval, statistics, and typed per-category schemas | New integrations |
| v1 | /api/model_references/v1 |
Legacy GitHub-compatible format | Existing AI-Horde workers that already read the legacy JSON shape |
Both versions are readable on any deployment. Which version accepts writes depends on the
deployment's canonical_format setting - see Canonical Format. v1's outward
shape is frozen for backward compatibility; v2 is where new capabilities land.
Reads are open, writes are reviewed
Read endpoints are unauthenticated and safe to cache. Write endpoints (create / update / delete a model) are different in three ways:
- They require a PRIMARY deployment. A REPLICA instance returns
503for writes. - They require an
apikeybelonging to an allow-listed requestor. - They are not applied immediately. A successful write returns
202 Acceptedand enters a pending queue: a separate approver reviews the change, then it is applied to the live dataset (propose -> approve -> apply). This two-person workflow protects the canonical data that the whole network depends on.
See Submit Models via the API for the end-to-end write walkthrough, and the Request Lifecycle for how a request flows through the service internally.
Discovering a deployment's capabilities
Two unauthenticated endpoints let a client adapt to whatever deployment it is talking to:
GET /api/replicate_mode->{ "replicate_mode", "canonical_format", "writable" }. Call this on startup to learn whether writes are accepted and which API version handles them.GET /api/heartbeat-> service status plus the health of the upstream AI-Horde API (used by statistics/popularity endpoints).
Where to go next
- Using the HTTP API - hands-on first calls (curl + Python).
- HTTP API Conventions - base URL, auth, errors, pagination.
- v2 Endpoints - the full current-format reference.
- Pending Queue - the review workflow behind every write.