Read resiliently (caching, fallback, and being offline)

Goal: understand what happens when the network is slow or down, and write read code that degrades gracefully instead of crashing.

What a read actually does

By default the library runs in REPLICA mode: a read-only consumer (the opposite is PRIMARY, the authoritative server that owns the data). The first access for a category fetches JSON from the PRIMARY server (the service hosting the data, models.aihorde.net). If that server is unreachable and GitHub fallback is enabled (the default), the library falls back to GitHub: it reads the original model-reference repositories and converts them on the fly. Either way, the result is validated into Pydantic records and cached in memory with a TTL (default 60 s). Subsequent reads inside the TTL are served from the cache with no network call.

So a single cold read can succeed even if one source is down, and warm reads don't touch the network at all. (New to these terms? See the Glossary.)
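To make that read path concrete, here is a minimal, self-contained sketch of a read-through cache with a primary source, an optional fallback, and a TTL. It is not the library's implementation: `TTLReadThroughCache`, the fetcher callables, and the injectable `clock` are hypothetical names, validation into Pydantic records is omitted, and a source is assumed to signal "unreachable" by raising `OSError` (which covers `ConnectionError` and `TimeoutError`).

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical sketch, not horde_model_reference's code: a fetcher takes a
# category name and returns its records, raising OSError when unreachable.
Fetcher = Callable[[str], dict]


class TTLReadThroughCache:
    def __init__(
        self,
        primary: Fetcher,
        fallback: Optional[Fetcher],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._primary = primary
        self._fallback = fallback
        self._ttl = ttl_seconds
        self._clock = clock
        # category -> (time fetched, data)
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def get(self, category: str) -> dict:
        now = self._clock()
        hit = self._cache.get(category)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # warm read: served from memory, no fetch
        try:
            data = self._primary(category)  # cold or expired: ask PRIMARY first
        except OSError:
            if self._fallback is None:
                raise  # fallback disabled: the error propagates
            data = self._fallback(category)  # PRIMARY unreachable: try the fallback
        self._cache[category] = (now, data)
        return data


# Usage with fake sources and a fake clock, so the example runs offline.
calls = []


def primary(category: str) -> dict:
    calls.append(("primary", category))
    raise ConnectionError("PRIMARY is down")


def github(category: str) -> dict:
    calls.append(("github", category))
    return {"example-model": {"name": "example-model"}}


fake_now = [0.0]
cache = TTLReadThroughCache(primary, github, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("image_generation")  # cold: PRIMARY fails, fallback succeeds
cache.get("image_generation")  # warm: no fetch at all
fake_now[0] = 61.0
cache.get("image_generation")  # TTL expired: PRIMARY, then fallback again
print(calls)
```

One design choice worth noticing: in this sketch failures are not cached, so while both sources are down every read retries the network.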

See Architecture Overview for the full backend/caching picture.

Don't crash on missing data

Every read has two variants:

- `get_model_reference(category)` / `get_model(category, name)`: raise an exception if the data can't be produced.
- `get_model_reference_or_none(...)` / `get_model_or_none(...)`: return `None` instead.

For resilient code, prefer the `_or_none` variants and handle the empty case:

```python
from horde_model_reference import ModelReferenceManager

manager = ModelReferenceManager()

models = manager.get_model_reference_or_none("image_generation")
if not models:
    # None: the data couldn't be produced (e.g. network down on a cold cache).
    # Empty: the category genuinely has no entries.
    print("Model reference unavailable right now; continuing with an empty set.")
    models = {}
```
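If you would rather keep serving the last data you saw than drop to an empty set, one option is to remember the last successful result yourself. `LastKnownGood` below is a hypothetical helper, not part of the library; it takes any zero-argument read function, so with the library you could pass `lambda: manager.get_model_reference_or_none("image_generation")`. Assuming the `_or_none` variants return `None` on failure as described above, it treats only `None` as a failed read, so a genuinely empty category is reported as empty rather than masked by stale data.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LastKnownGood(Generic[T]):
    """Hypothetical helper: serve the last successful read when a new read returns None."""

    def __init__(self, read: Callable[[], Optional[T]], default: T) -> None:
        self._read = read
        self._last = default  # returned until the first successful read

    def get(self) -> T:
        value = self._read()
        if value is not None:  # only None counts as "unavailable"
            self._last = value
        return self._last


# Usage with a scripted read function: success, outage, recovery.
responses = iter([{"model-a": 1}, None, {"model-b": 2}])
lkg = LastKnownGood(lambda: next(responses), default={})
results = [lkg.get() for _ in range(3)]
print(results)  # [{'model-a': 1}, {'model-a': 1}, {'model-b': 2}]
```

The remembered copy lives only in process memory, so it does not survive a restart; a cold start during an outage still begins from `default`.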

Force a refresh (and when not to)

Reads serve from cache until the TTL expires. To bypass the cache and re-fetch immediately, pass overwrite_existing=True:

```python
fresh = manager.get_model_reference("image_generation", overwrite_existing=True)
```

Use this sparingly: on a long-running service, let the TTL handle freshness so you don't add a network round trip (extra latency, and extra load on the server) to every call.
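If a code path genuinely needs forced refreshes (say, a user-triggered "reload" action), one way to keep them sparse is to rate-limit them. This is a hypothetical helper, not a library feature; `RefreshThrottle` and `min_interval` are illustrative names. With the library, its decision could feed the documented flag, e.g. `manager.get_model_reference("image_generation", overwrite_existing=throttle.should_force())`.

```python
import time
from typing import Callable, Optional


class RefreshThrottle:
    """Hypothetical guard: allow a forced refresh at most once per min_interval seconds."""

    def __init__(self, min_interval: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._last: Optional[float] = None  # time of the last allowed forced refresh

    def should_force(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._min_interval:
            self._last = now
            return True
        return False  # too soon: let the normal TTL cache answer


# Usage with a fake clock: requests at 0, 10, 29.9, 30 and 45 seconds.
fake_now = [0.0]
throttle = RefreshThrottle(min_interval=30.0, clock=lambda: fake_now[0])
decisions = []
for t in (0.0, 10.0, 29.9, 30.0, 45.0):
    fake_now[0] = t
    decisions.append(throttle.should_force())
print(decisions)  # [True, False, False, True, False]
```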

Tune freshness vs. resilience

Three environment variables matter most for consumers:

| Variable | Effect |
| --- | --- |
| `HORDE_MODEL_REFERENCE_CACHE_TTL_SECONDS` | How long cached data is reused before a refetch, in seconds (default `60`). |
| `HORDE_MODEL_REFERENCE_ENABLE_GITHUB_FALLBACK` | Whether to fall back to GitHub when PRIMARY fails (default `True`). |
| `HORDE_MODEL_REFERENCE_PRIMARY_API_URL` | Overrides the PRIMARY server URL; it can also be set so that PRIMARY is skipped and only GitHub is used (see Configuration & Troubleshooting for the exact value). |

A longer TTL means fewer network calls and better resilience to transient outages, at the cost of staleness. See Configuration & Troubleshooting for the complete list and debugging tips.
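As a rough worked example of that trade-off: if a category is read continuously, the cache is refilled at most once per TTL, so successful refetches per hour are bounded by about ceil(3600 / TTL), while the data you serve can be up to roughly one TTL older than the source. The sketch below computes that bound and sets the TTL from Python. It assumes the library reads its settings from the environment when first imported, so the variable is set before any `import horde_model_reference`; `max_refetches_per_hour` is a hypothetical helper, not a library function.

```python
import math
import os

# Assumption: settings are read from the environment when the library is first
# imported, so set them before `import horde_model_reference`.
os.environ["HORDE_MODEL_REFERENCE_CACHE_TTL_SECONDS"] = "300"


def max_refetches_per_hour(ttl_seconds: float) -> int:
    """Upper bound on successful cache refills per category per hour under continuous reads."""
    return math.ceil(3600 / ttl_seconds)


for ttl in (60, 300, 900):
    print(f"TTL {ttl:>3} s: at most {max_refetches_per_hour(ttl)} refetches/hour, data up to {ttl} s old")
```

This counts successful refills only; fetches that fail during an outage are extra.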
