On-device AI

Shrike’s AI features run entirely on-device. Inference uses an embedded llama.cpp (via the llama-cpp-2 crate), GPU-accelerated through Metal on Apple Silicon. No prompt and no mail ever leaves your machine — there’s no API key to add and no server to call.

The first AI feature is Thread → to-do (t): the model reads a conversation and proposes a one-line action plus a due date, which pre-fills the to-do editor for one-key confirmation. The AI boundary sits behind a clean AiEngine trait, mirroring the mail provider, so more features can plug into the same on-device engine over time.
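As a rough illustration of what a boundary like this could look like, here is a minimal sketch. Only the name AiEngine comes from the text above. The method name `thread_to_todo`, the `TodoProposal` and `AiError` types, and the `StubEngine` stand-in are assumptions made for the example, not Shrike's actual API.

```rust
use std::fmt;

/// Hypothetical shape of a proposed to-do: a one-line action plus an optional due date.
#[derive(Debug, Clone, PartialEq)]
pub struct TodoProposal {
    pub action: String,
    /// Due date as an ISO 8601 string (YYYY-MM-DD), if one was found.
    pub due: Option<String>,
}

/// Hypothetical error type for the engine boundary.
#[derive(Debug)]
pub enum AiError {
    ModelNotLoaded,
    Inference(String),
}

impl fmt::Display for AiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AiError::ModelNotLoaded => write!(f, "model not loaded"),
            AiError::Inference(msg) => write!(f, "inference failed: {msg}"),
        }
    }
}

impl std::error::Error for AiError {}

/// The trait boundary: callers depend on this, not on a concrete inference backend.
/// The method signature is an assumption for illustration.
pub trait AiEngine: Send + Sync {
    fn thread_to_todo(&self, thread_text: &str) -> Result<TodoProposal, AiError>;
}

/// A stand-in engine with no model behind it: it proposes the first
/// non-empty line of the thread as the action and never sets a due date.
pub struct StubEngine;

impl AiEngine for StubEngine {
    fn thread_to_todo(&self, thread_text: &str) -> Result<TodoProposal, AiError> {
        let action = thread_text
            .lines()
            .map(str::trim)
            .find(|line| !line.is_empty())
            .ok_or_else(|| AiError::Inference("empty thread".to_string()))?;
        Ok(TodoProposal { action: action.to_string(), due: None })
    }
}
```

Because callers hold a `Box<dyn AiEngine>` (or similar), a stub like this can stand in for the real llama.cpp-backed engine in tests, and new features can be added as further trait methods.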

Shrike ships a built-in model picker rather than one hardcoded model. It offers a small tiered registry of GGUF models and uses a RAM-fit heuristic to recommend one your Mac can run comfortably.

  • The recommended default is a compact instruct model that’s quick on Apple Silicon and fits machines with modest memory.
  • Pick a model and Shrike streams the download with a progress indicator, caches it under the app’s data directory, and reuses it thereafter.
  • Your choice is saved in preferences, and you can switch models later — the engine is hot-swappable by model id.
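
The tiered registry and RAM-fit heuristic can be pictured with a small sketch. Everything specific here is an assumption for illustration: the model ids, the file sizes, the "file size times four must fit in total RAM" rule, and the choice to recommend the largest model that fits. Shrike's real registry and heuristic may differ.

```rust
/// One entry in a hypothetical model registry.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ModelSpec {
    /// Stable id used to select (and swap) the model.
    pub id: &'static str,
    /// Size of the GGUF file on disk, in MiB (placeholder values below).
    pub file_size_mib: u64,
}

/// Placeholder tiers, smallest to largest. These ids and sizes are invented.
pub const REGISTRY: [ModelSpec; 3] = [
    ModelSpec { id: "small-instruct", file_size_mib: 1_000 },
    ModelSpec { id: "medium-instruct", file_size_mib: 2_500 },
    ModelSpec { id: "large-instruct", file_size_mib: 5_000 },
];

/// Assumed fit rule: the model file may use at most a quarter of total RAM,
/// leaving headroom for the context cache, the app, and the rest of the system.
pub fn fits(model: &ModelSpec, total_ram_mib: u64) -> bool {
    model.file_size_mib.saturating_mul(4) <= total_ram_mib
}

/// Recommend the largest model that fits, or None if nothing does.
pub fn recommend(registry: &[ModelSpec], total_ram_mib: u64) -> Option<&ModelSpec> {
    registry
        .iter()
        .filter(|m| fits(m, total_ram_mib))
        .max_by_key(|m| m.file_size_mib)
}
```

Under these assumed numbers, a machine with 8 GiB of RAM would be steered to the smallest tier and one with 16 GiB to the middle tier. A stricter or looser ratio only changes the constant in `fits`.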

Manage all of this from the Settings window (⌘,) under AI Models.

The model loads only when an AI feature is first invoked, on a background thread. Launch stays instant, and if you never use an AI feature you never pay the cost — no model is downloaded until you ask for one.
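
One way to get this load-on-first-use behaviour in Rust is a `OnceLock` that is initialised from a worker thread. The sketch below is an assumption about the general pattern, not Shrike's code: `Model`, `LazyModel`, and `invoke_feature` are hypothetical names, and the "load" is a placeholder where a real engine would read the GGUF file.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Placeholder for a loaded model; a real one would hold llama.cpp state.
pub struct Model {
    pub id: String,
}

/// Holds the model slot. Nothing is loaded when this is constructed.
pub struct LazyModel {
    cell: OnceLock<Model>,
    /// Counts how many times the loader actually ran (for demonstration).
    loads: AtomicUsize,
}

impl LazyModel {
    pub fn new() -> Self {
        LazyModel { cell: OnceLock::new(), loads: AtomicUsize::new(0) }
    }

    /// Returns the model, loading it on the first call only.
    /// Concurrent callers block until the single initialisation finishes.
    pub fn get_or_load(&self, id: &str) -> &Model {
        self.cell.get_or_init(|| {
            self.loads.fetch_add(1, Ordering::SeqCst);
            // A real loader would read the cached model file here.
            Model { id: id.to_string() }
        })
    }

    pub fn is_loaded(&self) -> bool {
        self.cell.get().is_some()
    }

    pub fn load_count(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}

/// Run an AI feature off the main thread; the first call triggers the load.
pub fn invoke_feature(lazy: Arc<LazyModel>, id: &'static str) -> thread::JoinHandle<String> {
    thread::spawn(move || lazy.get_or_load(id).id.clone())
}
```

Constructing `LazyModel` at launch costs almost nothing, which is what keeps startup fast. The expensive work happens inside `get_or_init`, on whichever background thread first asks for the model.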

Because everything runs locally, on-device AI is a privacy feature, not a privacy cost. The contents of your inbox are never sent to a model provider or to Shrike’s authors. See Privacy & security.