Developer releases Kivarro, an all-in-one local inference workbench

A developer has released Kivarro, a source-available desktop application designed to consolidate local large language model inference into a single interface. The tool aims to replace fragmented workflows by combining model management, runtime tuning, and monitoring in one place.

Supports GGUF, safetensors, bin, and MLX file formats with automatic metadata reading.
Provides supervision for llama.cpp/llama-server and an optional mistral.rs backend.
Includes hardware fit planning, visibility into memory and context usage, and benchmark views for tokens/sec.
Features a local RAG knowledge-base workbench and OpenAI-compatible API view.
Offers cross-platform builds for Windows, macOS, and Linux across x64 and ARM64 architectures.
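
To give a sense of what automatic metadata reading can involve for one of the listed formats, here is a minimal Python sketch that parses the fixed-size header at the start of a GGUF file. It is not taken from Kivarro; the names `GGUFHeader` and `read_gguf_header` are hypothetical, and the sketch assumes a little-endian GGUF file of version 2 or later, where the header is a 4-byte magic, a 32-bit version, and two 64-bit counts.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed GGUF header fields."""
    version: int
    tensor_count: int
    metadata_kv_count: int


# Assumed layout (little-endian, GGUF v2+):
# 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count.
_HEADER = struct.Struct("<4sIQQ")


def read_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the fixed header from the first bytes of a GGUF file."""
    if len(data) < _HEADER.size:
        raise ValueError("buffer is too short to contain a GGUF header")
    magic, version, tensor_count, kv_count = _HEADER.unpack_from(data)
    if magic != b"GGUF":
        raise ValueError("missing GGUF magic bytes")
    if version < 2:
        # Version 1 used 32-bit counts, which this sketch does not handle.
        raise ValueError("unsupported GGUF version")
    return GGUFHeader(version, tensor_count, kv_count)
```

In practice the bytes would come from the start of a model file, for example `read_gguf_header(open(path, "rb").read(24))`, where `path` is whatever file the user selects. The metadata key/value pairs that follow the header (architecture, context length, and so on) need a further variable-length parse that this sketch leaves out.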

The author is seeking feedback from users who run models locally to identify missing workflow elements and to determine which backends should be prioritized for support next.