Topic · Code generation
lab Mistral AI News · 2d ago

Mistral Releases OCR 4 with Multilingual Support and Structured Output

Mistral OCR 4 introduces bounding boxes, block classification, and inline confidence scores for 170 languages across 10 language groups. It outperforms leading OCR systems in human preference evaluations with a 72% win rate and achieves the top score on OlmOCRBench (85.20), while offering self-hosted deployment in a single container and supporting enterprise use cases like RAG and document ingestion.

media Hugging Face Forums · 2d ago

Buddy System: Rust entropy monitor with NER-gated uncertainty for tiered LLM inference

The Buddy System uses a Rust entropy monitor to detect per-token uncertainty in local Gemma 3 4B inference, routing only uncertain tokens to Sonnet via NER-gated span extraction and semantic retrieval. Benchmarks show it achieves 71.4% accuracy at $0.21, outperforming the Anthropic Advisor pattern (62.9% at $0.44) across seven Hugging Face datasets, with a key improvement on SQuAD v2 by routing source passage chunks to the cloud model.

arxiv arXiv cs.CL · 2d ago

Latent Personal Memory: Dynamic Soft Prompts for LLM Personalization

Latent Personal Memory (LPM) represents user-specific memories as a compact, persistent matrix of N latent slots. These slots are mapped via a shared cross-attention network into dynamic, input-conditioned soft prompts that are prepended to a frozen LLM. LPM outperforms LoRA and Prompt Tuning by up to 8.8% and 54.4% on PersonaMem v1, reduces KV-cache usage by over 64x, matches LoRA accuracy on LoCoMo with 120x fewer parameters, and scales efficiently with context length, outperforming full-context at 128K tokens.