Source · r/LocalLLaMA
media r/LocalLLaMA · 2d ago

EU AI Act mandates AI-generated text watermarking from August 2024

According to the post, the EU AI Act requires AI systems that generate synthetic text to mark their output with machine-readable, detectable watermarks, using robust, interoperable technical solutions in two layers. The requirement is said to cover all AI models, including open-source ones, and any service accessible to people in the EU, regardless of where the provider is located. Non-compliance risks fines of up to 35 million euros or a percentage of annual turnover, with providers of 'systemic risk' AI models facing heightened liability.

media r/LocalLLaMA · 17h ago

Baidu's Unlimited-OCR Transcribes Dozens of Pages in One Forward Pass

Baidu has released Unlimited-OCR, a model that transcribes dozens of pages in a single forward pass using Reference Sliding Window Attention (R-SWA). It builds on DeepSeek-OCR, inheriting its encoder, image compression, and MoE architecture, and activates only 500M parameters per token. The model is reported to reach 93.92% accuracy on OmniDocBench v1.6, against DeepSeek-OCR's 87.01% on v1.5. The two scores come from different benchmark versions, so they are not directly comparable, and vendor-reported results warrant independent validation.

media r/LocalLLaMA · 1d ago

650+ Apache-2.0 biomedical NER/de-ID models run 30-40x faster on Apple Silicon

A new open-source project offers 650+ Apache-2.0 licensed biomedical NER and de-identification models that run on-device via MLX. On a 3-year-old MacBook Pro with an M3 Max, clinical NER models run 30-40x faster than PyTorch on CPU while producing identical fp32 outputs and entity results, a gain the project attributes to architectural efficiency on Apple Silicon. The models, including 434M biomedical NER and PII de-ID models, are publicly available on Hugging Face and GitHub, along with the code and methodology needed to reproduce the results.

media r/LocalLLaMA · 1d ago

Tmax-27B Terminal Agent for Small GPUs with DPPO Training

Tmax-27B is a terminal agent based on Qwen3.6-27B and trained with DPPO, a reinforcement learning method. It scores 43% on Terminal Bench 2.0 and 69% on TB Lite. To run on consumer GPUs, it is distributed as importance-matrix-calibrated GGUF quantizations at 2 to 5 bits per weight, with a grafted MTP head that enables speculative decoding. The IQ2_XS quantization, at 8.5 GiB, reaches a 70% pass rate on agentic coding tasks, outperforming plain quantization while keeping tool-call generation stable.
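To make the size figures concrete, here is a rough back-of-the-envelope sketch relating bits per weight to file size. It assumes exactly 27 billion parameters (the post only gives the "27B" name), and the helper names `gguf_size_gib` and `effective_bpw` are hypothetical, not part of any GGUF tooling. Real files also carry metadata, the grafted MTP head, and some tensors kept at higher precision, so treat the numbers as estimates only.

```python
GIB = 2**30  # bytes in one gibibyte

# Assumption: the "27B" in the model name means 27e9 weights.
N_PARAMS = 27e9


def gguf_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Estimate the weight payload in GiB for a given average bits per weight."""
    return n_params * bits_per_weight / 8 / GIB


def effective_bpw(file_size_gib: float, n_params: float) -> float:
    """Invert the estimate: average bits per weight implied by a file size."""
    return file_size_gib * GIB * 8 / n_params


if __name__ == "__main__":
    # Range quoted in the summary: 2 to 5 bits per weight.
    for bpw in (2, 3, 4, 5):
        print(f"{bpw} bpw -> about {gguf_size_gib(N_PARAMS, bpw):.1f} GiB")

    # The 8.5 GiB IQ2_XS file quoted in the summary.
    print(f"8.5 GiB -> about {effective_bpw(8.5, N_PARAMS):.2f} bpw")
```

Under these assumptions the 8.5 GiB file works out to roughly 2.7 bits per weight on average, which is consistent with a 2-bit-class quantization that keeps a few tensors at higher precision.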

media r/LocalLLaMA · 1d ago

7 Chinese companies shipping H100/H200-class AI chips, most IPO'd in last 6 months

At least seven Chinese companies are now shipping H100/H200-class AI accelerators, and most of them went public within the last six months. Huawei alone shipped 812,000 AI cards last year, accounting for 49% of China's domestic supply, and its Ascend 950 is reportedly targeted at H200-class performance. Several of these firms were founded by former NVIDIA and AMD GPU leaders, including MetaX, which saw revenue grow 3,800x in three years. Separately, Alibaba launched a server with 1.5TB of VRAM for on-premises frontier model deployment.

media r/LocalLLaMA · 1d ago

KLD Analysis of KV Cache Quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT

A detailed analysis measures the Kullback-Leibler divergence (KLD) introduced by KV cache quantization on the Qwen3.6-35B-A3B and Gemma4-E2B QAT models. Results show q8/q8 quantization is nearly lossless on both models, while q4/q4 performs well on Qwen but causes severe degradation on Gemma. The turbo quantization variants show mixed performance, with turbo3 and turbo2 enabling extreme cache compression at a significant accuracy cost.
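For readers unfamiliar with the metric, the following is a minimal sketch of how a per-token KLD between a baseline run and a quantized-cache run can be computed from raw logits. It is not the post's actual tooling: the function names and the toy logits are illustrative assumptions, and a real measurement would average over many tokens of an evaluation text using the full vocabulary.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats, with P the baseline and Q the quantized-cache run."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(baseline, quantized):
    """Average the per-token KLD over all token positions."""
    per_token = [kl_divergence(b, q) for b, q in zip(baseline, quantized)]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy logits for two token positions over a three-token vocabulary
    # (made-up numbers, for illustration only).
    baseline = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
    mild = [[2.0, 1.05, 0.1], [0.5, 2.45, 0.3]]    # small perturbation
    severe = [[1.0, 2.0, 0.1], [2.5, 0.5, 0.3]]    # top token flipped

    print("mild  :", mean_kld(baseline, mild))
    print("severe:", mean_kld(baseline, severe))
```

A mean KLD near zero means the quantized cache leaves the next-token distribution almost unchanged, which is the sense in which a setting can be called nearly lossless. Larger values indicate the distribution has shifted, even when the top-ranked token stays the same.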