All articles — korshunov.ai

SupraLabs Releases SupraVL-Nano-900k Vision-Language Model

SupraLabs has launched SupraVL-Nano-900k, a fully transparent, 900k-parameter vision-language model trained from scratch on Flickr8k. It features a CNN visual encoder, GPT-2-style decoder, and prefix concatenation fusion, with all components openly documented and designed for educational clarity.

media r/LocalLLaMA · 12d ago

Ohio State University releases open-source Deep Research agent QUEST-35B

Researchers at Ohio State University trained QUEST-35B, a Deep Research agent, using approximately 32 H100 GPUs and 8,000 synthetic samples. They open-sourced the training recipe, code, weights, and datasets, with benchmark results showing competitive performance compared to leading closed-source Deep Research systems.

media r/LocalLLaMA · 12d ago

GLM-5.2 can now run locally in llama.cpp and Unsloth Studio

GLM-5.2, the strongest open model to date, can now run locally using llama.cpp and Unsloth Studio. The 2-bit quantized model retains ~82% accuracy while shrinking from 1.51TB to 238GB, an 84% reduction, and fits on setups with 256GB of RAM or VRAM.
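The size figures in this summary can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB); the function name `size_reduction` is illustrative and not part of any tool mentioned here.

```python
def size_reduction(original_gb: float, reduced_gb: float) -> float:
    """Return the fractional size reduction, e.g. 0.84 for an 84% cut."""
    return 1 - reduced_gb / original_gb


# Figures quoted in the summary, assuming 1 TB = 1000 GB.
full_size_gb = 1.51 * 1000
quantized_size_gb = 238

reduction = size_reduction(full_size_gb, quantized_size_gb)
print(f"size reduction: {reduction:.1%}")  # size reduction: 84.2%

# The quantized weights alone fit under a 256 GB budget; context and
# runtime overhead are not accounted for in this check.
print(quantized_size_gb < 256)  # True
```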

github llama.cpp · 12d ago

ggml-cpu adds K tails support for Power10 MMA Q8/Q4

ggml-cpu now supports K tails in Power10 Q8/Q4 MMA matmul, removing the requirement that K be divisible by kc. This enables more workloads to use the MMA kernel and reduces fallback to mnpack.
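As a rough illustration of what handling a K tail means, the sketch below walks the shared dimension K in blocks of `kc` and lets the final block be shorter when K is not a multiple of `kc`. It is plain Python on float lists, not the ggml kernel: the real code works on quantized Q8/Q4 blocks with Power10 MMA intrinsics, and the function name here is hypothetical.

```python
def matmul_k_blocked(a, b, kc):
    """Multiply a (M x K) by b (K x N), walking K in blocks of size kc.

    The last block may be shorter than kc (the "K tail"). It goes through
    the same loop, so K does not have to be divisible by kc.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    out = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, kc):
        k1 = min(k0 + kc, k)  # k1 - k0 < kc only for the tail block
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for p in range(k0, k1):
                    acc += a[i][p] * b[p][j]
                out[i][j] += acc
    return out


# K = 5 with kc = 2 gives blocks of 2, 2, and a tail of 1.
a = [[1.0, 2.0, 3.0, 4.0, 5.0]]
b = [[1.0], [1.0], [1.0], [1.0], [1.0]]
print(matmul_k_blocked(a, b, kc=2))  # [[15.0]]
```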

media r/LocalLLaMA · 12d ago

Little late thank you to the DeepSeek team!

A user thanked the DeepSeek team for releasing DeepSeek V4 Pro and its Flash version, which fits on local hardware. The post was made seven months after an initial Reddit post.

media r/LocalLLaMA · 13d ago

Guys, Le Chaton Fat is real...

Le Chaton Fat has been requantized in GGUF format and will soon be available on Hugging Face. Users are advised to run a specific pip command to access the model, with flags such as --trust-remote and --just-do-it.

github OpenAI Agents SDK · 13d ago

v0.17.6 Release Notes

The v0.17.6 release adds pre-approval tool input guardrails and SDK-only custom data for tool outputs. It also enforces a strict JSON-compatible contract for tool outputs and suppresses unnecessary whitespace warnings in tool names. @siddiksawani made their first contribution in this release.

media Latent Space · 13d ago

GLM-5.2 Passes Vibe Check, Outperforms GPT-5.5

GLM-5.2 has passed a 'vibe check' as a frontier open model, receiving praise from Jeremy Howard and outperforming GPT-5.5 in Artificial Analysis' new knowledge work benchmark. It also gained validation from the r/LocalLLaMA community, indicating strong real-world utility and performance.

media r/LocalLLaMA · 13d ago

How can I self host code review?

A user asks about self-hosting code review tools due to Gemini Code Assist ending consumer support and moving to enterprise only. They are exploring GitHub apps or actions for local or cloud-based solutions.

github llama.cpp · 13d ago

llama.cpp Release b9716 Adds Batching Support for InternVL

llama.cpp version b9716 introduces batching support for InternVL, enhancing model performance through efficient batch processing. The release includes binary builds for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, OpenVINO, SYCL, and ROCm.

github llama.cpp · 13d ago

llama.cpp releases b9713 with new binaries and features

llama.cpp has released version b9713, adding batching support to mtmd-cli and video tests. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.

github llama.cpp · 13d ago

llama.cpp release b9714 adds X-Accel-Buffering header and new binaries

llama.cpp version b9714 adds the "X-Accel-Buffering": "no" header to streaming endpoints to prevent Nginx from buffering responses, which resolves streaming issues with applications like the Pi coding harness. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options.
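To show where such a header sits in a streaming response, here is a minimal sketch using Python's standard `http.server`. It is not llama.cpp's server code; the handler name, the event payloads, and the extra headers are assumptions made for illustration. Only the `X-Accel-Buffering: no` header comes from the summary above.

```python
from http.server import BaseHTTPRequestHandler

# Headers for a server-sent-events style response. "X-Accel-Buffering: no"
# asks an Nginx reverse proxy not to buffer this particular response.
STREAM_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "X-Accel-Buffering": "no",
}


class StreamHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that streams a few chunks as separate events."""

    def do_GET(self):
        self.send_response(200)
        for name, value in STREAM_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()
        for chunk in ("Hello", ",", "world"):
            self.wfile.write(f"data: {chunk}\n\n".encode())
            self.wfile.flush()  # push each event out as soon as it is ready
```

To try it locally, pass `StreamHandler` to `http.server.HTTPServer` and call `serve_forever()`. Without the header, a proxy with response buffering enabled may hold events back and deliver them in one batch.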

arxiv arXiv cs.AI · 13d ago

UFP4: Uniform 4-Bit Training Overcomes Shrinkage Bias in LLM Pretraining

A study identifies shrinkage bias in E2M1-based FP4 formats due to geometric asymmetry, causing multiplicative error accumulation and training instability. The proposed UFP4 recipe uses uniform E1M2/INT4 grids and applies Random Hadamard Transform to all GEMMs, achieving lower loss degradation than E2M1 baselines in large-scale LLM pretraining. The authors recommend E1M2/INT4 as a first-class training primitive for future accelerators.
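A Random Hadamard Transform is commonly built as a random ±1 diagonal times a normalized Hadamard matrix. The sketch below, in plain Python, shows the two properties that make it useful around a GEMM: the rotation is orthogonal, so the product is unchanged, and a single large value is spread across the row. It is a generic illustration under those assumptions and does not reproduce the paper's recipe; block sizes, where the transform is applied, and the quantization step itself are all omitted, and every name below is hypothetical.

```python
import math
import random


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def random_hadamard(n, seed=0):
    """Return R = D * H / sqrt(n), with D a random +/-1 diagonal. R is orthogonal."""
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    scale = 1 / math.sqrt(n)
    return [[signs[i] * x * scale for x in row] for i, row in enumerate(hadamard(n))]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


r = random_hadamard(4)
a = [[8.0, 0.0, 0.0, 0.0]]        # one outlier in the activation row
b = [[1.0], [2.0], [3.0], [4.0]]  # weight column

a_rot = matmul(a, r)              # a R
b_rot = matmul(transpose(r), b)   # R^T b

# (a R)(R^T b) equals a b because R R^T = I.
print(matmul(a, b)[0][0], round(matmul(a_rot, b_rot)[0][0], 9))

# The outlier of magnitude 8 becomes four entries of magnitude 4.
print([abs(x) for x in a_rot[0]])
```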

github llama.cpp · 13d ago

llama.cpp Release b9715 Adds CUDA Col2Im 1D and Multiple Platform Binaries

llama.cpp version b9715 introduces CUDA support for GGML_OP_COL2IM_1D, building on a CPU implementation. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and acceleration frameworks, including Vulkan, ROCm, OpenVINO, and SYCL.

arxiv arXiv cs.AI · 13d ago

DataMagic Turns Tabular Data into Interactive Insight Videos

DataMagic transforms raw tabular data and natural language queries into narrative data-insight videos. It uses DVSpec to ensure data fidelity by linking visual elements to data fields via semantic references, and employs a multi-agent architecture to generate and orchestrate coherent video scenes. The system supports interactive exploration and provenance-based data Q&A, enabling users to engage with data beyond static views.

arxiv arXiv cs.AI · 13d ago

NRT-Bench: Multi-turn Red-teaming of LLM Agents in Safety-Critical Systems

NRT-Bench introduces a benchmark for multi-turn red-teaming of LLM agents operating in a simulated nuclear power plant. Across four frontier operator models, 8.7% to 12.1% of attack sessions result in loss of a critical safety function, with vulnerabilities largely disjoint across models. The effectiveness of defences varies significantly by model, showing strong model dependence.

arxiv arXiv cs.AI · 13d ago

Multi-View Decompilation Improves LLM-Based Malware Classification

A benchmark of benign and malicious binaries compiled and decompiled with Ghidra and RetDec reveals that providing both decompiler views to large language models improves malicious-class F1, primarily by increasing recall. Analysis shows Ghidra and RetDec make distinct errors, indicating their outputs offer complementary evidence for malware classification.

arxiv arXiv cs.AI · 13d ago

Attention-Guided Deep Learning for Interpretable Sperm Morphology Classification

A new deep learning framework combines EfficientNet-B0 with CBAM to improve accuracy and interpretability in sperm morphology classification. Evaluated on the SMIDS and HuSHeM datasets, it achieves 90.2% and 93.9% accuracy with macro F1 scores of 0.913 and 0.948, respectively, outperforming baseline models. Grad-CAM++ visualizations enable transparent feature analysis, supporting clinical adoption in fertility clinics.

arxiv arXiv cs.AI · 13d ago

Repurposing Speech Classifier for Diffusion-Based Generation

A pretrained speech classifier is repurposed as a backbone for guided diffusion-based speech generation. By attaching a lightweight subnetwork and training it under denoising score matching, the approach achieves high speech quality with reduced memory and computational cost, using a single model instead of two separately trained components.

arxiv arXiv cs.AI · 13d ago

Context-Aware Bayesian Model Improves IVF Success Prediction

A hierarchical Bayesian model using 55 context-aware environmental features reduces prediction error to 1.27% in IVF data, compared to 3-5% with raw sensor averages. The model achieves R² = 0.86 on held-out data and reduces error by 64% for women aged 35-39, showing transferable clinical signal across clinics.