All articles — korshunov.ai — ML news

media r/LocalLLaMA · 12d ago

SupraLabs Releases SupraVL-Nano-900k Vision-Language Model

SupraLabs has launched SupraVL-Nano-900k, a fully transparent, 900k-parameter vision-language model trained from scratch on Flickr8k. It features a CNN visual encoder, GPT-2-style decoder, and prefix concatenation fusion, with all components openly documented and designed for educational clarity.

github llama.cpp · 13d ago

llama.cpp release b9714 adds X-Accel-Buffering header and new binaries

llama.cpp version b9714 adds the X-Accel-Buffering: no response header to streaming endpoints so that Nginx does not buffer responses, which resolves streaming issues with applications like the Pi coding harness. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options.
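
To illustrate the header itself, here is a minimal sketch of a streaming endpoint that sets it, written with Python's standard http.server. This is not llama.cpp's server code; the names streaming_headers, StreamHandler, and serve are hypothetical and exist only for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def streaming_headers():
    """Response headers for a server-sent-events stream behind a reverse proxy."""
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        # Nginx reads this response header and disables proxy buffering
        # for this response only, so chunks reach the client as written.
        "X-Accel-Buffering": "no",
    }


class StreamHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that streams three small events."""

    def do_GET(self):
        self.send_response(200)
        for name, value in streaming_headers().items():
            self.send_header(name, value)
        self.end_headers()
        for i in range(3):
            self.wfile.write(f"data: chunk {i}\n\n".encode())
            self.wfile.flush()  # push each event out immediately


def serve(port=8080):
    """Run the demo server until interrupted (not called on import)."""
    HTTPServer(("127.0.0.1", port), StreamHandler).serve_forever()
```

Without the header, a proxy with buffering enabled may hold chunks until its buffer fills or the response ends, which makes a token stream appear to arrive all at once.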

arxiv arXiv cs.AI · 13d ago

UFP4: Uniform 4-Bit Training Overcomes Shrinkage Bias in LLM Pretraining

A study identifies shrinkage bias in E2M1-based FP4 formats due to geometric asymmetry, causing multiplicative error accumulation and training instability. The proposed UFP4 recipe uses uniform E1M2/INT4 grids and applies Random Hadamard Transform to all GEMMs, achieving lower loss degradation than E2M1 baselines in large-scale LLM pretraining. The authors recommend E1M2/INT4 as a first-class training primitive for future accelerators.
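
The geometric difference between the grids can be seen by enumerating them. The sketch below lists the non-negative values of E2M1, E1M2, and INT4 and the gaps between neighbouring values. It assumes an exponent bias of 1 for both float formats, and it shows only the grid geometry; it does not reproduce the paper's bias analysis or the Random Hadamard Transform.

```python
def minifloat_grid(exp_bits, man_bits, bias=1):
    """Non-negative values of a tiny float format (bias is an assumption)."""
    values = []
    for exp in range(2 ** exp_bits):
        for man in range(2 ** man_bits):
            frac = man / 2 ** man_bits
            if exp == 0:
                # Subnormal range: no implicit leading 1.
                values.append(frac * 2.0 ** (1 - bias))
            else:
                values.append((1 + frac) * 2.0 ** (exp - bias))
    return values


def int4_grid():
    """Non-negative magnitudes of a symmetric signed 4-bit integer."""
    return [float(i) for i in range(8)]


def gaps(grid):
    """Spacing between neighbouring grid points."""
    return [b - a for a, b in zip(grid, grid[1:])]


e2m1 = minifloat_grid(exp_bits=2, man_bits=1)  # 0, 0.5, 1, 1.5, 2, 3, 4, 6
e1m2 = minifloat_grid(exp_bits=1, man_bits=2)  # 0, 0.25, ..., 1.75
print("E2M1 gaps:", gaps(e2m1))
print("E1M2 gaps:", gaps(e1m2))
print("INT4 gaps:", gaps(int4_grid()))
```

The E2M1 gaps widen toward large magnitudes, while the E1M2 and INT4 gaps are constant, which is the sense in which the latter two grids are uniform.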

github llama.cpp · 13d ago

llama.cpp Release b9715 Adds CUDA Col2Im 1D and Multiple Platform Binaries

llama.cpp version b9715 introduces CUDA support for GGML_OP_COL2IM_1D, building on a CPU implementation. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and acceleration frameworks, including Vulkan, ROCm, OpenVINO, and SYCL.

arxiv arXiv cs.AI · 13d ago

DataMagic Turns Tabular Data into Interactive Insight Videos

DataMagic transforms raw tabular data and natural language queries into narrative data-insight videos. It uses DVSpec to ensure data fidelity by linking visual elements to data fields via semantic references, and employs a multi-agent architecture to generate and orchestrate coherent video scenes. The system supports interactive exploration and provenance-based data Q&A, enabling users to engage with data beyond static views.

arxiv arXiv cs.AI · 13d ago

NRT-Bench: Multi-turn Red-teaming of LLM Agents in Safety-Critical Systems

NRT-Bench introduces a benchmark for multi-turn red-teaming of LLM agents operating in a simulated nuclear power plant. Across four frontier operator models, 8.7% to 12.1% of attack sessions result in loss of a critical safety function, with vulnerabilities largely disjoint across models. Defence effectiveness is also strongly model-dependent.

arxiv arXiv cs.AI · 13d ago

Multi-View Decompilation Improves LLM-Based Malware Classification

A benchmark of benign and malicious binaries compiled and decompiled with Ghidra and RetDec reveals that providing both decompiler views to large language models improves malicious-class F1, primarily by increasing recall. Analysis shows Ghidra and RetDec make distinct errors, indicating their outputs offer complementary evidence for malware classification.

arxiv arXiv cs.AI · 13d ago

Attention-Guided Deep Learning for Interpretable Sperm Morphology Classification

A new deep learning framework combines EfficientNet-B0 with CBAM to improve accuracy and interpretability in sperm morphology classification. Evaluated on SMIDS and HuSHem datasets, it achieves 90.2% and 93.9% accuracy with macro F1 scores of 0.913 and 0.948, outperforming baseline models. Grad-CAM++ visualizations enable transparent feature analysis, supporting clinical adoption in fertility clinics.

arxiv arXiv cs.AI · 13d ago

Repurposing Speech Classifier for Diffusion-Based Generation

A pretrained speech classifier is repurposed as a backbone for guided diffusion-based speech generation. By attaching a lightweight subnetwork and training it under denoising score matching, the approach achieves high speech quality with reduced memory and computational cost, using a single model instead of two separately trained components.

arxiv arXiv cs.AI · 13d ago

Context-Aware Bayesian Model Improves IVF Success Prediction

A hierarchical Bayesian model using 55 context-aware environmental features reduces prediction error to 1.27% in IVF data, compared to 3-5% with raw sensor averages. The model achieves R² = 0.86 on held-out data and reduces error by 64% for women aged 35-39, showing transferable clinical signal across clinics.

arxiv arXiv cs.AI · 13d ago

Defensive Misdirection Against Automated Attacks on Agentic AI

Agentic AI systems face growing threats from model-guided automated attacks. A new defense strategy, Contextual Misdirection via Progressive Engagement (CMPE), reduces attacker success rates by up to two orders of magnitude and nearly eliminates verified attack success in benchmark tests.

arxiv arXiv cs.AI · 13d ago

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant enables 4-bit KV caching for context-heavy agents, reducing P50 time-to-first-token by 3.47x in late rounds and boosting output throughput by 1.63x over FP8 KV baseline. It achieves this using FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA on AMD CDNA4 GPUs, with optimizations for decode-attention kernels and robust design choices like asymmetric K/V treatment and Walsh-Hadamard rotation.

arxiv arXiv cs.AI · 13d ago

Optimal Order in Multi-Agent Systems Framework

A new framework analyzes multi-agent systems by modeling agent influence and response functions. It derives macroscopic properties like power, entropy, and order, and identifies an optimal level of synchronization that balances productivity, stability, and adaptability. The study shows that order and system properties are task-dependent and context-relative.

arxiv arXiv cs.AI · 13d ago

Evaluator Bias Propagation in Multi-Agent LLM Systems

Contagion Networks introduces a framework to measure how evaluator biases spread among LLM agents. In a 3-agent experiment, biases propagated consistently with contagion coefficients between 0.157 and 0.352, and homogeneous-model agents showed significantly weaker contagion than cross-model setups. Increasing evaluator committee size from k=1 to k=3 reduced effective contagion by 72.4%.

arxiv arXiv cs.AI · 13d ago

Calibration Without Comprehension in LLM Vulnerability Detection

CWE-Trace evaluates 8 vanilla and 15 LoRA-fine-tuned LLMs on Linux kernel vulnerability detection. Results show data contamination offers no advantage, and fine-tuning only shifts output thresholds without altering decision policies. Despite improved detection scores, LLMs lack reliable security reasoning, with top-1 CWE accuracy below 1.3% and binary detection performance at 52.1%.

arxiv arXiv cs.AI · 13d ago

FreeStyle: Scalable Dual-Reference Generation via Community LoRA Mining

FreeStyle proposes a framework that mines community LoRAs to generate large-scale style-content dual-reference image triplets. It employs a two-stage curriculum with disentanglement mechanisms to suppress style leakage and introduces a benchmark with style-invariant and VLM-based scores to evaluate content preservation and leakage rejection.

arxiv arXiv cs.AI · 13d ago

How Safety-Aligned LLMs Interpret Mixed Compliance Demonstrations

The study finds that benign and harmful compliance demonstrations are not interchangeable in LLMs. Benign demonstrations can either reduce or increase harmful compliance depending on the model, with preference optimization playing a key role in preventing harmful compliance. Demonstration ordering shows strong recency bias, and models vary in how they handle refusal during in-context learning.

arxiv arXiv cs.AI · 13d ago

Efficient and Sound Probabilistic Verification for AI Agents

A new framework enables secure, probabilistic policy enforcement for AI agents in ambiguous environments. It uses distributionally robust optimization to compute rigorous upper bounds on policy violation probabilities without assuming predicate independence. The method outperforms prior approaches on terminal and tool calling agent benchmarks, improving the security-utility trade-off.

arxiv arXiv cs.AI · 13d ago

Multi-LCB: Extending LiveCodeBench to 12 Programming Languages

Multi-LCB extends LiveCodeBench to twelve programming languages, preserving its contamination controls and evaluation protocol. It reveals Python overfitting, language-specific biases, and significant performance gaps among LLMs across languages, establishing a rigorous benchmark for cross-language code generation.

arxiv arXiv cs.AI · 13d ago

FlowEdit: Lifelong Pronunciation Adaptation in Flow-Matching TTS

FlowEdit enables frozen flow-matching TTS models to adapt pronunciation corrections over time using latent edits in text embeddings. It stores corrections in a Modern Hopfield Network and retrieves them via soft attention with similarity gating, reducing phoneme error rates by 92.7% on 312 multilingual proper nouns while preserving general-speech quality. Corrections take about 15 seconds to complete on a single GPU.
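
The retrieval step can be pictured as softmax attention over stored keys, which is the standard update rule of a modern Hopfield network, followed by a gate that suppresses the edit when no stored key is similar enough. The sketch below is an assumption-laden illustration rather than FlowEdit's implementation: the function names, the inverse temperature beta, the threshold gate, and the hard form of the gate are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_edit(query, keys, edits, beta=8.0, gate=0.8):
    """Return a softmax-weighted mix of stored edits, or zeros if gated off."""
    sims = [cosine(query, k) for k in keys]
    dim = len(edits[0])
    if max(sims) < gate:
        # No stored correction is close enough: leave the embedding unchanged.
        return [0.0] * dim
    top = max(sims)
    weights = [math.exp(beta * (s - top)) for s in sims]  # stable softmax
    total = sum(weights)
    return [
        sum(w / total * e[d] for w, e in zip(weights, edits))
        for d in range(dim)
    ]


keys = [[1.0, 0.0], [0.0, 1.0]]    # embeddings of previously corrected words
edits = [[1.0, 0.0], [0.0, 1.0]]   # latent edits stored for those words
print(retrieve_edit([1.0, 0.0], keys, edits))  # close match: mostly the first edit
print(retrieve_edit([1.0, 1.0], keys, edits))  # ambiguous query: gated to zeros
```

The gate is what would keep unrelated text untouched in a design like this: a query far from every stored key receives no edit at all rather than a diluted mixture.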