Reasoning models — korshunov.ai

MedRLM: Recursive Multimodal Health Intelligence Framework

MedRLM enables long-context clinical reasoning by recursively inspecting patient data across text, images, sensors, and guidelines. It integrates specialized agents and a Clinical Evidence Graph Memory to connect patient observations with evidence, biomarkers, and referral criteria, supporting sensor-triggered reasoning and uncertainty-gated clinician review.

arxiv arXiv cs.CL · 7d ago

ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

ReNikud introduces a novel audio-supervised approach for Hebrew grapheme-to-phoneme (G2P) conversion, using weak audio supervision and a pseudo-vocalization architecture. It outperforms prior state-of-the-art methods on Hebrew G2P benchmarks and the new MILIM benchmark, enabling more natural spoken Hebrew in text-to-speech applications.

arxiv arXiv cs.CL · 7d ago

Algorithm for Pitch Spelling and Key Estimation in Music Transcription

A new algorithm estimates note names, key signatures, and local scales from MIDI-like input by jointly optimizing modal and tonal stages. It has been evaluated on jazz lead sheets, solo transcriptions, traditional tunes, and classical piano scores, with additional distances defined between common jazz scales for musicological research.

arxiv arXiv cs.CL · 7d ago

Causal Activation Directions for Mitigating Emergent Misalignment in Language Models

Fine-tuning language models on insecure code causes emergent misalignment. A shared activation direction across four model families achieves 99.6% separation of aligned and misaligned activations, and subtracting it reduces code spillover by 21-51 points. Cross-architecture transfer suppresses the behavior but lacks specificity: within-model directions are causally actionable, whereas cross-model directions are only causally real.
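
A minimal sketch of what "subtracting an activation direction" can look like, assuming the common projection-removal variant; the paper's exact intervention may differ. The function names, the `scale` parameter, and the toy vectors are hypothetical and not taken from the paper.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def subtract_direction(activation, direction, scale=1.0):
    """Remove the component of `activation` that lies along `direction`.

    scale=1.0 projects the direction out entirely; smaller values
    remove only part of it (an illustrative knob, not from the paper).
    """
    d = unit(direction)
    coeff = sum(a * x for a, x in zip(activation, d))  # projection length
    return [a - scale * coeff * x for a, x in zip(activation, d)]

# Toy example: a 3-dimensional "activation" and a direction along axis 0.
h = [2.0, 1.0, -1.0]
d = [1.0, 0.0, 0.0]
steered = subtract_direction(h, d)
```

In a real model the same operation would be applied to hidden states at a chosen layer during the forward pass; which layer and how strongly is not stated in the summary.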

arxiv arXiv cs.CL · 7d ago

Meaning Intelligence Framework for Nigerian Public Discourse

The Meaning Intelligence Framework (MIF) introduces a nine-dimensional schema to analyze Nigerian public discourse, addressing context failure in AI systems. A 30-item calibration dataset shows that schema-informed prompting improves register classification accuracy from 33.3% to 73.3% and boosts the composite Meaning Intelligence Score from 73.2 to 78.6.

arxiv arXiv cs.CL · 7d ago

PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

PsyScore integrates diagnostic scoring and instructional feedback using a shared latent ability model. It features a trait-adaptive neural IRT scorer based on GPCM, a ZPD-scaffolded feedback generator that tailors instruction by proficiency level, and a multi-perspective evaluation strategy. Experiments on ASAP++ show competitive scoring and more pedagogically aligned feedback compared to existing methods.
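
For readers unfamiliar with GPCM (the generalized partial credit model), the sketch below computes its category probabilities in the classical form. It illustrates only the psychometric model the scorer is said to build on, not PsyScore's neural, trait-adaptive version; the function name and the example parameter values are hypothetical.

```python
import math

def gpcm_probs(theta, a, thresholds):
    """Category probabilities under the classical GPCM.

    theta      : latent ability of the writer
    a          : item (trait) discrimination
    thresholds : step difficulties b_1..b_m, giving m + 1 score levels
    """
    # Cumulative logits: z_0 = 0, z_k = sum over v <= k of a * (theta - b_v)
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative values only: a four-level score scale (0..3).
probs = gpcm_probs(theta=0.5, a=1.2, thresholds=[-1.0, 0.0, 1.0])
expected_score = sum(k * p for k, p in enumerate(probs))
```

Raising `theta` shifts probability mass toward higher score levels, which is the property a shared ability estimate relies on.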

media r/LocalLLaMA · 7d ago

DiffusionGemma 26B on 4090 reaches 475t/s with limitations

DiffusionGemma 26B runs at up to 475t/s on a 4090 via vLLM with INT4 AWQ quantization, achieving speeds between 290t/s and 700t/s based on output length. However, it suffers from single-user operation, lower response accuracy, rapid context loss, and slower time-to-first-token compared to standard 26B models.

media r/LocalLLaMA · 7d ago

Laguna M.1: 225B Parameter MoE Model for Agentic Coding

Laguna M.1 is a 225B-parameter mixture-of-experts model with 23B activated parameters per token, designed for agentic coding and long-horizon tasks. It achieves competitive performance on SWE-bench Verified (74.6%), SWE-bench Multilingual (63.1%), and Terminal-Bench 2.0 (45.8%), outperforming models like Devstral 2 and GLM-4.7 on key benchmarks.

media r/LocalLLaMA · 7d ago

My suitcase robot gets high from real gas sensor

A real MQ-2 gas sensor detects smoke and feeds live data to an LLM sampler, adjusting temperature, top_p, and top_k in real time. As smoke increases, the robot's speech becomes loopier and more associative, with no scripted 'stoned' mode, demonstrating live model behavior driven by physical input.
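
A rough sketch of how a sensor reading could drive the sampler, assuming a simple linear mapping. The post does not describe the actual mapping, so the calibration values (`clean`, `smoky`) and every parameter range below are hypothetical.

```python
def lerp(lo, hi, t):
    """Linear interpolation between lo and hi for t in [0, 1]."""
    return lo + (hi - lo) * t

def sampler_params(raw, clean=200, smoky=800):
    """Map a raw gas-sensor reading to sampling parameters.

    `clean` and `smoky` are assumed calibration points for the ADC
    reading; readings outside that range are clamped.
    """
    t = (raw - clean) / (smoky - clean)
    t = min(1.0, max(0.0, t))
    return {
        "temperature": round(lerp(0.7, 1.6, t), 3),  # hotter = more random
        "top_p": round(lerp(0.9, 1.0, t), 3),        # wider nucleus
        "top_k": int(round(lerp(40, 200, t))),       # more candidate tokens
    }

calm = sampler_params(200)
hazy = sampler_params(800)
```

The resulting dictionary would be passed to whatever sampler the robot uses on each generation step, so the output loosens gradually instead of switching into a scripted mode.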

media r/LocalLLaMA · 7d ago

GLM-5.2 Is The Best Open Weight Creative Writing Model

Sam Paech's Creative Writing Benchmark on EQ Bench ranks GLM-5.2 as the top open-weight creative writing model. The assessment is based on performance metrics from the EQ Bench creative writing evaluation.

media r/LocalLLaMA · 7d ago

SLMs and Diffusion: The Future of Small, Specialized Models?

Users discuss whether task-specific small language models (SLMs) can outperform larger models in specific tasks, citing benchmarks where 9B models match or exceed larger ones. They propose a sequential agentic workflow using multiple specialized models, with one coordinating and others verifying answers, suggesting diffusion models could accelerate such workflows despite reduced intelligence.

media r/LocalLLaMA · 8d ago

Llama Bench vs Real-World Performance Discrepancy

The user reports a significant gap between Llama benchmark results and actual model performance. Benchmarks show 754 tk/s prefill and 36 tk/s generation, but real usage yields only 7.98 tk/s, with high latency and poor throughput. The user attributes the discrepancy to real-world usage conditions rather than benchmark settings.
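
One way a gap like this can arise is that an end-to-end rate also pays for prompt processing. The sketch below works through that arithmetic using the benchmark rates quoted above; the prompt and output lengths are hypothetical, and the post does not say this is the actual cause.

```python
def end_to_end_rate(prompt_tokens, output_tokens, prefill_rate, gen_rate):
    """Output tokens per second of wall-clock time for one request.

    Wall-clock time is modelled as prefill time plus generation time;
    other overheads (sampling, networking, tool calls) are ignored.
    """
    prefill_seconds = prompt_tokens / prefill_rate
    generation_seconds = output_tokens / gen_rate
    return output_tokens / (prefill_seconds + generation_seconds)

# Benchmark rates from the post; request sizes are assumptions.
rate = end_to_end_rate(
    prompt_tokens=8000,
    output_tokens=200,
    prefill_rate=754.0,
    gen_rate=36.0,
)
```

With a long prompt and a short reply, the perceived rate falls well below the benchmarked generation speed even when the benchmark itself is accurate.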

media r/LocalLLaMA · 8d ago

Keye-VL-2.0-30B-A3B Launches with Advanced Video Understanding and Agent Capabilities

Keye-VL-2.0-30B-A3B is a 30B-parameter multimodal model designed for long-video understanding and agent functionality. It outperforms open-source rivals and matches Gemini-3-Flash in temporal grounding, supports up to 256K context with near-lossless reasoning, and includes built-in capabilities for code, tool, and web search agent workflows.

media r/LocalLLaMA · 8d ago

GLM-5.2 Review and Censorship Response

GLM-5.2 demonstrates exceptional long-context coherence and conversational fluency, outperforming Gemini-3.1-Pro on text-only tasks and matching GPT-5.5 in reasoning quality. The model responds factually to sensitive topics like Taiwan and Tiananmen Square, providing detailed historical context without overt censorship, though it adheres to Chinese government content guidelines.

arxiv arXiv cs.LG · 8d ago

Discriminator-Guided RL Corrects Flow Matching with Data-Aligned Rewards

Discriminator-Guided RL (DRL) uses a pretrained representation space to train a discriminator that separates real data from model-generated samples. Its logit is used as a reward in KL-regularized RL, aligning model outputs with visual and semantic realism without human preferences. DRL improves FID and semantic FD across models like SiT and JiT, and enhances the Pareto frontier between preference and fidelity.
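
In symbols, the setup described above can be sketched as follows. The notation is assumed rather than taken from the paper: $D_\phi$ is the discriminator (applied to pretrained features of a sample $x$), $p_\theta$ the generator being fine-tuned, $p_{\mathrm{ref}}$ the frozen reference model, and $\beta$ the KL weight.

```latex
r_\phi(x) = \operatorname{logit} D_\phi(x) = \log \frac{D_\phi(x)}{1 - D_\phi(x)},
\qquad
\max_\theta \; \mathbb{E}_{x \sim p_\theta}\big[ r_\phi(x) \big]
  - \beta \, \mathrm{KL}\big( p_\theta \,\|\, p_{\mathrm{ref}} \big)
```

For an ideal discriminator trained on balanced real and generated samples, the logit equals the log density ratio between data and model, which is why it can serve as a reward without human preference labels.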

arxiv arXiv cs.LG · 8d ago

Essential Subspace Merging for Multi-Task Learning

Essential Subspace Merging (ESM) reduces inter-task interference by focusing on principal directions of activation shifts. ESM++ extends this with dynamic expert selection via prototype-based routing, enabling efficient, training-free multi-task model merging.

arxiv arXiv cs.LG · 8d ago

AGDN: Solving Traveling Salesman Problem with Anisotropic Graph Diffusion

AGDN introduces a graph neural network framework that addresses topological priors and connectivity loss in TSP. It uses a MixScore transition matrix and anisotropic diffusion to enable efficient information exchange, outperforming existing methods across diverse problem sizes and distributions while maintaining competitive computation time. The implementation is available on GitHub.

arxiv arXiv cs.LG · 8d ago

Decision-Focused RL for EV Charging with Unknown Departure Times

A new decision-focused RL framework jointly trains a forecaster and charging controller to handle unknown EV departure times. By aligning forecast accuracy with downstream decision quality, the method achieves up to 14% higher total reward and a 55% reduction in unsupplied energy compared to standard RL approaches.

arxiv arXiv cs.LG · 8d ago

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

A new theory models how semantic paraphrases can fool financial sentiment classifiers by analyzing the worst-case displacement of target model representations. The attackability index λ*(x) is derived from the largest generalised eigenvalue of a matrix pencil (A,B), offering closed-form predictions and robustness certificates for affine readouts. The framework connects continuous perturbation theory to discrete paraphrase search, with empirical validation on real financial text classifiers.
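
As background, the largest generalised eigenvalue of a pencil $(A, B)$ has a standard variational form, shown below under the assumption that $A$ is symmetric and $B$ is symmetric positive definite. How the paper constructs $A$ and $B$ from the classifier and the paraphrase set is not stated in this summary.

```latex
A v = \lambda B v,
\qquad
\lambda_{\max}(A, B) = \max_{v \neq 0} \frac{v^\top A v}{v^\top B v}
```

Read this way, the index is a worst-case ratio: how much one quadratic quantity can grow per unit of another across all directions $v$.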

arxiv arXiv cs.LG · 8d ago

MAST Enables Selective Unlearning in RLVR-Induced Reasoning

MAST, a mechanism-guided unlearning method, achieves targeted forgetting of RLVR-induced reasoning with minimal collateral damage. On Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, it significantly reduces MATH performance (45/150 to 37/150) while preserving GSM8K accuracy (+0.8 points) and MATH retention (-0.5 points). Results hold across different seeds, objectives, and models, showing superior stability over full-parameter unlearning.