All articles — korshunov.ai


Measuring & Mitigating Over-Alignment for LLMs in Multilingual Criminal Law Courts

This article addresses the challenge of over-alignment in large language models used within Swiss Federal Supreme Court criminal law contexts, where model guardrails frequently trigger refusals when processing sensitive case details. The authors introduce TF-RefusalBench, a multilingual benchmark derived from public rulings, to measure this phenomenon across French, German, Italian, and English.

arxiv arXiv cs.AI · 15h ago

Energy-Based Transformers as Predictors of Reading Difficulty

This study introduces energy-based transformers as a novel measure for predicting human reading difficulty, establishing a formal link between transformer models and the associative memory literature, such as Hopfield networks.

arxiv arXiv cs.AI · 15h ago

Distribution-Aware Diffusion-LLM for Robust Ultra-Long-Term Time Series Forecasting

The authors propose Diffusion-LLM, a framework that integrates a conditional diffusion model into an LLM-based pipeline to address challenges in multimodal time series forecasting. This joint design enables the learning of future data distributions while improving semantic alignment within a shared latent space.

media r/LocalLLaMA · 15h ago

Fast medical RAG API to give your local LLMs access to facts

A developer has released a free, simple Retrieval-Augmented Generation (RAG) API powered by medical Wikipedia articles to provide local large language models with accurate factual information. The service aims for subsecond responses and currently runs on a single ARM VPS using approximately 2GB of RAM.

media r/LocalLLaMA · 15h ago

DGX Spark OS lifetime?

A user on Reddit asks whether Nvidia has disclosed the support lifecycle for the operating system running on DGX Spark hardware. The inquiry specifically concerns the duration of OS support and whether users will be forced to upgrade to new products in the near future, such as by 2028.

arxiv arXiv cs.AI · 15h ago

Automated Semantic Fault Localization in SysML v2 Using Knowledge-Graph Augmented LLMs

This paper presents a human-in-the-loop framework for automatically identifying and repairing semantic errors in SysML v2 models that compilers cannot detect. The approach combines fine-tuned Small Language Models with a domain knowledge graph to ground repair suggestions in valid engineering constraints.

arxiv arXiv cs.AI · 15h ago

Litmus: Zero-Label, Code-Driven Metric Specification for Evaluating AI Systems

Litmus is a zero-label system that designs evaluation and monitoring metrics for AI pipelines by eliciting evaluation intent from source code and targeted interrogation. Instead of assuming the evaluation target is known, it identifies what must be measured and why to construct a justified metric portfolio.

arxiv arXiv cs.AI · 15h ago

ReasoningLens: Hierarchical Visualization and Diagnostic Auditing for Large Reasoning Models

The emergence of Large Reasoning Models has introduced exceptionally long Chain-of-Thought traces, creating a transparency burden where critical logic is often buried under massive procedural text. To address this, the authors present ReasoningLens, an open-source framework designed for the hierarchical visualization and diagnostic auditing of complex reasoning chains.

arxiv arXiv cs.AI · 16h ago

HyperQuant: A Rate-Distortion-Optimal Quantization Pipeline for Large Language and Diffusion Models

HyperQuant is a unified post-training quantization pipeline designed for the weights and KV cache of large language and diffusion transformers, combining Hadamard transforms with optimal lattice quantization. The method outperforms recent schemes like HIGGS, TurboQuant, and OCTOPUS across various bit rates while maintaining near-lossless quality.

arxiv arXiv cs.AI · 16h ago

UnBias-Plus: Detect, Explain, and Rewrite Bias

UnBias-Plus is an open-source toolkit designed to address persistent bias in natural language by unifying detection, explanation, and neutral rewriting capabilities.

arxiv arXiv cs.AI · 16h ago

Detecting Malicious Agent Skills in the Wild using Attention

The authors present Locate-and-Judge, a two-stage detector designed to identify malicious skills in LLM agent marketplaces where traditional prompt-injection defenses fail.

arxiv arXiv cs.AI · 16h ago

Digital Humanism and Evolutionary Design

This paper examines the concepts of digital humanism and evolutionary design to identify their common structures, synergies, and challenges within the context of human-centered technological development.

arxiv arXiv cs.AI · 16h ago

GRINQH: Graded Input-based Quantization Hierarchy for Efficient LLM Generation

Researchers propose GRINQH, a weight-only post-training quantization framework that accelerates large language model decoding by unifying quantization and sparsification. The method leverages activation magnitudes to dynamically assign weight channels to different precision levels, addressing the memory-bound nature of the decoding stage.

arxiv arXiv cs.AI · 16h ago

STAITUS: Disentangling Appearance and Pose for Video Object Tracking

The article introduces STAITUS, a unified framework for unsupervised video object tracking that addresses the limitations of existing slot-based methods by explicitly disentangling appearance from geometric pose. This approach resolves conflicts between temporal consistency and object motion, preventing slots from locking onto static backgrounds.

arxiv arXiv cs.AI · 16h ago

Cross-Architectural Mixture-of-Experts with Adaptive Soft Routing for Plant Leaf Disease Classification

This study proposes an adaptive soft Mixture-of-Experts (MoE) framework that integrates EfficientNet-B0, DenseNet-121, and Swin-Tiny to address challenges in plant leaf disease classification under complex backgrounds and class imbalance.

arxiv arXiv cs.AI · 16h ago

What Does a Chemical Language Model Know About Molecules?

This study applies sparse autoencoders to MolFormer to mechanistically examine how molecular representations are built across layers, challenging the assumption that chemical language models only learn surface-level syntax.

github CrewAI · 16h ago

crewAI 1.14.8a5 Release Notes

The crewAI version 1.14.8a5 update introduces changes to flow state management, documentation updates, and refactoring efforts.

media r/LocalLLaMA · 17h ago

LFM2.5 230M Runs In-Browser at 1,400 tok/s via Custom WebGPU Kernels

The LiquidAI LFM2.5-230M model is now running locally in the browser using custom WebGPU kernels. These specialized kernels were originally developed by Fable 5 (prior to its shutdown) and Opus 4.8. The demonstration was recorded on an M4 Max device, achieving a generation speed of 1,400 tokens per second. All processing occurs in the user's browser, with no external server dependencies. A GGUF version of the model is available for download on Hugging Face alongside the standard checkpoint. Users can interact with the live demo hosted by the webml-community on Hugging Face Spaces.

media r/LocalLLaMA · 17h ago

Apple to Skip M6 Pro/Max Chips, Fast-Track M7 for Local AI

A recent report indicates that Apple plans to skip the release of M6 Pro and M6 Max chips in its upcoming lineup and instead fast-track the M7 chip series to better support local artificial intelligence workloads. The shift suggests that Apple is prioritizing on-device AI capabilities, including neural engine performance for running large language models locally, over traditional performance increments for the Pro tier. If accurate, the move would mark a pivot in the Apple Silicon roadmap toward AI-centric design.

arxiv arXiv cs.AI · 17h ago

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

The Android Open Harness Project (AOHP) is an open-source operating system-level agent harness built on the Android Open Source Project. It addresses the mismatch between current application-centric operating systems and the needs of autonomous AI agents by treating agents as first-class OS actors. The design introduces three key mechanisms: personalized service composition, efficient agent interfaces, and secure information flow. These features enable adaptive user interfaces and agent-friendly runtime environments while preserving the existing Android ecosystem. Preliminary experiments on challenging tasks demonstrate significant performance improvements over conventional systems. Specifically, AOHP achieved a 21.12% increase in task completion rates compared to baseline methods. It also reduced token execution costs by 51.55%, highlighting its efficiency gains. Furthermore, the system showed improved compliance with security policies during agent-mediated interactions.