When you don't have a data center GPU
The article references the LiquidAI LFM2.5-230M model as an alternative for users without access to data center GPUs.
Ornith-1.0 is a new family of open-source large language models specialized for agentic coding tasks. The model family spans multiple parameter sizes, including 9B Dense, 35B MoE, and 397B MoE configurations.
NVIDIA introduces Nemotron-TwoTower, a diffusion language model that decouples context representation and iterative denoising into two separate networks to overcome capacity limitations in existing approaches. Built on the open-weight Nemotron-3-Nano-30B-A3B model and trained on 2.1T tokens, it retains 98.7% of the autoregressive baseline's quality while achieving 2.42x higher wall-clock generation throughput.
A study reveals that while large reasoning models (LRMs) and humans both spend more time on harder problems, they diverge significantly in how they allocate deliberation within specific items. When making errors, LRMs generate more tokens than when correct, whereas humans do the opposite, spending less time on trials they get wrong.
The article introduces MemStrata, a retrieval memory system designed to eliminate stale-fact errors in AI agents by maintaining temporal validity within accumulated knowledge. Unlike standard Retrieval-Augmented Generation (RAG), which struggles to distinguish between duplicated and contradicted facts due to embedding similarity, MemStrata uses a deterministic supersession rule to retire outdated information.
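The summary does not spell out MemStrata's actual rule, so the following is only a minimal sketch of what a deterministic supersession rule could look like. It assumes facts are keyed by a hypothetical `(subject, attribute)` pair and carry a hypothetical logical timestamp `observed_at`; the `Fact` and `FactStore` names are illustrative and not taken from the article. The point it illustrates is that exact key and value comparison separates a duplicate (same value) from a contradiction (different value), which embedding similarity alone cannot do.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    observed_at: int                  # hypothetical logical timestamp
    retired_at: Optional[int] = None  # set once a newer fact supersedes this one


class FactStore:
    """Illustrative store: at most one live fact per (subject, attribute)."""

    def __init__(self) -> None:
        self.current: Dict[Tuple[str, str], Fact] = {}
        self.history: List[Fact] = []

    def add(self, fact: Fact) -> str:
        key = (fact.subject, fact.attribute)
        live = self.current.get(key)
        if live is None:
            self.current[key] = fact
            self.history.append(fact)
            return "inserted"
        if live.value == fact.value:
            # Same key, same value: a duplicate, nothing to retire.
            return "duplicate"
        if fact.observed_at <= live.observed_at:
            # An older contradicting fact arrived late: keep it only as history.
            fact.retired_at = live.observed_at
            self.history.append(fact)
            return "stale"
        # Same key, different value, newer timestamp: retire the old fact.
        live.retired_at = fact.observed_at
        self.current[key] = fact
        self.history.append(fact)
        return "superseded"

    def lookup(self, subject: str, attribute: str) -> Optional[str]:
        live = self.current.get((subject, attribute))
        return live.value if live else None
```

Under these assumptions, retrieval only ever reads `current`, so a retired value cannot be returned as if it were still valid, while `history` keeps the retired facts for auditing.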
The authors propose Erase-then-Delta Attention (EDA), a memory update rule for recurrent models that decouples the address used to erase stale information from the address used to write new content. This approach addresses the limitation of delta-rule linear attention, which cannot actively remove outdated data stored at different locations before writing.
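To make the limitation concrete, here is a small sketch in plain Python. The `delta_update` function is the standard delta rule for a linear-attention state matrix `S`, where reading with key `k` returns `S k`. The `erase_then_delta` function is an assumed form of a decoupled update (erase along a separate address `e`, then apply the delta rule at the write key `k`); the summary does not give the authors' exact formulation, and the names `erase`, `alpha`, and `beta` are illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def read(S: Matrix, k: Vector) -> Vector:
    """Read from memory: returns S @ k."""
    return [sum(s * x for s, x in zip(row, k)) for row in S]


def rank1_add(S: Matrix, u: Vector, w: Vector, scale: float) -> Matrix:
    """Return S + scale * outer(u, w)."""
    return [[S[i][j] + scale * u[i] * w[j] for j in range(len(w))]
            for i in range(len(u))]


def delta_update(S: Matrix, k: Vector, v: Vector, beta: float) -> Matrix:
    """Standard delta rule: S <- S + beta * (v - S k) k^T."""
    error = [vi - ri for vi, ri in zip(v, read(S, k))]
    return rank1_add(S, error, k, beta)


def erase(S: Matrix, e: Vector, alpha: float) -> Matrix:
    """Remove what is stored along address e: S <- S - alpha * (S e) e^T."""
    return rank1_add(S, read(S, e), e, -alpha)


def erase_then_delta(S: Matrix, e: Vector, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Matrix:
    """Assumed decoupled update: erase at e, then delta-write v at k."""
    return delta_update(erase(S, e, alpha), k, v, beta)


# Two orthonormal addresses; an old value is stored at k1.
k1, k2 = [1.0, 0.0], [0.0, 1.0]
S0 = delta_update([[0.0, 0.0], [0.0, 0.0]], k1, [1.0, 2.0], beta=1.0)

# Writing new content at k2 with the plain delta rule leaves k1 untouched.
plain = delta_update(S0, k2, [5.0, 6.0], beta=1.0)

# The decoupled update clears k1 while writing at k2.
decoupled = erase_then_delta(S0, k1, k2, [5.0, 6.0], alpha=1.0, beta=1.0)
```

In this toy setting, `read(plain, k1)` still returns the old value, because the delta rule only corrects the state along the key it writes to, whereas `read(decoupled, k1)` returns zeros.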
A study reveals that conditioning language and vision models on narrow tasks suppresses their ability to report co-present, safety-critical signals they can otherwise detect. This phenomenon, termed the "Inattentional Gap," demonstrates a dissociation between measured benchmark safety and real-world safety.
The paper introduces DiARC, a method that improves the abstract reasoning capabilities of large language models by incorporating negative sample supervision alongside positive examples. This approach addresses the limitations of current methods that rely heavily on data augmentation or expensive closed-source models.
The authors introduce ApproxHDC, a framework that automates the identification and application of domain-specific approximations in Hyperdimensional Computing (HDC) workloads. This system extends the HPVM-HDC compiler infrastructure to enable retargetable compilation across diverse hardware backends, including CPUs, GPUs, and simulated ReRAM and PCM accelerators.
This survey integrates four disconnected tracks of adversarial evaluation (diffusion-based attacks on text and LLMs, image classifiers, vision-language models, and input purification defenses) into a single conceptual framework. It focuses on the LLM-side slice to unify vocabulary, threat models, and benchmarks around denoising diffusion as a shared generative mechanism.
Researchers propose KIRP, a zero-shot stance detection framework that addresses context sparsity and implicit target relevance in short texts by integrating external knowledge with reflective Chain-of-Thought reasoning. The study also introduces the first Japanese tweet-level dataset for stance detection to support this multi-topic evaluation.
Researchers address the quality gap in low-resource text-to-speech by fine-tuning the 2.4B-parameter VoxCPM2 model using Low-Rank Adaptation (LoRA) on a shared corpus of Khmer and Korean.
This paper proposes a new approach to catastrophic forgetting in large language models by regularizing in activation space using pretrained Sparse Autoencoders (SAEs) as a monosemantic feature dictionary, rather than relying on traditional weight-space methods like Elastic Weight Consolidation (EWC).
Researchers present CAT-Q, a post-training quantization scheme that compresses large language models into ternary precision without requiring costly quantization-aware training. The method utilizes learnable modulation and softened ternarization to achieve high accuracy using only 512 calibration samples.
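For readers unfamiliar with ternary precision, the sketch below shows the basic idea of mapping weights to {-1, 0, +1} with a single scale. It is a generic absmean baseline and not CAT-Q itself: it omits the learnable modulation, softened ternarization, and calibration step mentioned above, and the function names are illustrative.

```python
from typing import List, Tuple


def ternarize(weights: List[float]) -> Tuple[List[int], float]:
    """Map weights to {-1, 0, +1} using the mean absolute value as the scale.

    Generic baseline for illustration only; not the CAT-Q procedure.
    """
    if not weights:
        return [], 0.0
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]


weights = [0.9, -1.1, 0.05, 0.0, -0.4, 2.0]
codes, scale = ternarize(weights)
approx = dequantize(codes, scale)
```

Each weight then needs under two bits of storage plus one shared scale per group. The reconstruction error of such a hard rounding step is what calibration-based schemes aim to reduce.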
A user asks whether anyone has experience ablating Mandarin, Russian, and Arabic from a model to produce a primarily Latin-script version. The goal is to free up space for further training or safe pruning in contexts where English has no activation.
The authors introduce SocialPersona, a benchmark designed to evaluate whether multimodal large language models (MLLMs) can recover revealed preferences from longitudinal social-media timelines and use them in dialogue. This work addresses the limitation of current evaluations that focus only on explicit memory by testing a model's ability to infer interests from natural multimodal traces.
This paper investigates whether safety guardrails actually require chain-of-thought reasoning by training a lightweight bidirectional encoder alongside a reasoning-based guard on the same corpus. The authors find that reasoning does not improve moderation accuracy, challenging the common belief that step-by-step thinking is necessary for effective moderation.
This study investigates whether merging abstract logical structures with context-level linguistic cues improves the automated classification of logical fallacies, which often appear in nuanced forms.
HyperDFlash is a block-parallel speculative decoding framework designed to address feature misalignment issues when adapting DFlash to DeepSeek-V4's multi-hyper-connection (MHC) architecture. The authors propose two key optimizations: using pre-collapse residual states for conditioning and replacing the generic linear compressor with a lightweight gated residual reducer inherited from the model's hyper-connection head.
This article investigates how language models learn latent semantic structure despite being trained with one-hot labels that theoretically eliminate shared context statistics. The authors identify a tension between Neural Collapse theory and the observed ability of models to capture categorical features like object properties.