media r/LocalLLaMA · 5h ago

Reddit user proposes combining RTX 5080 and 4060 for local LLM inference

A Reddit user in the r/LocalLLaMA community is considering a hardware upgrade to improve inference speed and capacity for Qwen models by pairing a future RTX 5080 with their existing RTX 4060. The user aims for at least 20-40 tokens per second when running Qwen 27B models, using the combined 24GB of VRAM through tensor or layer splitting in llama.cpp or vLLM. They are weighing this asymmetric dual-GPU setup against alternatives such as the AMD R9700 AI Pro or 7900 XTX, citing benchmark data that suggests the AMD cards offer limited performance gains relative to their cost.
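
As a rough illustration of the arithmetic behind this setup, the sketch below computes how layers might be divided in proportion to each card's memory and estimates the size of the model weights. It assumes 16 GB on the RTX 5080 and 8 GB on the RTX 4060 (consistent with the 24GB total above) and a hypothetical 4-bit quantization; the function names are illustrative, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def split_ratios(vram_gb):
    """Return each GPU's share of the total VRAM as a fraction."""
    total = sum(vram_gb)
    return [v / total for v in vram_gb]


def weight_size_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB: parameters (billions) * bits / 8."""
    return params_billion * bits_per_weight / 8


# Assumed capacities: RTX 5080 = 16 GB, RTX 4060 = 8 GB.
vram = [16, 8]
ratios = split_ratios(vram)

# Assumed quantization: 4 bits per weight for a 27B-parameter model.
weights = weight_size_gb(27, 4)

print("Split ratios:", [round(r, 3) for r in ratios])
print("Approx. weight size (GB):", weights)
print("Fits in combined VRAM (weights only):", weights < sum(vram))
```

Proportions like these are the kind of values that could be passed to llama.cpp's tensor-split option, though the best split in practice also depends on context length and on the slower card limiting overall throughput.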

arxiv arXiv cs.CL · 7h ago

CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention

The CARVE architecture addresses three critical defects in the leading GDN-2 delta-rule recurrent model by restricting erase operations to the key axis, thereby enabling valid WY-form triangular chunk solving and improving value efficiency. By reusing the recurrent output tensor as a content signal and replacing per-value write-gate projections with single scalars, CARVE maintains bit-identical initialization to GDN-2 while resolving memory-blind gating issues.

arxiv arXiv cs.CL · 7h ago

The Geometry of Updates: Fisher Alignment at Vocabulary Scale

This article addresses the challenge of training-free source selection for large language models with shared vocabularies in scientific domains like SMILES and genomics, where classical metrics are either uninformative or computationally prohibitive. The authors demonstrate that representation similarity metrics are non-identifiable for transfer because models can share identical representations yet have orthogonal head updates.

arxiv arXiv cs.CL · 7h ago

Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection

Researchers propose a mechanism-oriented taxonomy of indirect linguistic expressions (ILE) to categorize the underlying operations used to encode and recover meaning in coded language. This approach abstracts away from communicative goals to focus on the specific encoding mechanisms found in algospeak, euphemisms, and adversarial obfuscation.

arxiv arXiv cs.CL · 7h ago

LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

This paper presents the first case study applying Large Language Models to the German Central Bank's process of verifying securities eligibility for collateral, shifting from traditional Named Entity Recognition to a generative Information Extraction pipeline. The approach decomposes the task into extraction, normalization, and interpretation to handle noisy text and bilingual content more effectively.