media r/LocalLLaMA · 7h ago

Reddit user proposes combining RTX 5080 and 4060 for local LLM inference

A Reddit user in the r/LocalLLaMA community is considering a hardware upgrade to improve inference speed and capacity for Qwen models by pairing a future RTX 5080 with their existing RTX 4060. The user aims for at least 20-40 tokens per second when running Qwen 27B models, using the combined 24GB of VRAM through tensor or layer splitting in llama.cpp or vLLM. They are weighing this asymmetric dual-GPU setup against alternatives such as the AMD Radeon AI PRO R9700 or RX 7900 XTX, citing benchmark data that suggests the AMD cards offer limited performance gains relative to their cost.
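
As a rough sketch of the arithmetic behind the 24GB figure, assuming 16GB on the RTX 5080 and 8GB on the RTX 4060, a VRAM-proportional split would place about two-thirds of the model on the 5080. llama.cpp accepts such proportions through its `--tensor-split` option; the helper name `split_ratios` below is hypothetical, and the post itself does not state which split the user intends.

```python
def split_ratios(vram_gb):
    """Return each GPU's share of the total VRAM as a fraction."""
    total = sum(vram_gb)
    return [v / total for v in vram_gb]


# Assumed VRAM sizes in GB: RTX 5080 = 16, RTX 4060 = 8 (24 combined).
vram = [16, 8]
ratios = split_ratios(vram)

# Proportions formatted the way llama.cpp's --tensor-split expects them,
# as a comma-separated list with one entry per GPU.
tensor_split_arg = ",".join(str(v) for v in vram)

print(f"total VRAM: {sum(vram)} GB")
print(f"shares: {[round(r, 3) for r in ratios]}")
print(f"--tensor-split {tensor_split_arg}")
```

A proportional split only balances memory, not speed: the slower card still sets the pace for the layers it holds, so actual throughput would need to be measured.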

arxiv arXiv cs.CL · 8h ago

CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention

The CARVE architecture addresses three critical defects in the leading GDN-2 delta-rule recurrent model by restricting erase operations to the key axis, thereby enabling valid WY-form triangular chunk solving and improving value efficiency. By reusing the recurrent output tensor as a content signal and replacing per-value write-gate projections with single scalars, CARVE maintains bit-identical initialization to GDN-2 while resolving memory-blind gating issues.

arxiv arXiv cs.CL · 8h ago

The Geometry of Updates: Fisher Alignment at Vocabulary Scale

This article addresses the challenge of training-free source selection for large language models with shared vocabularies in scientific domains like SMILES and genomics, where classical metrics are either uninformative or computationally prohibitive. The authors demonstrate that representation similarity metrics are non-identifiable for transfer because models can share identical representations yet have orthogonal head updates.

arxiv arXiv cs.CL · 8h ago

Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection

Researchers propose a mechanism-oriented taxonomy of indirect linguistic encoding (ILE) that categorizes the underlying operations used to encode and recover meaning in coded language. The approach abstracts away from communicative goals to focus on the specific encoding mechanisms found in algospeak, euphemisms, and adversarial obfuscation.

arxiv arXiv cs.CL · 8h ago

LLM-Based Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank

This paper presents the first case study applying Large Language Models to the German Central Bank's process of verifying securities eligibility for collateral, shifting from traditional Named Entity Recognition to a generative Information Extraction pipeline. The approach decomposes the task into extraction, normalization, and interpretation to handle noisy text and bilingual content more effectively.

arxiv arXiv cs.CL · 8h ago

Assessing Post-Reform Changes in Risk Disclosure Quality with a Multidimensional Text Analysis Approach

This study proposes a longitudinal text analysis framework combining Japanese-language NLP metric extraction with paired testing and shift function analysis to evaluate qualitative changes in corporate risk disclosures. Applied to Japan's 2019 disclosure reforms, the approach analyzes 19,770 firm-year observations over ten years to capture multidimensional dynamics often masked by single-indicator methods.

arxiv arXiv cs.CL · 9h ago

Mapping Political-Elite Networks in Europe with a Multilingual Joint Entity-Relation Extraction Pipeline

Researchers present a modular, fully open-weight pipeline for multilingual joint entity-relation extraction that builds signed, temporal knowledge graphs from massive unstructured news corpora. The system combines span-based named-entity recognition with a linking cascade to Wikidata and an ontology-constrained mixture-of-experts model to extract directed relationships.