Open weights
arXiv cs.LG · 6d ago

Critical Percolation as a Synthetic Data Model for Interpretability

A new synthetic dataset built from critical mean-field percolation clusters provides a realistic yet analytically tractable model with hierarchical structure. Its clusters are sparse and fractal, with power-law size distributions, and latent variables generate the target values through a taxonomic hierarchy. These ground-truth latent variables can be linearly decoded from neural network activations, showing that the networks represent them in a readily interpretable form.
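A minimal sketch of where the power-law cluster sizes come from, assuming that "critical mean-field percolation" can be modelled as a Galton-Watson branching process with Poisson(1) offspring, the standard local picture of a cluster in a critical Erdős–Rényi graph. The paper's actual construction, its latent variables, and its taxonomic hierarchy are not reproduced here; the function names and the `max_size` truncation are illustrative choices.

```python
import math
import random
from collections import Counter


def poisson(rng: random.Random, lam: float = 1.0) -> int:
    """Sample Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1


def critical_cluster_size(rng: random.Random, max_size: int = 1_000) -> int:
    """Total size of a Galton-Watson tree with Poisson(1) offspring.

    A mean offspring of exactly 1 is the mean-field critical point, where
    cluster sizes have a power-law tail P(S >= s) ~ s^(-1/2).
    max_size truncates rare huge clusters so sampling always terminates.
    """
    size, frontier = 1, 1  # start from a single root vertex
    while frontier > 0 and size < max_size:
        frontier -= 1                 # explore one active vertex
        children = poisson(rng, 1.0)  # its newly reached neighbours
        size += children
        frontier += children
    return min(size, max_size)


def sample_sizes(n: int, seed: int = 0, max_size: int = 1_000) -> list[int]:
    """Draw n independent (truncated) critical cluster sizes."""
    rng = random.Random(seed)
    return [critical_cluster_size(rng, max_size) for _ in range(n)]


if __name__ == "__main__":
    sizes = sample_sizes(5_000)
    counts = Counter(sizes)
    # Most clusters are tiny, but a few are orders of magnitude larger.
    for s in (1, 2, 4, 8, 16, 32):
        print(s, counts[s] / len(sizes))
    print("largest:", max(sizes))
```

About e^-1 ≈ 37% of sampled clusters are single vertices, while the tail still reaches hundreds of vertices: the sparse, heavy-tailed shape the summary describes.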

arXiv cs.CL · 6d ago

Credence: Semantic Metrics and Convergence Analysis for Claim Decomposition

Credence introduces Semantic-F1, a claim-matching metric based on BGE-large cosine similarity, which improves on Jaccard overlap by 15-32 percentage points in claim-decomposition accuracy. It also proves convergence theorems for rule-based and LLM-based repair: rule-based repair terminates in finitely many steps and is monotone, whereas LLM-based repair requires early-exit guards. Across social-media, encyclopaedic, and news domains, EPR ranges from 0.94 to 1.00, and rule-based repair reduces atomicity violations by 47-100% without loss of fidelity.
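The sketch below illustrates, under stated assumptions, the two mechanisms the summary names. `semantic_f1` counts a predicted claim as matched when its embedding's cosine similarity to some reference claim reaches a threshold; the 0.8 threshold and the any-match rule are guesses, not the paper's definition, and in practice the embeddings would come from a model such as BGE-large. `guarded_repair` shows one form an early-exit guard could take. All function names are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_f1(pred: list[list[float]], ref: list[list[float]], threshold: float = 0.8) -> float:
    """F1 over claims, counting a match when cosine similarity >= threshold.

    pred/ref are claim embeddings; the threshold value is a hypothetical choice.
    """
    if not pred or not ref:
        return 0.0
    precision = sum(any(cosine(p, r) >= threshold for r in ref) for p in pred) / len(pred)
    recall = sum(any(cosine(r, p) >= threshold for p in pred) for r in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, the lexical baseline: blind to paraphrase."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def guarded_repair(claims, repair_step, count_violations, max_iters: int = 5):
    """Repair loop with early-exit guards.

    Stops when no violations remain, when a step fails to reduce the violation
    count (possible with a non-monotone LLM repairer), or after max_iters steps.
    """
    best, best_v = claims, count_violations(claims)
    for _ in range(max_iters):
        if best_v == 0:
            break
        candidate = repair_step(best)
        v = count_violations(candidate)
        if v >= best_v:
            break
        best, best_v = candidate, v
    return best
```

Lexical overlap shows why a semantic metric can score differently: `jaccard("The CEO resigned", "The chief executive stepped down")` is only 1/7 even though the two claims are paraphrases, whereas their sentence embeddings would typically be close.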

arXiv cs.CL · 6d ago

AI-Driven Deliberation: Scaling Inclusivity and Empowering Marginalised Groups

The chapter argues that large language models (LLMs) can scale democratic deliberation by scaffolding argumentation and reducing linguistic bias. Using Systemic-Functional Linguistics, it analyses how socio-demographic and communicative variation affects participation, highlighting AI's potential to challenge exclusionary norms while cautioning against over- or under-claiming its capabilities. It calls for ethical safeguards and further research to ensure equitable AI-assisted engagement.

arXiv cs.CL · 6d ago

Generative Engine Optimization: Measuring AI Search Visibility

A large-scale study of 100K+ AI responses to prompts covering 100+ brands reveals a three-tier brand-visibility ladder: global brands appear in 73% of answers, mid-market brands in 44%, and niche brands in just 11%. AI engines cite corporate websites most often; YouTube leads among non-corporate sources, and best-of listicles account for 21% of citations. Sentiment in brand mentions is unstable: it flips six times more often than mention status, that is, whether the brand appears at all.
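Both quantities in the summary, per-tier visibility and flip frequency, reduce to simple counting. This sketch assumes each observation records whether one AI answer mentioned a brand and, for flips, the sentiment or mention status across re-runs of the same prompt; the record layout and function names are hypothetical, and the toy data is made up rather than taken from the study.

```python
from collections import defaultdict


def visibility_by_tier(records):
    """records: iterable of (tier, mentioned) pairs, one per AI answer checked for a brand.

    Returns the share of answers that mention the brand, per tier.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for tier, mentioned in records:
        totals[tier] += 1
        hits[tier] += bool(mentioned)
    return {tier: hits[tier] / totals[tier] for tier in totals}


def flip_count(observations):
    """Number of changes between consecutive re-runs of the same prompt."""
    return sum(a != b for a, b in zip(observations, observations[1:]))


if __name__ == "__main__":
    # Toy, made-up observations; not data from the study.
    runs = [("global", True), ("global", True), ("niche", False), ("niche", True)]
    print(visibility_by_tier(runs))  # {'global': 1.0, 'niche': 0.5}
    print(flip_count([True, True, True, False]))                          # mention flips: 1
    print(flip_count(["positive", "neutral", "positive", "negative"]))  # sentiment flips: 3
```

Dividing the average sentiment flip count by the average mention flip count over the same re-runs would give a ratio comparable to the "six times" figure, assuming the study defines flips this way.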

arXiv cs.CL · 6d ago

Essay Quality Representations in LLMs Found to Be Linearly Accessible

The study finds that large language models encode essay quality in a linearly accessible form within their hidden representations. This information emerges progressively across layers, remains stable across prompts, and transfers partially between different essay prompts; longer essays rely more on deeper layers. The authors also identify specific "essay scoring neurons" whose activations correlate strongly with scores and can be shifted by targeted interventions.
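A rough sketch of how candidate "essay scoring neurons" might be ranked: correlate each neuron's activation with the human scores across essays at one layer, then sort by the absolute Pearson coefficient. A full linear probe would instead fit a regularised regression on the whole hidden-state vector at each layer. The function names and the activation layout (one row per essay, one column per neuron) are assumptions, not the paper's code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def rank_score_neurons(activations, scores):
    """activations: one row per essay, one column per neuron (from a single layer).

    Returns (neuron_index, r) pairs sorted by |r|, strongest first.
    """
    n_neurons = len(activations[0])
    corrs = [
        (j, pearson([row[j] for row in activations], scores))
        for j in range(n_neurons)
    ]
    return sorted(corrs, key=lambda pair: abs(pair[1]), reverse=True)
```

Running this per layer and tracking the best |r| (or probe accuracy) against depth is one simple way to see the layer-by-layer emergence the summary describes.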