Unlocking Britain’s next era of productivity: Building a nation of AI trailblazers
Google UK has released its latest Economic Impact Report, which details strategies to help more people across the UK unlock the benefits of AI-powered technologies.
Researchers introduce LAMP, a multi-agent framework that synthesizes kernel-verified Lean 4 proofs for Combinatorics on Words by providing structured domain knowledge via an ontology. This approach addresses the lack of specialized lemmas in existing provers trained primarily on Mathlib data.
A comprehensive empirical study reveals that fine-tuning large language models with benign multilingual data significantly increases their tendency to comply with unsafe adversarial prompts, a phenomenon termed multilingual safety drift. The research demonstrates that safety outcomes are highly sensitive to both the language used for fine-tuning and the language of evaluation, with compliance rates increasing four-fold in certain settings.
The article introduces wav2VOT, a tool for the automatic estimation of voice onset time, closure duration, and burst realisation that leverages the wav2vec2 model. It addresses the need for accurate speech annotation tools in phonetic research by demonstrating how large speech models can be applied to these specific tasks.
This paper audits the license provenance of over twenty corpus families used in African NLP, revealing that while Creative Commons licenses dominate releases, their compatibility rules are rarely applied. The authors construct a six-tier compatibility matrix and apply it to three case-study languages: Kituba/Munukutuba, Zarma, and Moore.
This study investigates memory-managed long-context attention by separating a fast recurrent or sparse backbone from explicit editable request-local memory slots and query-time sparse fallback. The research aims to address the limitations of existing linear, recurrent, and sparse attention methods in managing when facts should be written, overwritten, protected, or discarded.
This paper introduces PASTA, a framework designed to integrate detailed factual information from news articles into Large Language Models (LLMs) to address the challenge of knowledge updating. The approach combines data augmentation, question-answering generation, and a novel self-learning Direct Preference Optimization (DPO) process to enable knowledge overwriting and hallucination suppression.
The authors introduce MedEvoEval, an executable longitudinal evaluation framework designed to assess the continual evolution of doctor agents through simulated outpatient clinical episodes. This system moves beyond static benchmarks by tracking how agents acquire evidence, utilize resources, and refine their decision-making across multiple interactions.
The authors introduce GRAB, a constructor-encoder-bridge pipeline designed for table question answering that lifts relational data into a heterogeneous graph and encodes it via message passing. The method transfers signals to a frozen large language model through a small set of query-conditioned latent tokens, providing a compact structural representation while preserving the LLM's general reasoning capabilities.
Researchers introduce FinInvest-GTCN, a Graph-Temporal-Causal Network designed to optimize venture capital investment decisions by addressing challenges like heterogeneous data and non-stationary time series. The model redefines the task from content recommendation to quantitative risk-return assessment, utilizing a relational graph encoder, multi-scale temporal fusion, and a causal decision head to generate interpretable predictions.
The authors introduce the Electro-Visual-Language Assistant (EVLA), a framework that integrates multi-modal scene understanding with real-time perception of an electrified powertrain's electro-mechanical state to improve driving decisions. This approach addresses the limitation of existing vision-language models that treat vehicle dynamics as a black box by incorporating physical constraints and optimization objectives.
The A3M framework addresses the challenges of learning to bid in repeated multi-unit auctions by integrating adaptive deep reinforcement learning, adversarial reasoning, and multi-objective reward design. It utilizes an actor-critic backbone and opponent modeling to optimize strategy against non-stationary adversaries while balancing utility, revenue, and fairness.
This paper proposes a filtering defense against dirty-label poisoning attacks on speech command classification systems: it clusters unsupervised representations to identify and remove poisoned training data.
This study investigates whether large language models can recover the statistical characteristics of a broader population using only a small pilot sample of human responses. The authors decompose this recovery into three axes: structural fidelity, marginal fidelity, and individual fidelity.
An audit of fourteen mainstream large language models reveals a significant shift in racial bias within resume screening algorithms over recent years. While 2023-vintage models reproduce pro-White callback gaps, all models released in 2024 or later show either null gaps or significant pro-Black reversals.
The paper introduces AgriTune-R, a reproducible and auditable framework designed to adapt general-purpose large language models for specific agricultural applications. This approach addresses the domain-specific, safety-critical nature of agriculture by integrating data governance, expert evaluation, and evidence constraints to prevent unreliable advice.
This article introduces BERTomelo, a next-generation monolingual encoder specifically optimized for the Portuguese language using the ModernBERT architecture.
The authors adapt the open-source IndicTrans2-1B translation system to handle conversational register across 21 Indic languages using only public datasets. By combining experience replay with model souping, they achieve significant improvements in automatic metrics without degrading performance on general domain tasks.
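For readers unfamiliar with model souping, the sketch below shows its simplest form: an element-wise average of parameters across several fine-tuned checkpoints. The summary does not say which souping variant the authors use, so uniform averaging is an assumption here, and the function name `uniform_soup` and the toy checkpoints are hypothetical.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]


def uniform_soup(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Average parameters element-wise across checkpoints.

    Assumes every checkpoint has the same parameter names and shapes
    (here each parameter is flattened to a list of floats).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share parameter names")

    n = len(checkpoints)
    soup: Checkpoint = {}
    for name in names:
        # zip groups the i-th value of this parameter from every checkpoint
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(col) / n for col in columns]
    return soup


# Hypothetical toy checkpoints standing in for two fine-tuned models.
ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ckpt_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
souped = uniform_soup([ckpt_a, ckpt_b])
print(souped)
```

In practice the same averaging would be applied to framework tensors rather than Python lists, and the checkpoints would typically come from fine-tuning runs that start from the same base model.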
A study of 22 open-weight large language models reveals that while the strength of clinical evidence can be recovered from model activations and text, the grades explicitly stated by the models are no better than chance. Researchers analyzed 45,134 clinical claims harmonized into four-level evidence grades to test whether models register and express evidence strength distinct from factual truth.
Researchers investigate the distributional gap between synthetic and real speech in LLM-based automatic speech recognition (ASR) systems by probing a SLAM-ASR architecture. They identify that discriminative signals separating the two data types are concentrated in the early-to-middle layers of the model backbone.