All articles — korshunov.ai


Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms

This study investigates whether non-expert malicious actors can successfully jailbreak large language models by using bandit algorithms to select optimal attacks and enhance queries. The authors propose a novel attack strategy based on the multi-armed bandit framework to efficiently learn the best jailbreak from a large choice set through noisy exploration.
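The summary does not say which bandit algorithm the authors use. As a generic illustration of "learning the best option from noisy exploration", here is a minimal UCB1 sketch. Everything here is an assumption: the arms are abstract candidates in a choice set, and `pull` is a hypothetical callback that returns a noisy reward in [0, 1].

```python
import math
import random


def ucb1(n_arms, pull, rounds, c=2.0):
    """Generic UCB1 (illustrative choice; not necessarily the paper's algorithm).

    `pull(arm)` is a hypothetical callback returning a noisy reward in [0, 1].
    Returns (pull counts per arm, empirical mean reward per arm).
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each estimate is defined
        else:
            total = t - 1  # number of pulls made so far
            # Optimism under uncertainty: empirical mean plus an exploration bonus
            # that shrinks as an arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(total) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return counts, means


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.8]  # hidden success rates of three abstract arms
    counts, means = ucb1(3, lambda a: 1.0 if rng.random() < probs[a] else 0.0, 2000)
    print(counts, means)
```

With c = 2 the bonus is the textbook sqrt(2 ln n / n_i) term; over time most pulls concentrate on the arm with the highest true success rate.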

arxiv arXiv cs.CL · 6h ago

Term-Centric Hierarchy Induction from Heterogeneous Corpora

Researchers propose a term-centric framework for inducing hierarchical taxonomies from diverse text sources, addressing the limitations of existing methods that rely on document-level representations. This approach maps documents into a shared representation space via automatic term extraction to enable robust cross-source alignment and construct interpretable hierarchies.

arxiv arXiv cs.CL · 6h ago

RedVox: Safety and Fairness Gaps in Speech Models Across Languages

A new study reveals significant safety and fairness gaps in multilingual speech models, finding that only 8% of state-of-the-art releases document any multilingual analysis. To address this, the authors introduce RedVox, a benchmark built on real voices covering unsafe requests across five languages.

arxiv arXiv cs.CL · 6h ago

Einstein World Models: Visualizing Counterfactuals for LLM Reasoning

The article introduces Einstein World Models (EWMs), a framework designed to enhance large language model reasoning by integrating visual-temporal rollouts into the reasoning trace. This approach allows models to utilize visual thought experiments as inspectable hypotheses to complement text-based processing.

arxiv arXiv cs.CL · 6h ago

Auditing Framing-Sensitive Behavioral Instability in LLMs for Mental Health

This study investigates how semantically similar concerns, presented under different contextual framings, elicit different responses from instruction-tuned large language models, which can undermine system reliability. Using controlled matched prompts and layer-wise probing analyses, the authors show that framing systematically alters the models' interpretive response tendencies across multiple model architectures.

arxiv arXiv cs.CL · 6h ago

ReaORE: Reasoning-Guided Progressive Open Relation Extraction Empowered by Large Reasoning Models

Researchers propose ReaORE, a framework for open relation extraction that utilizes large reasoning models to achieve reliable generalization to unseen relation types. The method addresses limitations of current clustering and direct generation approaches through a coarse-to-fine reasoning process.

arxiv arXiv cs.CL · 6h ago

Where Do Models Find Happiness? Emotion Vectors in Open-Source LLMs

This study investigates the presence and structure of emotion vectors in open-weight large language models, specifically Apertus-8B-Instruct-2509 and Gemma-4-E4B-it. The research confirms that these models encode valence geometry with high correlation to human psychological structures, approaching the levels previously observed in Claude Sonnet 4.5.

arxiv arXiv cs.CL · 6h ago

MinGram: A Minimalist Unigram Tokenizer with High Compression and Competitive Morphological Alignment

The authors introduce MinGram, a minimalist unigram tokenizer that simplifies training by using a BPE-derived seed vocabulary, Hard EM on a minimum-token path, and a single flat score-pruning step. This approach removes the need for suffix arrays, forward-backward passes, and iterative prune loops, making the procedure significantly less complex than standard methods.
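A "minimum-token path" can be read as the segmentation of a word that uses the fewest vocabulary tokens, and Hard EM then re-estimates token statistics from those single best paths instead of summing over all paths. The sketch below is an assumption-based illustration of that idea, not MinGram's implementation; the function names and the toy vocabulary are hypothetical.

```python
from collections import Counter


def min_token_path(word, vocab, max_len=None):
    """Segment `word` into the fewest tokens from `vocab` via dynamic programming.

    Returns the token list, or None if the word cannot be covered by `vocab`.
    """
    if max_len is None:
        max_len = max(map(len, vocab), default=0)
    n = len(word)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i] = fewest tokens covering word[:i]
    back = [-1] * (n + 1)   # back[i] = start index of the last token in that cover
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] + 1 < best[i] and word[j:i] in vocab:
                best[i] = best[j] + 1
                back[i] = j
    if best[n] == inf:
        return None
    tokens, i = [], n
    while i > 0:  # walk the back-pointers to recover the path
        tokens.append(word[back[i]:i])
        i = back[i]
    return tokens[::-1]


def hard_em_counts(corpus, vocab):
    """Hard-EM style statistics: count tokens only along each word's best path."""
    counts = Counter()
    for word in corpus:
        path = min_token_path(word, vocab)
        if path:
            counts.update(path)
    return counts
```

For example, with the toy vocabulary `{"un", "believ", "able", "u", "n", "b", "e", "l", "i", "v", "a"}`, `min_token_path("unbelievable", vocab)` returns `["un", "believ", "able"]`. Counts gathered this way could then drive a single score-based pruning pass, as the summary describes.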

arxiv arXiv cs.CL · 6h ago

Improving Verbalized Uncertainty Calibration in Medical VQA

This work addresses the tendency of multimodal large language models to produce overconfident outputs in Medical Visual Question Answering by proposing a training-based framework that fine-tunes these models for better calibration. The method employs a composite loss function combining Brier-style calibration, anchor regularization, contrastive image-text alignment, and KL divergence terms to align model confidence with actual correctness.
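For intuition on the "Brier-style" term, the classic Brier score is the mean squared gap between a stated confidence and whether the answer was correct. The sketch below computes that standard metric; how the paper turns it into a training loss is not described in the summary and is not reproduced here.

```python
def brier_score(confidences, correct):
    """Mean squared difference between confidence in [0, 1] and correctness (0 or 1).

    Lower is better; an overconfident model is penalized on every wrong answer.
    """
    if len(confidences) != len(correct):
        raise ValueError("inputs must have the same length")
    if not confidences:
        raise ValueError("inputs must be non-empty")
    return sum((p - float(y)) ** 2 for p, y in zip(confidences, correct)) / len(confidences)
```

A model that says 0.9 on four answers, of which only two are right, scores about 0.41, while one that says 0.5 on the same answers scores 0.25: the hedged model is better calibrated even though accuracy is identical.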

arxiv arXiv cs.CL · 6h ago

Improving General Role-Playing Agents via Psychology-Grounded Reasoning and Role-Aware Policy Optimization

Researchers propose Psy-CoT, a psychology-grounded chain-of-thought framework that decomposes pre-response reasoning into Interaction Perception, Psychological Empathy, and Logical Construction to improve character fidelity. To address gradient misalignment in reinforcement learning, they introduce Role-Aware Policy Optimization (RAPO), which uses profile-token mutual information to weight gradients asymmetrically.

arxiv arXiv cs.CL · 6h ago

NuclearQAv2: A Structured Benchmark for Evaluating Domain-Science Competence in Large Language Models

Researchers introduce NuclearQAv2, a new benchmark designed to assess the reliability of large language models in nuclear engineering by testing factual knowledge, quantitative reasoning, and conceptual understanding.

arxiv arXiv cs.CL · 7h ago

Towards Explainable Adjudicative Variance: Quantifying Judicial Discretion via Gated Multi-Task Learning

Researchers propose a Judge-Aware Gated Multi-Task Learning architecture that disentangles objective case facts from adjudicative context to improve legal outcome prediction. The model uses a fine-grained outcome taxonomy and a gated fusion mechanism to dynamically modulate reliance on judge identity, evaluated on 13,937 UK Employment Tribunal decisions.
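A gated fusion mechanism of this kind is commonly built as a learned sigmoid gate that decides how much of the final representation comes from each input. The sketch below assumes a single scalar gate computed from both vectors; the vector names, weights, and exact form are hypothetical and may differ from the paper's architecture.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(h_case, h_judge, w_case, w_judge, bias):
    """Blend a case-facts vector with a judge-context vector via a scalar gate.

    g = sigmoid(w_case . h_case + w_judge . h_judge + bias)
    fused = (1 - g) * h_case + g * h_judge
    A gate near 0 ignores judge identity; a gate near 1 relies on it heavily.
    """
    if len(h_case) != len(h_judge):
        raise ValueError("case and judge vectors must have the same dimension")
    z = (
        sum(w * x for w, x in zip(w_case, h_case))
        + sum(w * x for w, x in zip(w_judge, h_judge))
        + bias
    )
    g = sigmoid(z)
    fused = [(1 - g) * c + g * j for c, j in zip(h_case, h_judge)]
    return g, fused
```

Because the gate is computed per case, the model can lean on judge identity for some decisions and on the facts alone for others, which is what makes the learned reliance inspectable.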

arxiv arXiv cs.CL · 7h ago

The Riddle Riddle: Testing Flexible Reasoning in Large Language Models and Humans

A study introduces the "riddle riddle" paradigm to determine whether large language models (LLMs) rely on flexible reasoning or pattern matching, revealing that humans and LLMs fail in opposite directions. In experiments involving nine state-of-the-art LLMs and 100 human participants, LLMs performed significantly worse on riddle riddles than on genuine riddles, while humans showed the reverse trend.

arxiv arXiv cs.CL · 7h ago

HarmVideoBench: Benchmarking Harmful Video Understanding in Large Multimodal Models

Researchers introduce HarmVideoBench, a multi-layered diagnostic benchmark designed to evaluate large vision-language models on their ability to understand harmful videos beyond superficial cues. The benchmark addresses limitations in existing works by incorporating explanatory rationales and assessing three hierarchical dimensions of harm: Observable Evidence, Clip-Internal Meaning, and Beyond-Clip Reasoning.

arxiv arXiv cs.CL · 7h ago

Forecasting With LLMs: Improved Generalization Through Feature Steering

This study applies large language models to forecasting tasks and uses sparse autoencoders to analyze their internal states, distinguishing between time-specific knowledge and generalizable patterns. The research identifies specific features associated with both time-aware reasoning and look-ahead-biased reasoning.

arxiv arXiv cs.CL · 7h ago

Syntactic Belief Update as the Driver of Garden Path Processing Difficulty

The article proposes Syntactic Belief Update, a model that predicts processing difficulty in garden path sentences by measuring the magnitude of syntactic belief updates via generalized Rényi divergence. This approach outperforms lexical surprisal by providing a better fit to human reading time data.
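For reference, the standard Rényi divergence of order α between two distributions is given below. Reading P as the distribution over syntactic analyses after a word is read and Q as the distribution before it is an assumption for illustration; the paper's "generalized" variant may differ in detail.

```latex
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \sum_{x} P(x)^{\alpha} \, Q(x)^{1 - \alpha},
\qquad \alpha > 0,\ \alpha \neq 1
```

As α → 1 this expression recovers the Kullback-Leibler divergence, so the order α controls how strongly large shifts in belief over a few analyses are weighted.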

arxiv arXiv cs.CL · 7h ago

Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes

The authors introduce AIMS, a dataset of 1,724 human-annotated difficult safety prompts paired with intent descriptions and harm labels, to evaluate intent-aware training across multiple regimes. They argue that modeling user intent as an explicit signal significantly improves the robustness of safety classifiers.

arxiv arXiv cs.CL · 7h ago

Ask, Don't Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement

The authors propose BINEVAL, a framework that decomposes evaluation criteria into atomic binary questions to provide interpretable, multi-dimensional scores for large language models. This approach generates transparent question-level feedback and calibrated overall scores by having an LLM answer fine-grained evaluation questions independently for each output.
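To make "atomic binary questions" concrete, the sketch below aggregates yes/no answers into per-dimension scores and an overall score. It assumes each dimension's score is the fraction of "yes" answers and the overall score is a (weighted) mean; the function name and data shape are hypothetical, and BINEVAL's calibration step is not reproduced.

```python
def binary_question_score(answers, weights=None):
    """Aggregate yes/no answers into per-dimension and overall scores.

    `answers` maps a dimension name to a list of booleans, one per atomic question.
    Per-dimension score = fraction of True answers; overall = weighted mean.
    """
    per_dim = {dim: sum(a) / len(a) for dim, a in answers.items() if a}
    if not per_dim:
        return {}, 0.0
    if weights is None:
        weights = {dim: 1.0 for dim in per_dim}
    total_w = sum(weights[d] for d in per_dim)
    overall = sum(weights[d] * s for d, s in per_dim.items()) / total_w
    return per_dim, overall
```

For example, `{"accuracy": [True, True, False, True], "clarity": [True, False]}` yields per-dimension scores of 0.75 and 0.5 and an overall score of 0.625. Each failed question points at a specific shortcoming, which is what makes the feedback interpretable.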

blog Simon Willison · 7h ago

datasette-export-database 0.3a2 fixes version pin

The datasette-export-database plugin version 0.3a2 addresses a compatibility issue caused by an overly strict dependency constraint in the previous release.

github llama.cpp · 7h ago

llama.cpp b9827 release adds CUDA 2D async copy optimization

The llama.cpp b9827 release introduces a performance optimization for CUDA by adding a cudaMemcpy2DAsync fast path to the ggml_cuda_cpy function. This change accelerates same-type, same-shape strided copies where tensors are not fully contiguous but each row is contiguous, replacing slower element-wise scalar copy kernels.
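When each row of a strided tensor is contiguous, the whole copy can be described as `height` rows of `width` bytes, where a source pitch and a destination pitch give the byte distance between consecutive row starts. That is the layout CUDA's `cudaMemcpy2D`/`cudaMemcpy2DAsync` accept. The pure-Python emulation below only illustrates what one such 2D copy computes; the helper name `copy_2d` is hypothetical and this is not llama.cpp's code.

```python
def copy_2d(dst, dpitch, src, spitch, width, height):
    """Copy `height` rows of `width` bytes between pitched buffers.

    Mirrors the argument semantics of cudaMemcpy2D: row r starts at byte
    r * spitch in `src` and at byte r * dpitch in `dst`. Bytes between the
    end of a row and the next pitch boundary are left untouched.
    """
    if width > spitch or width > dpitch:
        raise ValueError("width must not exceed either pitch")
    for r in range(height):
        s, d = r * spitch, r * dpitch
        dst[d:d + width] = src[s:s + width]  # one contiguous row per step
```

For example, copying 3 rows of 3 bytes from `bytearray(range(12))` with a source pitch of 4 into a 9-byte buffer with a destination pitch of 3 packs the rows into `[0, 1, 2, 4, 5, 6, 8, 9, 10]`. On the GPU, a single such call replaces launching a kernel that copies one element at a time.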