Research paper — korshunov.ai

Native binary embeddings outperform post-hoc binarization

A small-scale experiment finds that natively binary embedding models retrieve better than float models binarized post hoc. On SciFact Recall@10, the native binary models (2048-dim and 4096-dim) outperform post-hoc binary models by 17% and 25% respectively, with significant speed and memory advantages in indexing.
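For readers unfamiliar with the baseline, post-hoc binarization can be sketched as follows. This is a minimal illustration, not the experiment's code: it assumes sign thresholding at zero, Hamming-distance ranking, and the standard Recall@k definition, and every function name here is hypothetical.

```python
def binarize(vec):
    """Post-hoc binarization (assumed scheme): keep one bit per float dimension, 1 if positive."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def top_k(query_bits, corpus_bits, k):
    """Indices of the k corpus codes closest to the query in Hamming distance."""
    ranked = sorted(range(len(corpus_bits)),
                    key=lambda i: hamming(query_bits, corpus_bits[i]))
    return ranked[:k]


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)
```

A native binary model would emit such codes directly instead of deriving them from float vectors; the ranking and Recall@k steps stay the same.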

arxiv arXiv cs.CL · 2d ago

OpenBioRQ: Benchmark for Agentic Biomedical Research Faithfulness

OpenBioRQ introduces a benchmark of 12,553 unsolved biomedical research questions across 12 domains, designed to test agentic models' faithfulness and abstention. It evaluates models in a tool-using setting without answer keys, using real follow-up evidence rather than parametric knowledge, and reveals significant agentic collapse on the hardest questions, where models stop using tools even though tools are critical.

media Hugging Face Forums · 3d ago

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has developed a full pre-trained language model, Project Inkblot's Titan v1, combining Mamba SSM, Multi-Head Attention, and 32-expert MoE in a single decoder-only architecture under 1B parameters. The model, trained on a single NVIDIA L4 GPU for ~$50, achieves 27.5 validation perplexity and demonstrates efficient scaling via a single-line config update, with all components implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.

arxiv arXiv cs.LG, cs.CL · 6d ago

LLM Alignment Using Implicit User Feedback

A new dataset, IFLLM, collects mouse-trajectory and eye-gaze data from users interacting with LLMs. It shows that implicit feedback significantly improves LLM alignment, boosting text-based reward model accuracy from 55% to 64% and nearly tripling response quality improvements after DPO training on eight LLMs.

arxiv arXiv cs.AI · 6d ago

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization

ScaffoldAgent introduces a utility-guided framework for dynamic outline optimization in open-ended deep research. It models outline evolution through Expansion, Contraction, and Revision operations, guided by a feedback mechanism that evaluates retrieval gain, structural coherence, and generation quality. Experiments show it improves long-form report generation and factual grounding compared to existing agents.

arxiv arXiv cs.CL · 8d ago

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

As enterprise agent tool catalogs scale from 10 to 110 agents, routing accuracy drops 16–23 percentage points on under-specified requests. An oracle analysis identifies retrieval and confusion gaps, with embedding-based shortlisting recovering +10–11 pp F1. A human-annotated study of 1,435 utterances confirms real-world recovery of +10–17 pp despite lower absolute performance.
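Embedding-based shortlisting, as mentioned above, generally means narrowing a large catalog to a few candidates by vector similarity before the router makes its final choice. The sketch below assumes cosine similarity over precomputed embeddings; the summary does not specify the paper's actual similarity measure or shortlist size, and the names `cosine` and `shortlist` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shortlist(request_vec, agent_vecs, k):
    """Return the names of the k agents whose embeddings are most similar to the request."""
    scored = sorted(agent_vecs.items(),
                    key=lambda item: cosine(request_vec, item[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]
```

With a catalog of 110 agents, the router would then choose among only the `k` shortlisted agents rather than the full set.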

arxiv arXiv cs.AI · 1d ago

Social World Model for Lifelong Social Intelligence

The Social World Model decomposes social interaction into five dimensions to enable closed-loop learning. It allows open-source models to sustainably improve and retain social capabilities, outperforming baselines and matching closed-source Gemini 3 Flash in key metrics without forgetting across difficulty levels.

arxiv arXiv cs.AI · 1d ago

Ramanujan Graph Rewiring Alleviates GNN Over-Squashing

Ramanujan Propagation uses Ramanujan graphs to reduce over-squashing in Graph Neural Networks by ensuring non-negative resistance curvature. The method preserves local connectivity while enabling efficient long-range information flow, outperforming nine state-of-the-art rewiring techniques.

arxiv arXiv cs.AI · 1d ago

SOHET: Self-Supervised Transformer for Heterogeneous Event Streams

SOHET introduces a hierarchical transformer architecture with event-type-specific tabular encoders and self-supervised pre-training objectives. It outperforms existing methods by 5.8% on Booking.com's fraud detection task and achieves faster convergence with 2.4% additional gain from pre-training. On the EBES benchmark, bidirectional SOHET matches or exceeds the best published results on six out of eight tasks.

arxiv arXiv cs.AI · 1d ago

Graph-of-Differences for Anatomy-Structured MedReID

Graph-of-Differences (GoD) introduces anatomy-graph representations to enable medical image re-identification with explicit structural grounding. It computes differences across named anatomical regions and aligns them with global backbone differences, providing clinically auditable, structure-level explanations. GoD improves Rank-1 accuracy by 7.1 pp on fundus and 3.1 pp on CXR, with better performance on zero-shot transfers.

arxiv arXiv cs.AI · 1d ago

2D vs 3D Diffusion for Synthetic X-ray AI Training

A study compares 2D and 3D diffusion models for generating synthetic X-ray images. It shows that 2D diffusion-based synthetic X-rays can train AI models to perform as well as models trained on real X-rays, offering a viable path to large, diverse datasets without relying on real patient data.

arxiv arXiv cs.AI · 2d ago

MIRCaps: Large-Scale Mixed-Domain Vision-Language Dataset

MIRCaps introduces a large-scale multimodal dataset with 141,364 images, 981,947 image-level captions, 1,742,264 region-level captions, and 5,391,779 bounding box annotations. It enables fine-grained vision-language learning by providing detailed captions for object categories, sizes, colors, actions, and environmental context, and demonstrates effectiveness in image captioning and object detection tasks.

arxiv arXiv cs.AI · 2d ago

Deep learning with O(log N) parallel time complexity

Hierarchical Block-Local Learning (HBLL) enables deep neural network training in O(log N) parallel time complexity, eliminating the need for full backpropagation. HBLL decomposes networks into hierarchically linked blocks and achieves competitive performance on vision and language tasks, with extensions to recurrent architectures.

arxiv arXiv cs.AI · 2d ago

JS Divergence Improves GRPO Autoregressive Text-to-Image Alignment

A study introduces JS divergence in GRPO-style autoregressive text-to-image post-training, showing it balances policy optimization and generation diversity. Experiments on LlamaGen and Janus-7B demonstrate JS divergence achieves top or strong performance on evaluation metrics while preserving diverse outputs.
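For reference, the Jensen-Shannon divergence between two discrete distributions can be computed as below. This is only the textbook definition (the mean of two KL terms against the midpoint distribution); how the study plugs it into the GRPO objective is not stated in the summary, and the function names are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) in nats for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike KL, this quantity is symmetric in its arguments and bounded above by log 2, which is one commonly cited reason for preferring it as a regularizer.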

media Hugging Face Forums · 2d ago

Seeking arXiv cs.LG Endorsement for PsiLogic Optimizer

Ali, a 16-year-old independent researcher, has developed PsiLogic, a chaos-aware active cancellation optimizer based on Adam. Evaluated against AdamW and Lion using FairBench on an NVIDIA H100, PsiLogic achieved the top validation metrics in three out of four tasks and was statistically tied in the fourth, though it incurs step-time overhead. The author seeks endorsement for arXiv submission under cs.LG, providing a GitHub repository and endorsement code 4ACC37.

media r/LocalLLaMA · 2d ago

VibeThinker: 3B-parameter model beats Opus 4.5 in reasoning

VibeThinker, a 3-billion-parameter language model, outperforms Opus 4.5 in reasoning tasks using a novel SFT+GRPO training approach. The model was introduced in a paper available on arXiv, with details shared in a Reddit post.

media r/LocalLLaMA · 2d ago

My new benchmark: how good are LLMs at simulating wetting behavior?

A new LLM micro-benchmark evaluates how well large language models can simulate solid-liquid interfaces using Surface Evolver, a 1992 tool for modeling liquid surfaces. The benchmark requires LLMs to write SE datafiles defining geometry and constraints through an iterative agentic process with objective grading, offering a niche task with real scientific relevance and sparse training data.

arxiv arXiv cs.CL · 2d ago

Predicate Importance Estimation and Decoupled Rationale-Score Distillation for Entity Alignment

A new method improves entity alignment in knowledge graphs by introducing Predicate Importance Estimation and Decoupled Rationale-Score Distillation. These modules enhance classification accuracy and enable human-in-the-loop verification by detecting uncertain predictions through a decoupled confidence-score estimation.

arxiv arXiv cs.CL · 2d ago

Entity-level Membership Inference via LLM Interrogation

Researchers propose entity-level membership inference to determine if an LLM has been exposed to information about a real-world entity during training. By constructing prompts with limited entity clues and analyzing semantic features in generated responses, their five interrogation strategies achieve up to 0.97 AUC and improve Balanced Accuracy by 6.0%–17.5% over adapted baselines on person entities.
