Research paper — korshunov.ai

Native binary embeddings outperform post-hoc binarization

A small-scale experiment shows that native binary embedding models achieve better retrieval than post-hoc binarization of float models. On SciFact, measured by Recall@10, native binary models (2048-dim and 4096-dim) outperform post-hoc binary models by 17% and 25% respectively, with significant speed and memory advantages in indexing.
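
For readers unfamiliar with the baseline, here is a minimal sketch of what post-hoc binarization usually looks like: keep only the sign of each float dimension, pack the bits, and rank documents by Hamming distance. The function names and the zero threshold are illustrative assumptions, not details taken from the experiment.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Post-hoc binarization (assumed scheme): 1 where a dimension is > 0, packed 8 bits per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_distances(query_code: np.ndarray, doc_codes: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query code and each packed document code."""
    xor = np.bitwise_xor(doc_codes, query_code)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

def top_k(query_code: np.ndarray, doc_codes: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k documents with the smallest Hamming distance to the query."""
    distances = hamming_distances(query_code, doc_codes)
    return np.argsort(distances, kind="stable")[:k]
```

Under this scheme a 2048-dim float32 vector (8,192 bytes) shrinks to a 256-byte code. A native binary model would instead be trained to emit such codes directly, which is the setting the experiment reports as stronger.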

arxiv arXiv cs.CL · 2d ago

OpenBioRQ: Benchmark for Agentic Biomedical Research Faithfulness

OpenBioRQ introduces a benchmark of 12,553 unsolved biomedical research questions across 12 domains, designed to test agentic models' faithfulness and abstention. It evaluates models in a tool-using setting without answer keys, using real follow-up evidence rather than parametric knowledge. The results reveal significant agentic collapse on the hardest questions, where models stop using tools even though tools are critical.

media Hugging Face Forums · 3d ago

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has developed a full pre-trained language model, Project Inkblot's Titan v1, combining Mamba SSM, Multi-Head Attention, and 32-expert MoE in a single decoder-only architecture under 1B parameters. The model, trained on a single NVIDIA L4 GPU for ~$50, achieves 27.5 validation perplexity and demonstrates efficient scaling via a single-line config update, with all components implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.

arxiv arXiv cs.LG · 6d ago

LLM Alignment Using Implicit User Feedback

A new dataset, IFLLM, collects mouse-trajectory and eye-gaze data from users interacting with LLMs. It shows that implicit feedback significantly improves LLM alignment, boosting text-based reward model accuracy from 55% to 64% and nearly tripling response quality improvements after DPO training on eight LLMs.

arxiv arXiv cs.AI · 6d ago

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization

ScaffoldAgent introduces a utility-guided framework for dynamic outline optimization in open-ended deep research. It models outline evolution through Expansion, Contraction, and Revision operations, guided by a feedback mechanism that evaluates retrieval gain, structural coherence, and generation quality. Experiments show it improves long-form report generation and factual grounding compared to existing agents.

arxiv arXiv cs.CL · 8d ago

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

As enterprise agent tool catalogs scale from 10 to 110 agents, routing accuracy drops by 16 to 23 percentage points on under-specified requests. An oracle analysis identifies retrieval and confusion gaps, with embedding-based shortlisting recovering 10 to 11 pp of F1. A human-annotated study of 1,435 utterances confirms real-world recovery of 10 to 17 pp despite lower absolute performance.
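
As a rough illustration of embedding-based shortlisting, the sketch below ranks agents by cosine similarity between a request embedding and each agent's description embedding, then keeps the top k for the router to choose from. The function name, the use of cosine similarity, and the value of k are assumptions for illustration; the paper's actual shortlisting setup may differ.

```python
import numpy as np

def shortlist(request_vec: np.ndarray, agent_vecs: np.ndarray, k: int = 5) -> list[int]:
    """Return indices of the k agents whose embeddings are most similar to the request."""
    # Normalize so the dot product equals cosine similarity.
    request_unit = request_vec / np.linalg.norm(request_vec)
    agent_units = agent_vecs / np.linalg.norm(agent_vecs, axis=1, keepdims=True)
    scores = agent_units @ request_unit
    # Highest similarity first; stable sort keeps catalog order on ties.
    order = np.argsort(-scores, kind="stable")[:k]
    return order.tolist()
```

The router then only has to discriminate among k candidates instead of the full catalog, which is the mechanism the summary credits for the recovered F1.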

arxiv arXiv cs.AI · 16h ago

Text2DSL: LLM-Based Code Generation for Domain-Specific Languages

This paper introduces Text2DSL, a distinct task of generating domain-specific language code from natural language. Using the PolkitBench dataset of 4,204 validated pairs, it shows that structured context (such as BNF grammar and API specs) boosts syntactic and structural validity and CodeBLEU scores by 60% to 95% across different LLMs, without fine-tuning.

media r/LocalLLaMA · 16h ago

Baidu's Unlimited-OCR Transcribes Dozens of Pages in One Forward Pass

Baidu has released Unlimited-OCR, a model that transcribes dozens of pages in a single forward pass using Reference Sliding Window Attention (R-SWA). It builds on DeepSeek-OCR, inheriting its encoder, image compression, and MoE architecture, with only 500M active parameters per token. The model achieves 93.92% accuracy on OmniDocBench v1.6, outperforming DeepSeek-OCR's 87.01% on v1.5, though vendor-reported results warrant independent validation.

arxiv arXiv cs.AI · 16h ago

PaperClaw: Autonomous Research with Human-in-the-Loop Refinement

PaperClaw is a multi-agent system that autonomously conducts research from field selection to paper publication. It uses a validated, iterative propose-test-reflect loop, grounded in real references and runnable results, and supports human-in-the-loop refinement at any stage. Evaluation shows it produces strong papers both autonomously and with human oversight.

arxiv arXiv cs.LG · 16h ago

TeaNet Improves Few-Shot Learning in Vibrational Spectroscopy

TeaNet, a task-enhanced augmentation network, reconstructs randomly masked spectra to generate augmented samples that preserve original spectral features while introducing domain-specific variations. This approach enables deep neural networks to identify discriminant wavenumbers more effectively, outperforming CNNs by 17% in challenging synthetic scenarios and offering improved interpretability in few-shot learning tasks.

arxiv arXiv cs.LG · 16h ago

Topological Neural Dynamics: Neuron-wise Sequence Modeling

Topological Neural Dynamics (TND) introduces a neuron-wise framework for sequence modeling, where each neuron evolves independently through a directed graph structure. In a single-player Pong behavior cloning task, TND achieves a mean of 17.47 consecutive catches per round, more than three times the result of any baseline model.

arxiv arXiv cs.LG · 16h ago

NASDAQ: Normalized Observation Space Dynamics-Augmented Q-Learning

NASDAQ addresses low-dimensional observation challenges in reinforcement learning by normalizing observation spaces to balance reconstruction losses. It integrates value learning with short-term value and next observation prediction, achieving competitive or superior performance with less training time across domains.

arxiv arXiv cs.LG · 16h ago

MedTS-TTT: Test-Time Training for Medical Time Series

MedTS-TTT introduces a test-time training framework for medical time series classification. Built on CLSA-TTT and a Gated Convolutional Backbone, it enables rapid, single-step adaptation without iterative optimization. Across 12 evaluations (four public datasets, three metrics each) against nine baselines, it ranks first in 11.

media r/LocalLLaMA · 17h ago

KaLM-Reranker-V1: Fast and Efficient Document Reranking

KaLM-Reranker-V1 is a fast reranker that is not a late-interaction model: it decouples query and passage computation while maintaining strong relevance modeling through cross-attention. It achieves state-of-the-art performance on BEIR, outperforms industrial models like Qwen3-Reranker, and shows excellent results on MIRACL and LMEB, with the 0.27B Nano model remaining competitive against 7-12B models.

arxiv arXiv cs.LG · 17h ago

Ramanujan Graph Rewiring Alleviates GNN Over-Squashing

Ramanujan Propagation uses Ramanujan graphs to reduce over-squashing in Graph Neural Networks by ensuring non-negative resistance curvature. The method preserves local connectivity while enabling efficient long-range information flow, outperforming nine state-of-the-art rewiring techniques.
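
For context, a d-regular graph is called Ramanujan when every adjacency eigenvalue other than ±d has absolute value at most 2√(d−1), which is what makes such graphs good expanders. The sketch below checks that standard spectral condition with NumPy; it illustrates the definition only and is not the paper's rewiring procedure. The function name and tolerance are assumptions.

```python
import numpy as np

def is_ramanujan(adj: np.ndarray, tol: float = 1e-8) -> bool:
    """Check the Ramanujan condition for a d-regular graph given its symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    d = degrees[0]
    if not np.allclose(degrees, d):
        raise ValueError("graph must be d-regular")
    eigvals = np.linalg.eigvalsh(adj)
    # Drop the trivial eigenvalues +d (and -d for bipartite graphs).
    nontrivial = eigvals[np.abs(np.abs(eigvals) - d) > tol]
    if nontrivial.size == 0:
        return True
    return bool(np.max(np.abs(nontrivial)) <= 2.0 * np.sqrt(d - 1.0) + tol)
```

The complete graph on four vertices passes this check, while a long circular ladder (a 3-regular graph with a narrow bottleneck) fails it, matching the intuition that bottlenecks are what cause over-squashing.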

arxiv arXiv cs.LG · 18h ago

SOHET: Transformer for Heterogeneous Event Streams

SOHET introduces a hierarchical transformer architecture with event-type-specific tabular encoders and self-supervised pre-training. It outperforms existing methods by 5.8% on Booking.com's fraud detection task and achieves state-of-the-art results on 6 out of 8 EBES benchmark tasks.

arxiv arXiv cs.LG · 18h ago

Graph-of-Differences for Anatomy-Structured MedReID

Graph-of-Differences (GoD) introduces anatomy-structured difference alignment for medical image re-identification. It represents images as anatomy graphs, computes differences over matched anatomical regions, and anchors retrieval signals to homologous structures. GoD improves Rank-1 accuracy by 7.1 pp on fundus images and 3.1 pp on chest X-rays (CXR), with better generalization in zero-shot settings.

arxiv arXiv cs.LG · 18h ago

Functional Orthogonality Ensures Identifiability in Unsupervised Disentanglement

The paper proves that locally orthogonal directions in generative models guarantee latent factor identifiability without needing statistical independence or causal assumptions. Experiments with orthogonality-regularized normalizing flows confirm reliable recovery of true latent factors, challenging prior claims about unsupervised disentanglement impossibility.

arxiv arXiv cs.LG · 18h ago

Universal Encoders for Modular Relational Deep Learning

The paper proposes a modular relational deep learning approach that decouples row encoding from graph message-passing. It introduces a transformer-based Universal Row Encoder that uses schema metadata to generate invariant row embeddings, enabling better generalization across databases and improving convergence on RelBench benchmarks.
