Research paper — korshunov.ai

Talos: Automated Genomic Reanalysis for Rare Disease Diagnosis

Talos is an open-source tool that automates iterative reanalysis of genomic data to identify rare disease diagnoses. It achieved a 90% recovery rate of in-scope diagnoses with only 1.3 candidate variants per patient, and delivered 241 new diagnoses across 5,000 undiagnosed patients, with most new findings emerging within 32 days of evidence publication.

media Hugging Face Forums · 1d ago

Native binary embeddings outperform post-hoc binarization

A small-scale experiment shows that native binary embedding models achieve better retrieval than post-hoc binarization of float models. On SciFact, measured by Recall@10, native binary models (2048-dim and 4096-dim) outperform post-hoc binarized models by 17% and 25% respectively, with significant speed and memory advantages in indexing.
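For readers unfamiliar with the baseline being compared against, here is a minimal sketch of post-hoc binarization, assuming the common recipe of thresholding each float dimension at zero and ranking by Hamming distance. The function names and the toy 4-dimensional vectors are hypothetical and are not taken from the experiment; a native binary model would instead be trained to emit the bit codes directly.

```python
# Post-hoc binarization sketch (illustrative assumptions, not the post's code):
# keep only the sign bit of each float dimension, then rank by Hamming distance.

def binarize(vec):
    """Pack the sign bits of a float vector into one Python int (1 if x > 0)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")

def search(query_vec, doc_codes, k=10):
    """Return the indices of the k documents closest to the query in Hamming distance."""
    q = binarize(query_vec)
    ranked = sorted(range(len(doc_codes)), key=lambda i: hamming(q, doc_codes[i]))
    return ranked[:k]

# Toy float embeddings standing in for model output.
docs = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.8, 0.3, -0.5, 0.6],
    [0.7, -0.1, -0.3, -0.9],
]
doc_codes = [binarize(d) for d in docs]
top = search([1.0, -0.5, 0.2, -0.4], doc_codes, k=2)
print(top)
```

Each document is stored as one bit per dimension rather than one float, which is where the indexing memory savings mentioned above would come from.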

arxiv arXiv cs.CL · 2d ago

OpenBioRQ: Benchmark for Agentic Biomedical Research Faithfulness

OpenBioRQ introduces a benchmark of 12,553 unsolved biomedical research questions across 12 domains, designed to test agentic models' faithfulness and abstention. It evaluates models in a tool-using setting without answer keys, using real follow-up evidence rather than parametric knowledge, and reveals significant agentic collapse on the hardest questions, where models stop calling tools even though tools are critical there.

media Hugging Face Forums · 3d ago

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has developed a full pre-trained language model, Project Inkblot's Titan v1, combining Mamba SSM, Multi-Head Attention, and 32-expert MoE in a single decoder-only architecture under 1B parameters. The model, trained on a single NVIDIA L4 GPU for ~$50, achieves 27.5 validation perplexity and demonstrates efficient scaling via a single-line config update, with all components implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.

arxiv arXiv cs.LG · 6d ago

LLM Alignment Using Implicit User Feedback

A new dataset, IFLLM, collects mouse trajectories and eye-gaze data from users interacting with LLMs. It shows that implicit feedback significantly improves LLM alignment, boosting text-based reward model accuracy from 55% to 64% and nearly tripling response quality improvements after DPO training on eight LLMs.

arxiv arXiv cs.AI · 6d ago

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization

ScaffoldAgent introduces a utility-guided framework for dynamic outline optimization in open-ended deep research. It models outline evolution through Expansion, Contraction, and Revision operations, guided by a feedback mechanism that evaluates retrieval gain, structural coherence, and generation quality. Experiments show it improves long-form report generation and factual grounding compared to existing agents.

arxiv arXiv cs.CL · 8d ago

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

As enterprise agent tool catalogs scale from 10 to 110 agents, routing accuracy drops 16 to 23 percentage points on under-specified requests. An oracle analysis identifies retrieval and confusion gaps, with embedding-based shortlisting recovering 10 to 11 pp of F1. A human-annotated study of 1,435 utterances confirms real-world recovery of 10 to 17 pp despite lower absolute performance.
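As a rough illustration of embedding-based shortlisting, the sketch below ranks agents by cosine similarity to the request and passes only the top k on to the router. It assumes each agent and each request already has an embedding; the agent names, the 3-dimensional vectors, and the `shortlist` helper are hypothetical and do not come from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(query_emb, agent_embs, k=3):
    """Return the names of the k agents whose embeddings are most similar to the query."""
    scored = sorted(
        agent_embs.items(),
        key=lambda kv: cosine(query_emb, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

# Hypothetical catalog: toy embeddings of agent descriptions.
agents = {
    "billing": [0.9, 0.1, 0.0],
    "hr_leave": [0.1, 0.9, 0.1],
    "it_helpdesk": [0.0, 0.2, 0.9],
    "invoicing": [0.8, 0.2, 0.1],
}
candidates = shortlist([1.0, 0.1, 0.0], agents, k=2)
print(candidates)
```

The router then chooses among the shortlisted candidates only, so the number of options it has to discriminate between stays fixed as the catalog grows.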

arxiv arXiv cs.AI · 15h ago

SCOPE: Self-Adaptive Symbolic Planning for Open-Ended Environments

SCOPE introduces a framework that refines action plans and evolves symbolic world models in open-ended environments. It combines a Symbolic Execution Simulator and a Self-Adaptive Symbolic Memory to improve plan completeness, perturbation resilience, and cross-task adaptability.

arxiv arXiv cs.AI · 16h ago

Grounded Scaling: Determinism as a Core Limit in Agentic AI

Agentic AI performance degrades exponentially in non-deterministic environments, with k-step success falling as δ^k when per-step determinism δ < 1. The paper introduces a framework linking environment determinism to task success, verifiability, and skill evolution, proposing a Supply Certainty Index and a five-level Determinism Maturity Model. It challenges prevailing views by identifying determinism as a binding constraint across compute, data, embodiment, and alignment.
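The δ^k relationship is easy to check numerically. The sketch below assumes every step succeeds independently with the same probability δ, which is a simplification made here for illustration; the helper names are hypothetical and not from the paper.

```python
import math

def k_step_success(delta, k):
    """Probability that k consecutive steps all succeed, assuming independent
    steps that each succeed with probability delta."""
    return delta ** k

def max_steps(delta, target):
    """Largest k with delta**k >= target, for 0 < delta < 1 and 0 < target < 1."""
    return math.floor(math.log(target) / math.log(delta))

# With 99% per-step determinism, a 100-step task succeeds only about 37% of the time.
print(round(k_step_success(0.99, 100), 3))
# Longest task that still succeeds at least half the time at delta = 0.99.
print(max_steps(0.99, 0.5))
```

Even a per-step determinism close to 1 leaves long-horizon tasks unreliable, which is the sense in which determinism acts as a binding constraint.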

arxiv arXiv cs.AI · 17h ago

Concept-Constrained Prompt Learning for Few-Shot CLIP Adaptation

CCPL introduces a lightweight framework that anchors class prompts to frozen concept prototypes, improving few-shot CLIP adaptation. It achieves better base-to-new performance on DTD and EuroSAT compared to CoOp, with consistent gains from text-space concept regularization, while results on OxfordPets are roughly neutral. The method uses concept dropout and controllable ensemble fusion at inference, with results sensitive to dataset semantics and protocol.

arxiv arXiv cs.AI · 17h ago

Context-Aware Distillation and Ablation for Text2DSL

A new Text2DSL system uses context-aware distillation with a structured context of BNF grammar, API specification, and closed identifier vocabulary. Ablation studies show that the vocabulary has the largest impact on semantic quality, while API and BNF significantly improve structural validity, confirming structured context as a critical, load-bearing component.

arxiv arXiv cs.AI · 18h ago

Text2DSL: LLM-Based Code Generation for Domain-Specific Languages

This paper introduces Text2DSL, a distinct task of generating domain-specific language code from natural language. Using the PolkitBench dataset of 4,204 validated pairs, it shows that structured context, such as BNF grammar and API specs, boosts syntactic and structural validity and CodeBLEU scores by 60% to 95% across different LLMs, without fine-tuning.

media r/LocalLLaMA · 18h ago

Baidu's Unlimited-OCR Transcribes Dozens of Pages in One Forward Pass

Baidu has released Unlimited-OCR, a model that transcribes dozens of pages in a single forward pass using Reference Sliding Window Attention (R-SWA). It builds on DeepSeek-OCR, inheriting its encoder, image compression, and MoE architecture, with only 500M active parameters per token. The model achieves 93.92% accuracy on OmniDocBench v1.6, outperforming DeepSeek-OCR's 87.01% on v1.5, though vendor-reported results warrant independent validation.

arxiv arXiv cs.AI · 18h ago

PaperClaw: Autonomous Research with Human-in-the-Loop Refinement

PaperClaw is a multi-agent system that autonomously conducts research from field selection to paper publication. It uses a validated, iterative propose-test-reflect loop, grounded in real references and runnable results, and supports human-in-the-loop refinement at any stage. Evaluation shows it produces strong papers both autonomously and with human oversight.

arxiv arXiv cs.LG · 18h ago

TeaNet Improves Few-Shot Learning in Vibrational Spectroscopy

TeaNet, a task-enhanced augmentation network, reconstructs randomly masked spectra to generate augmented samples that preserve original spectral features while introducing domain-specific variations. This approach enables deep neural networks to identify discriminant wavenumbers more effectively, outperforming CNNs by 17% in challenging synthetic scenarios and offering improved interpretability in few-shot learning tasks.

arxiv arXiv cs.LG · 18h ago

Topological Neural Dynamics: Neuron-wise Sequence Modeling

Topological Neural Dynamics (TND) introduces a neuron-wise framework for sequence modeling, where each neuron evolves independently through a directed graph structure. In a single-player Pong behavior cloning task, TND achieves a mean of 17.47 consecutive catches per round, more than three times the result of any baseline model.

arxiv arXiv cs.LG · 18h ago

NASDAQ: Normalized Observation Space Dynamics-Augmented Q-Learning

NASDAQ addresses low-dimensional observation challenges in reinforcement learning by normalizing observation spaces to balance reconstruction losses. It integrates value learning with short-term value and next observation prediction, achieving competitive or superior performance with less training time across domains.

arxiv arXiv cs.LG · 18h ago

MedTS-TTT: Test-Time Training for Medical Time Series

MedTS-TTT introduces a test-time training framework for medical time series classification. Built on CLSA-TTT and a Gated Convolutional Backbone, it enables rapid, single-step adaptation without iterative optimization. On four public datasets, it achieves 11 top-1 rankings out of 12 evaluations across nine baselines and three metrics.

media r/LocalLLaMA · 19h ago

KaLM-Reranker-V1: Fast and Efficient Document Reranking

KaLM-Reranker-V1 is a fast reranker that is not a late-interaction model: it decouples query and passage computation while maintaining strong relevance modeling through cross-attention. It achieves state-of-the-art performance on BEIR, outperforms industrial models like Qwen3-Reranker, and shows excellent results on MIRACL and LMEB, with the 0.27B Nano model remaining competitive against 7-12B models.
