Research paper — korshunov.ai

Topic · Research paper

Native binary embeddings outperform post-hoc binarization

A small-scale experiment shows that native binary embedding models retrieve better than float models binarized post hoc. On SciFact Recall@10, native binary models (2048-dim and 4096-dim) outperform post-hoc binary models by 17% and 25% respectively, with significant speed and memory advantages in indexing.
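For readers unfamiliar with the baseline, post-hoc binarization can be sketched as keeping only the sign of each float dimension and ranking documents by Hamming distance. This is a minimal illustration under that assumption, not the experiment's code: the toy vectors, the helper names (`binarize`, `hamming`, `top_k`, `recall_at_k`) and the sign threshold at zero are all hypothetical.

```python
from typing import List, Sequence


def binarize(vec: Sequence[float]) -> int:
    """Post-hoc binarization (assumed rule): 1 if the value is positive, else 0.
    The bits are packed into a single Python int."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two codes differ."""
    return bin(a ^ b).count("1")


def top_k(query: Sequence[float], docs: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k documents closest to the query in Hamming distance."""
    q = binarize(query)
    codes = [binarize(d) for d in docs]
    ranked = sorted(range(len(docs)), key=lambda i: hamming(q, codes[i]))
    return ranked[:k]


def recall_at_k(retrieved: Sequence[int], relevant: Sequence[int]) -> float:
    """Fraction of relevant documents that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy 4-dim float embeddings (made up for illustration).
docs = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.7, -0.1, -0.3, -0.6],
]
query = [0.8, -0.4, 0.5, -0.9]

hits = top_k(query, docs, k=2)
print(hits, recall_at_k(hits, relevant=[0]))
```

On the memory side, assuming float32 storage, a 2048-dim float vector takes 8,192 bytes while 2048 bits take 256 bytes, a 32x reduction. A natively binary model is trained to emit such codes directly rather than having them derived from floats afterwards.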

arxiv arXiv cs.CL · 2d ago

OpenBioRQ: Benchmark for Agentic Biomedical Research Faithfulness

OpenBioRQ introduces a benchmark of 12,553 unsolved biomedical research questions across 12 domains, designed to test agentic models' faithfulness and abstention. It evaluates models in a tool-using setting without answer keys, using real follow-up evidence rather than parametric knowledge, and reveals significant agentic collapse on the hardest questions, where models stop using tools even though tools are critical.

media Hugging Face Forums · 3d ago

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has developed a full pre-trained language model, Project Inkblot's Titan v1, combining Mamba SSM, Multi-Head Attention, and 32-expert MoE in a single decoder-only architecture under 1B parameters. The model, trained on a single NVIDIA L4 GPU for ~$50, achieves 27.5 validation perplexity and demonstrates efficient scaling via a single-line config update, with all components implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.

arxiv arXiv cs.LG · 6d ago

LLM Alignment Using Implicit User Feedback

A new dataset, IFLLM, collects mouse-trajectory and eye-gaze data from users interacting with LLMs. It shows that implicit feedback significantly improves LLM alignment, boosting text-based reward model accuracy from 55% to 64% and nearly tripling response quality improvements after DPO training on eight LLMs.

arxiv arXiv cs.AI · 6d ago

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization

ScaffoldAgent introduces a utility-guided framework for dynamic outline optimization in open-ended deep research. It models outline evolution through Expansion, Contraction, and Revision operations, guided by a feedback mechanism that evaluates retrieval gain, structural coherence, and generation quality. Experiments show it improves long-form report generation and factual grounding compared to existing agents.

arxiv arXiv cs.CL · 8d ago

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

As enterprise agent tool catalogs scale from 10 to 110 agents, routing accuracy drops 16 to 23 percentage points on under-specified requests. An oracle analysis identifies retrieval and confusion gaps, with embedding-based shortlisting recovering +10 to +11 pp F1. A human-annotated study of 1,435 utterances confirms real-world recovery of +10 to +17 pp despite lower absolute performance.
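Embedding-based shortlisting, in its simplest form, ranks agents by similarity to the request and passes only the top few to the router. The sketch below assumes cosine similarity over precomputed vectors. The agent names, the 3-dim toy embeddings and the `shortlist` helper are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def shortlist(request_vec: Sequence[float],
              agent_vecs: Dict[str, Sequence[float]],
              k: int) -> List[str]:
    """Names of the k agents whose embeddings are most similar to the request."""
    scored = sorted(agent_vecs.items(),
                    key=lambda item: cosine(request_vec, item[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical agent catalog with toy embeddings.
agents = {
    "billing_agent": [0.9, 0.1, 0.0],
    "hr_agent": [0.1, 0.9, 0.1],
    "it_support_agent": [0.0, 0.2, 0.9],
    "payroll_agent": [0.7, 0.6, 0.0],
}
request = [0.8, 0.3, 0.0]  # toy embedding of an under-specified request

candidates = shortlist(request, agents, k=2)
print(candidates)
```

The router then chooses among `candidates` only, which shrinks the set of agents it can confuse as the catalog grows.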

arxiv arXiv cs.LG · 21h ago

Ramanujan Graph Rewiring Alleviates GNN Over-Squashing

Ramanujan Propagation uses Ramanujan graphs to reduce over-squashing in Graph Neural Networks by ensuring non-negative resistance curvature. The method preserves local connectivity while enabling efficient long-range information flow, outperforming nine state-of-the-art rewiring techniques.

arxiv arXiv cs.LG · 21h ago

SOHET: Transformer for Heterogeneous Event Streams

SOHET introduces a hierarchical transformer architecture with event-type-specific tabular encoders and self-supervised pre-training. It outperforms existing methods by 5.8% on Booking.com's fraud detection task and achieves state-of-the-art results on 6 out of 8 EBES benchmark tasks.

arxiv arXiv cs.LG · 21h ago

Graph-of-Differences for Anatomy-Structured MedReID

Graph-of-Differences (GoD) introduces anatomy-structured difference alignment for medical image re-identification. It represents images as anatomy graphs, computes differences over matched anatomical regions, and anchors retrieval signals to homologous structures. GoD improves Rank-1 accuracy by 7.1 pp on fundus and 3.1 pp on CXR, with better generalization in zero-shot settings.

arxiv arXiv cs.LG · 21h ago

Functional Orthogonality Ensures Identifiability in Unsupervised Disentanglement

The paper proves that locally orthogonal directions in generative models guarantee latent factor identifiability without needing statistical independence or causal assumptions. Experiments with orthogonality-regularized normalizing flows confirm reliable recovery of true latent factors, challenging prior claims about unsupervised disentanglement impossibility.

arxiv arXiv cs.LG · 22h ago

Universal Encoders for Modular Relational Deep Learning

The paper proposes a modular relational deep learning approach that decouples row encoding from graph message-passing. It introduces a transformer-based Universal Row Encoder that uses schema metadata to generate invariant row embeddings, enabling better generalization across databases and improving convergence on RelBench benchmarks.

arxiv arXiv cs.LG · 22h ago

JS Divergence Enhances GRPO Autoregressive Text-to-Image Alignment

A study introduces JS divergence in GRPO-style autoregressive text-to-image alignment, showing it effectively balances policy optimization and generation diversity. Experiments on LlamaGen and Janus-7B demonstrate JS divergence achieves top or competitive performance across metrics while preserving diverse outputs.
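As background, the Jensen-Shannon divergence between two discrete distributions is the average of their KL divergences to the midpoint distribution. Unlike KL it is symmetric, and it is bounded by log 2. The sketch below shows only the divergence itself. How the paper plugs it into the GRPO objective is not stated in this summary, and the function names are illustrative.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


policy = [0.7, 0.2, 0.1]      # toy next-token distribution
reference = [0.5, 0.3, 0.2]   # toy reference distribution

print(js_divergence(policy, reference))
print(js_divergence(reference, policy))  # same value: JS is symmetric
```

Because the midpoint `m` is positive wherever either input is, the divergence stays finite even when the two distributions have disjoint support, where KL alone would be infinite.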

arxiv arXiv cs.LG · 23h ago

Deep Learning with O(log N) Parallel Time Complexity

Hierarchical Block-Local Learning (HBLL) enables deep neural network training in O(log N) parallel time complexity, eliminating the need for full backpropagation. HBLL decomposes networks into hierarchically linked blocks and achieves competitive performance on vision and language tasks, with extensions to recurrent architectures.

arxiv arXiv cs.LG · 23h ago

Privacy-Preserving Federated Temporal Graph Learning for Cyber-Resilient IoMT

The paper introduces Federated TGCN-A2C, a privacy-preserving framework that achieves 99.48% and 99.61% test accuracy on CICDDoS 2019 and TON-IoT benchmarks, outperforming Fed-Inforce-Fusion by 0.21 percentage points. It includes anomaly detection, digital twin-based scoring, adaptive action selection, and an enhanced honeypot layer, with all major attack classes achieving F1 scores above 0.92 and 0.94, respectively, and provides post-hoc explainability via SHAP, LIME, Grad-CAM, and counterfactual analysis.

arxiv arXiv cs.CL · 1d ago

AI-PAVE-Br: LLM-Based PAVE for Brazilian E-Commerce

AI-PAVE-Br uses large language models to enhance product attribute value extraction in Brazilian e-commerce. The system outperforms traditional NER methods, with a new Golden Set dataset providing a manually annotated benchmark for Portuguese product data.

arxiv arXiv cs.CL · 1d ago

DREAM: Autoregressive Training for Dense Retrieval Embeddings

DREAM uses autoregressive next-token prediction to supervise dense retrieval embedding training. It injects query-document similarity scores into a frozen LLM's attention heads, enabling gradient backpropagation for retriever optimization. DREAM outperforms baselines on BEIR and RTEB benchmarks across model scales.

arxiv arXiv cs.CL · 1d ago

CANDLE: Lightweight Arabic Noise Deduplication via CTC

CANDLE is a lightweight system that uses Connectionist Temporal Classification to deduplicate repeated characters in Arabic text, without relying on handcrafted rules or morphological analyzers. It achieves a Sentence Error Rate of 5.37% and reduces tokenizer fertility by up to 12.8%, lowering inference costs and improving context window usage.

arxiv arXiv cs.CL · 1d ago

Micro-Transaction Markets for Verified Product Info in Agentic E-Commerce

Autonomous agents in e-commerce face a scarcity of trustworthy product information, not product matching. A proposed micro-transaction model allows agents to pay fractions of a cent to access verified data like service histories and test reports, with pricing and trust scored via reputation. This system prioritizes genuine product quality and real-time information acquisition over chatbot fluency.

arxiv arXiv cs.CL · 1d ago

L3Cube-MahaPOS: Marathi POS Tagging Dataset and BERT Models

L3Cube-MahaPOS introduces a gold-standard part-of-speech tagging dataset for Marathi: 32,354 manually annotated sentences from news text. It uses a 16-tag Universal Dependencies scheme and benchmarks six model families, with MahaBERT-v2 achieving 88.67% token-level accuracy and 81.67% macro-F1 on 15 tag classes.
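The gap between the two reported metrics is typical of tagging tasks. Token-level accuracy is dominated by frequent tags, while macro-F1 weights every tag class equally, so errors on rare tags pull it down. The toy example below illustrates this with made-up tags and predictions. It is not the paper's evaluation code, and it assumes macro-F1 is averaged over the tags present in the gold labels.

```python
from typing import List, Sequence


def token_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of tokens whose predicted tag matches the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-tag F1 over the tags that occur in gold."""
    scores: List[float] = []
    for tag in sorted(set(gold)):
        tp = sum(g == tag and p == tag for g, p in zip(gold, pred))
        fp = sum(g != tag and p == tag for g, p in zip(gold, pred))
        fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: a frequent tag (NOUN) and a rare one (ADJ), with one ADJ mislabeled.
gold = ["NOUN"] * 8 + ["ADJ", "ADJ"]
pred = ["NOUN"] * 8 + ["NOUN", "ADJ"]

acc = token_accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Here a single mistake on the rare tag costs one token of accuracy but halves that tag's recall, so macro-F1 falls further than accuracy does.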
