Research paper — korshunov.ai

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has built and pre-trained a full language model, Project Inkblot's Titan v1, which combines Mamba SSM, Multi-Head Attention, and a 32-expert MoE in a single decoder-only architecture under 1B parameters. Trained on a single NVIDIA L4 GPU for ~$50, the model reaches a validation perplexity of 27.5 and is reported to scale via a single-line config update; all components are implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.

arxiv arXiv cs.LG · 6d ago

LLM Alignment Using Implicit User Feedback

A new dataset, IFLLM, collects mouse-trajectory and eye-gaze data from users interacting with LLMs. The paper reports that this implicit feedback significantly improves LLM alignment: text-based reward model accuracy rises from 55% to 64%, and response quality improvements after DPO training on eight LLMs nearly triple.

arxiv arXiv cs.AI · 6d ago

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization

ScaffoldAgent introduces a utility-guided framework for dynamic outline optimization in open-ended deep research. It models outline evolution through Expansion, Contraction, and Revision operations, guided by a feedback mechanism that evaluates retrieval gain, structural coherence, and generation quality. Experiments show it improves long-form report generation and factual grounding compared to existing agents.

arxiv arXiv cs.CL · 9d ago

Routing Accuracy Degradation and Recovery in Enterprise Agent Systems

As enterprise agent tool catalogs scale from 10 to 110 agents, routing accuracy drops 16 to 23 percentage points on under-specified requests. An oracle analysis identifies retrieval and confusion gaps, with embedding-based shortlisting recovering 10 to 11 pp of F1. A human-annotated study of 1,435 utterances confirms real-world recovery of 10 to 17 pp despite lower absolute performance.
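Embedding-based shortlisting of the kind this summary mentions can be pictured as ranking agents by cosine similarity between a request embedding and each agent's description embedding, then passing only the top k candidates to the router. The sketch below is an illustration, not the paper's implementation: the agent names, the 3-dimensional vectors, and the `shortlist` helper are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def shortlist(query_vec, agent_vecs, k=2):
    """Return the names of the k agents whose embeddings best match the query."""
    ranked = sorted(
        agent_vecs,
        key=lambda name: cosine(query_vec, agent_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical agents with made-up embeddings (real ones would come from an encoder).
AGENTS = {
    "billing": [0.9, 0.1, 0.0],
    "hr": [0.0, 1.0, 0.1],
    "it_support": [0.1, 0.2, 0.9],
}

print(shortlist([0.8, 0.3, 0.1], AGENTS, k=2))
```

The router then only has to choose among the shortlisted agents, which is one plausible reason a smaller candidate set helps as the catalog grows.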

arxiv arXiv cs.CL · 3d ago

IPA-Based Tokenization Improves Multilingual Language Model Performance

A new approach uses the International Phonetic Alphabet to create language-agnostic tokenizers for multilingual models. Training matched text and IPA subword tokenizers across 24 languages and 14 scripts shows IPA tokenizers enhance tokenization quality, particularly for non-Latin scripts, and generalize better to unseen languages and scripts.

arxiv arXiv cs.CL · 3d ago

ConceptE: LLM-Enhanced Event Ontology Expansion

ConceptE introduces a framework that uses large language models to derive concept-level semantics from event triggers, enabling more coherent event clustering and reliable hierarchy expansion. Experiments on ACE, ERE, and MAVEN show ConceptE outperforms existing methods, with up to 12.37% improvement in BCubed-F1 and 6.48% in Taxo_F1.

arxiv arXiv cs.CL · 3d ago

Multilabel Emotion Annotation: Agreement and Soft Voting Analysis

A case study evaluates how annotator variation and aggregation methods affect multilabel emotion annotation. The paper shows that soft vote-share labels, including intensity-weighted variants, better capture annotator uncertainty and improve model alignment with empirical variance compared to hard labels.
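The difference between hard labels and soft vote-share labels can be shown in a few lines. This is a minimal sketch under assumptions: each annotator's vote is a set of emotion labels, the hard label uses a strict-majority rule, and the intensity-weighted variants from the paper are not modeled. The function names and example votes are hypothetical.

```python
from collections import Counter

def soft_labels(votes):
    """Vote share: fraction of annotators who selected each label."""
    counts = Counter(label for ann in votes for label in set(ann))
    return {label: c / len(votes) for label, c in counts.items()}

def hard_labels(votes):
    """Strict majority vote: keep labels chosen by more than half the annotators."""
    return {label for label, share in soft_labels(votes).items() if share > 0.5}

# Four hypothetical annotators labeling one text.
votes = [{"joy"}, {"joy", "surprise"}, {"surprise"}, {"joy"}]

print(soft_labels(votes))
print(hard_labels(votes))
```

Here "surprise" is chosen by half the annotators, so the hard label drops it entirely while the soft label keeps a share of 0.5, which is the kind of annotator uncertainty the summary refers to.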

arxiv arXiv cs.CL · 3d ago

FiLM-Coordinated Dual-Branch Transformer for Language Modeling

A new Transformer architecture introduces separate global and local branches for language modeling, using FiLM to dynamically coordinate them. Experiments show it outperforms single-branch and weakened dual-branch models on small datasets like TinyShakespeare and WikiText-2, with stable results across multiple seeds and channel-selective modulation patterns.
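FiLM (feature-wise linear modulation) scales and shifts each feature channel using values predicted from a conditioning input. The sketch below assumes the global branch conditions the local branch; the summary does not say which direction the paper uses, and the weights here are toy values chosen by hand rather than learned.

```python
def linear(vec, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: per-channel scale (gamma) and shift (beta) predicted from `cond`."""
    gamma = linear(cond, w_gamma, b_gamma)
    beta = linear(cond, w_beta, b_beta)
    return [g * x + b for g, x, b in zip(gamma, features, beta)]

local_feats = [1.0, 2.0]   # hypothetical local-branch features (2 channels)
global_ctx = [0.5, -1.0]   # hypothetical global-branch summary

out = film(
    local_feats,
    global_ctx,
    w_gamma=[[1.0, 0.0], [0.0, 1.0]], b_gamma=[1.0, 1.0],
    w_beta=[[0.0, 0.0], [0.0, 0.0]], b_beta=[0.0, 0.5],
)
print(out)
```

With these toy weights the first channel is amplified (gamma 1.5) and the second is gated off (gamma 0.0) and replaced by its shift, a small-scale picture of channel-selective modulation.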

arxiv arXiv cs.CL · 3d ago

Synthetic Audio Framework Improves ATC Speech Recognition

A synthetic audio generation framework is introduced to address data scarcity in Air Traffic Control speech recognition. It uses neural techniques like Text-to-Speech and accent conversion to simulate non-native English accents, enhancing Automatic Speech Recognition performance. Experiments with the Whisper model on the ATCO2 corpus show reduced word error rates when fine-tuned with synthetic or mixed real-synthetic data.

arxiv arXiv cs.CL · 3d ago

Economic Shifts and Cultural Evolution in French Drama

French drama shows a shift from aristocratic to bourgeois themes as capitalism developed. Bourgeois themes responded to GDP shocks starting in the 18th century, with household economic concerns becoming responsive only after 1820. Peer effects and economic sensitivity jointly explain this transition, supported by simulations.

arxiv arXiv cs.CL · 3d ago

Two-Stage Alignment Improves Math Tutoring Pedagogy

A two-stage alignment pipeline enhances large language models' pedagogical performance in math mistake remediation. The approach combines supervised fine-tuning with Direct Preference Optimization using synthetic data on scaffolding and factuality, outperforming base and existing tutoring models in both accuracy and teaching quality. Human evaluations show the model competes with a proprietary baseline, offering greater openness and reproducibility.
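The Direct Preference Optimization stage trains on pairs of preferred and rejected responses. For a single pair, the standard DPO loss compares how much the policy has moved each response's log-probability away from a frozen reference model. The sketch below shows that per-pair loss only; the log-probability values and `beta=0.1` are illustrative assumptions, not settings from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy favors the chosen response than the reference does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy raises the chosen response and lowers the rejected one: the loss drops.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

In the pipeline described above, the preference pairs would come from the synthetic scaffolding and factuality data, applied after supervised fine-tuning.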

arxiv arXiv cs.CL · 3d ago

PeerMathDial: First Dataset on Student-Student Math Problem Solving

PeerMathDial is the first dataset of peer collaborative math problem-solving dialogues from middle school classrooms. It includes 55 dialogues from 27 students, totaling 6,406 turns, and features a corpus-grounded dialogue act taxonomy. The dataset enables research on dialogue evolution, student trait-behavior links, and LLM performance in simulating student interactions.

media r/LocalLLaMA · 3d ago

TMax: A Simple Recipe for Terminal Agents

TMax introduces TMax-15k, a dataset of 14,600 RL environments, over 2.5× larger than the next-largest open terminal dataset. It also presents a simple RL recipe that trains open models from 2B to 27B parameters, with TMax-9B achieving 27.2% on Terminal Bench 2.0 and TMax-27B reaching 42.7%.

lab Hugging Face Blog · 4d ago

Can You Beat LoRA in Fine-Tuning?

A new study explores alternatives to LoRA, the most popular fine-tuning technique, assessing whether other methods can achieve better performance with less computational cost. The research finds that while some approaches show promise, none consistently outperform LoRA across diverse tasks and datasets.

media r/LocalLLaMA · 6d ago

The Eagle3 has landed for Qwen

The Eagle3 speculative decoding model is now available in llama.cpp's latest release via --spec-type draft-eagle3. It requires a draft model, such as Ex0bit-Qwen3.6-27B-PRISM-EAGLE3-GGUF, and can be used with -md or --model-draft. Performance is comparable to draft-mtp, though tensor parallelism is not supported and VRAM usage is higher.

media r/LocalLLaMA · 6d ago

Ohio State University releases open-source Deep Research agent QUEST-35B

Ohio State University's NLP team has released QUEST-35B, an open-source Deep Research agent trained on approximately 32 H100 GPUs using 8,000 synthetic samples. The team open-sourced the training recipe, code, weights, and datasets, with benchmark results showing competitive performance compared to leading closed-source Deep Research systems.

arxiv arXiv cs.AI · 6d ago

DeepSWIP: Counterfactual Reasoning in Neural Probabilistic Logic

DeepSWIP introduces a single-world counterfactual semantics for DeepProbLog, enabling causal reasoning through neural materialization and weighted model counting. It achieves exact inference under finite grounding and unique-supported-model assumptions, with experiments showing a 2.14× speedup and improved calibration over DeepTwin and AIPW estimators.

arxiv arXiv cs.LG · 6d ago

Agentic Symbolic Search for PDE Solution Characterization

ASYS proposes a prior-guided framework that uses mathematical theory and evolutionary search to generate interpretable symbolic forms of PDE solutions. It produces analytical representations for complex problems like Allen-Cahn dynamics and Keller-Segel blow-up, offering new pathways for mathematical analysis beyond traditional methods.
