arXiv cs.LG · 13d ago

UNIEGO: Proxy-Mediated Unified Egocentric Video Representation

UNIEGO introduces a hierarchical multi-teacher distillation framework that uses proxy models to mediate knowledge transfer from nine diverse teachers across viewpoints and modalities. The Selective Proxy Distillation (SPD) stage adaptively selects reliable proxies during training, improving representation quality and stability. UNIEGO achieves state-of-the-art results in action recognition, video retrieval, and action segmentation on ego-exo benchmarks.

arXiv cs.LG · 13d ago

How Transparent is DiffusionGemma?

DiffusionGemma has poor variable transparency because of its high opaque serial depth, but this can be mitigated by an interpretable token bottleneck, which reduces serial depth to 1.1× that of Gemma 4. Algorithmic transparency is harder to achieve in diffusion models because tokens change dynamically, though case studies reveal novel phenomena such as non-chronological reasoning and intermediate-context reasoning. Overall, DiffusionGemma is found to be similarly monitorable to Gemma 4.

arXiv cs.CL · 13d ago

StylisticBias: Visual Cues Drive Most Social Biases in MLLMs

StylisticBias introduces a controlled benchmark to evaluate attribute-level social bias in multimodal large language models. It reveals that age and body type dominate identity-level effects, while fashion style and 15 key visual attributes drive most bias, accounting for nearly 80% of variation. The benchmark highlights that model judgments are most sensitive to appearance-related cues, especially in socioeconomic and style-based contexts.

arXiv cs.AI · 13d ago

Lean as Process-Verified Reward Oracle in RL for Theorem Proving

This work shows that Lean can serve as a symbolic process oracle, providing fine-grained, verified feedback during reinforcement learning. By parsing proof attempts into tactic sequences and using Lean's elaboration to mark sound steps and first failures, the system generates dense, type-theoretic reward signals. Experiments demonstrate that tactic-level supervision outperforms outcome-only methods on benchmarks such as MiniF2F and ProofNet, highlighting Lean's role as both evaluator and training reward source.
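
To illustrate what tactic-level (dense) reward could look like, here is a minimal Python sketch. It is not the paper's implementation: the function name `tactic_rewards`, the reward magnitudes, and the input `step_ok` (one boolean per tactic, assumed to come from a Lean check of each step) are all hypothetical choices made for illustration.

```python
from typing import List


def tactic_rewards(
    step_ok: List[bool],
    proof_complete: bool,
    step_reward: float = 0.1,    # assumed value, not from the paper
    fail_penalty: float = -0.5,  # assumed value, not from the paper
    final_reward: float = 1.0,   # assumed value, not from the paper
) -> List[float]:
    """Assign one reward per tactic from per-step verification flags.

    step_ok[i] is True if tactic i was accepted by the checker
    (hypothetically, Lean's elaborator). Tactics after the first
    failure are left at 0.0 because they were never verified.
    """
    rewards = [0.0] * len(step_ok)
    for i, ok in enumerate(step_ok):
        if ok:
            rewards[i] = step_reward
        else:
            # Penalise only the first failing tactic, then stop.
            rewards[i] = fail_penalty
            break
    else:
        # No failure encountered: add the outcome bonus to the last
        # tactic if the whole proof was accepted.
        if proof_complete and rewards:
            rewards[-1] += final_reward
    return rewards
```

An outcome-only scheme would correspond to keeping just the final bonus; the per-step terms are what make the signal dense.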

arXiv cs.AI · 13d ago

EEG Foundation Models for Burst-Suppression Detection in ICU

A study evaluates EEG Foundation Models for event-based burst-suppression detection in ICU settings without patient-specific calibration. REVE-base achieved the highest event-based F1-score of 0.868 and reduced burst-per-minute error by 52.1% compared to EEGNet and 36.2% compared to adaptive thresholding, demonstrating superior performance. Ablation results show full fine-tuning outperforms other strategies, and pretrained REVE-base surpasses random initialization by 0.723 F1 points at 25% labeled data, highlighting the value of pretraining for limited datasets.
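
For readers unfamiliar with event-based scoring, the following Python sketch shows one common way to compute an event-based F1. The summary does not state the study's matching criterion, so everything here is an assumption: events are `(start, end)` intervals in seconds, a predicted event counts as a true positive if it overlaps any not-yet-matched reference event, and matching is greedy in time order. The name `event_f1` is hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start_seconds, end_seconds)


def event_f1(pred: List[Event], ref: List[Event]) -> float:
    """Event-based F1 under an any-overlap, one-to-one matching rule.

    This matching rule is an assumption for illustration; the study
    may use a different criterion (e.g. a minimum overlap ratio).
    """
    ref_sorted = sorted(ref)
    matched = set()  # indices of reference events already matched
    tp = 0
    for p_start, p_end in sorted(pred):
        for j, (r_start, r_end) in enumerate(ref_sorted):
            if j in matched:
                continue
            # Two intervals overlap if each starts before the other ends.
            if p_start < r_end and r_start < p_end:
                matched.add(j)
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Unlike sample-wise F1, this score counts whole events, so a single long burst and a single short burst contribute equally.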

arXiv cs.AI · 13d ago

Hidden Evolution of Disguised Visual Context in VLMs

Visual tokens enter large language models as raw, unstructured signals. How they are transformed and integrated internally depends on the architecture (supplied as in-context prompts or injected into intermediate layers), leading to distinct evolution paths in visual representation and frequency characteristics. We find that attention alone is insufficient; performance is driven by the quality of visual representations at each layer across different integration paradigms.