Multimodal — korshunov.ai

Latent SDEs for Anomaly Detection in Sparse Multivariate Time Series

We propose a generative method using Latent SDEs to detect anomalies in sparse and irregular multivariate time series. The approach projects observed data onto continuous-time stochastic systems, handling missing values and irregular sampling while capturing cyclic patterns. Experiments on six benchmark datasets show that the method outperforms state-of-the-art baselines, especially under severe data sparsity.

arxiv arXiv cs.LG · 7d ago

ViGOS: Decoupling Perception and Reasoning in Multimodal On-Policy Self-Distillation

ViGOS introduces a visually grounded on-policy self-distillation framework for multimodal large language models. It decouples perception and reasoning by using an image-only teacher for visual descriptions and a reasoning teacher for final outputs, reducing reliance on text-only references. This approach improves image-grounded performance across multiple vision-language benchmarks.

arxiv arXiv cs.LG · 7d ago

INDEQS: Graph-Informed Neural Controlled Differential Equations

INDEQS introduces a graph-based neural controlled differential equation framework that incorporates prior directed-graph knowledge at the architectural level. It separates inner and outer mixing and offers both graph-constrained and data-adaptive variants. Outer informedness reduces mean absolute error on larger graphs, while inner informedness provides parameter efficiency when the model must adhere to a known adjacency. Continuous decoders outperform discrete ones in real-world traffic and hydrological forecasting tasks.

arxiv arXiv cs.LG · 7d ago

ChronoSurv: A Graph Framework for Multimodal Survival Analysis

ChronoSurv introduces a hierarchical directed graph framework that models patient care as a progression-aware clinical trajectory. It achieves state-of-the-art performance in multimodal survival prediction by capturing structured clinical workflows and handling missing data through heterogeneous message passing.

arxiv arXiv cs.CL · 7d ago

Fair Cognitive Impairment Detection Through Unlearning

A multimodal framework combines speech, text, and image data with gradient reversal unlearning to reduce demographic bias in Mild Cognitive Impairment detection. The method outperforms existing multilingual and multimodal baselines on TAUKADIAL and PREPARE, with reduced performance gaps across sex and language subgroups, and shows improved transfer across datasets.

arxiv arXiv cs.CL · 7d ago

Morpheus: Neural Tokenizer and Embedder for Turkish

Morpheus is a morphology-aware neural tokenizer and word embedder for Turkish that preserves original text through lossless encoding and decoding. It achieves the lowest bits-per-character (1.425), improves morphological alignment (MorphScore macro-F1 0.61), and uses 19% less GPU memory than 64K-vocabulary subword tokenizers. Frozen Morpheus embeddings outperform BGE-M3 and BERTurk in lexical retrieval, with root-family MAP of 0.85 and ROC-AUC of 1.00.
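For readers unfamiliar with the bits-per-character metric quoted above, here is a minimal sketch of the standard definition: the total negative log2-likelihood a model assigns to a text, divided by the number of characters. How Morpheus itself computes the figure is not stated in the summary; the function name and the toy log-probabilities below are illustrative assumptions.

```python
import math


def bits_per_character(token_logprobs, text):
    """Standard BPC: total bits spent on the text divided by its character count.

    token_logprobs holds natural-log probabilities, one per token (an assumption
    about the model's output format, not something taken from the paper).
    """
    total_bits = -sum(token_logprobs) / math.log(2)  # convert nats to bits
    return total_bits / len(text)


# Toy example: four tokens, each assigned probability 0.5 (1 bit per token),
# covering an 8-character string.
toy_text = "evlerden"
toy_logprobs = [math.log(0.5)] * 4
toy_bpc = bits_per_character(toy_logprobs, toy_text)
```

Normalising by characters rather than tokens is what makes the number comparable across tokenizers with different vocabulary sizes.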

arxiv arXiv cs.CL · 7d ago

SAMA: Unified Framework for Low-Resource Multimodal Data Augmentation

SAMA introduces a unified framework that generates high-fidelity, task-aware synthetic data by aligning semantic anchors across modalities. It uses a Collaborative Multi-Experts Multimodal Large Language Model with shared and task-specific adapters, and employs an Anchor-Preserving Diffusion mechanism for image synthesis, ensuring semantic consistency while diversifying visual contexts. Extensive experiments show SAMA outperforms state-of-the-art methods in MNER, MRE, and MEE under low-resource conditions.

arxiv arXiv cs.CL · 7d ago

RPCL Improves Multimodal Emotion-Cause Pair Extraction

RPCL, a training-only framework, enhances pair confidence in multimodal emotion-cause pair extraction by enforcing discriminative and stable confidence margins. It outperforms a base model on ECF, MECAD, and MEC4 by 2.58 to 2.83 percentage points in Pair F1 and improves mean Pair AUPRC across datasets, with stronger separation between gold pairs and hard negatives.

arxiv arXiv cs.CL · 7d ago

Steerable Model Merging for Multilingual Reasoning

Steerable Model Merging (ST-Merge) introduces a gated cross-attention mechanism to adaptively weight source models during multilingual reasoning. It outperforms existing baselines on four multilingual reasoning benchmarks across 21 languages by dynamically prioritizing models based on input characteristics.

arxiv arXiv cs.CL · 7d ago

IndicContextEval: Benchmark for Context Utilisation in Audio LLMs

IndicContextEval introduces a 56-hour multilingual benchmark featuring natural speech from 555 speakers across 8 Indian languages and 23 domains. It employs a 7-level prompting framework to progressively test context utilisation, including metadata, descriptions, and adversarial inputs. Evaluation of five models shows significant differences in contextual grounding, underscoring the need for explicit assessment of context use in AudioLLMs.

arxiv arXiv cs.AI · 7d ago

PID Feedback Control for Interpretable Activation Steering in Music Generation

This paper proposes a Dual Steering framework using Gram-Schmidt Orthogonalization to decouple Pitch and Duration control in symbolic music generation. By isolating latent directions via DiffMean and applying PID feedback, it enables deterministic, independent modulation of signal attributes without retraining, reducing conceptual interference and signal degradation.
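The three ingredients named above can be sketched in a few lines of plain Python. This is a toy illustration, not the paper's implementation: the activation vectors, gains, and function names are all hypothetical. DiffMean is taken here as the difference between the mean activations of two contrasting groups, and a single Gram-Schmidt step removes the pitch component from the duration direction.

```python
from dataclasses import dataclass


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def diff_mean(positive, negative):
    """DiffMean direction: mean(positive activations) - mean(negative activations)."""
    return [p - n for p, n in zip(mean_vector(positive), mean_vector(negative))]


def orthogonalize(v, u):
    """One Gram-Schmidt step: subtract from v its projection onto u."""
    scale = dot(v, u) / dot(u, u)
    return [vi - scale * ui for vi, ui in zip(v, u)]


@dataclass
class PID:
    """Textbook discrete PID controller; gains are placeholders."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target, measured, dt=1.0):
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical 3-dimensional activations for contrasting prompts.
high_pitch = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
low_pitch = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
long_notes = [[1.0, 3.0, 0.0], [1.0, 5.0, 0.0]]
short_notes = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

pitch_dir = diff_mean(high_pitch, low_pitch)
duration_dir = orthogonalize(diff_mean(long_notes, short_notes), pitch_dir)

# The controller turns the gap between target and measured attribute
# into a steering strength applied along the pitch direction.
controller = PID(kp=0.5, ki=0.1, kd=0.0)
strength = controller.step(target=1.0, measured=0.2)
activation = [0.0, 1.0, 0.0]
steered = [a + strength * d for a, d in zip(activation, pitch_dir)]
```

Because `duration_dir` is orthogonal to `pitch_dir`, adding a multiple of one leaves the projection onto the other unchanged, which is the sense in which the two controls are decoupled in this sketch.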

arxiv arXiv cs.AI · 7d ago

SHIFT: Reducing Language Bias in Multilingual Information Retrieval

SHIFT is a training-free method that mitigates language bias in multilingual information retrieval by using parallel translation pairs to estimate relative language vectors. It corrects language-specific offsets in document embeddings during indexing, improving retrieval performance across diverse models and benchmarks.
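One plausible reading of the summary, sketched below with toy numbers: the language vector is estimated as the mean difference between embeddings of sentences in a given language and embeddings of their translations in a pivot language, and that offset is subtracted from each document embedding at indexing time. The pivot choice, function names, and 2-dimensional embeddings are assumptions for illustration; the paper's exact estimator may differ.

```python
def language_offset(parallel_pairs):
    """Mean of (embedding in language L - embedding of its pivot translation).

    parallel_pairs is a list of (lang_embedding, pivot_embedding) tuples.
    """
    diffs = [
        [l - p for l, p in zip(lang_emb, pivot_emb)]
        for lang_emb, pivot_emb in parallel_pairs
    ]
    return [sum(col) / len(diffs) for col in zip(*diffs)]


def correct_document(doc_embedding, offset):
    """Remove the estimated language-specific offset before indexing."""
    return [d - o for d, o in zip(doc_embedding, offset)]


# Toy 2-dimensional embeddings for two translation pairs.
pairs = [
    ([1.0, 2.0], [0.0, 2.0]),
    ([3.0, 1.0], [2.0, 1.0]),
]
offset = language_offset(pairs)
corrected = correct_document([5.0, 5.0], offset)
```

Since the correction is a single vector subtraction per document, it needs no training and adds negligible cost to indexing.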

arxiv arXiv cs.AI · 7d ago

KinemaForge: URDF Synthesis from RGB-D Sequences

KinemaForge jointly infers part-level shape, joint topology, and parameters from RGB-D sequences using a kinematic constraint graph and differentiable screw-axis solver. It validates results with an energy-consistent verifier, reducing joint-axis error and simulation drift while improving closed-loop manipulation success by 14.6 percentage points over Ditto.

arxiv arXiv cs.AI · 7d ago

BeliefDiffusion: Generative-Model Predictive Planning for Navigation

BeliefDiffusion combines diffusion models for multimodal belief representation with Model Predictive Control for long-term navigation planning. It outperforms model-free reinforcement learning and other generative methods in navigation success and path efficiency in partially observable environments.

arxiv arXiv cs.AI · 7d ago

RTSGameBench: An RTS Benchmark for Strategic Reasoning

RTSGameBench addresses limitations in existing RTS benchmarks by offering diverse gameplay, targeted competency diagnosis, and self-evolving scenario generation. It evaluates vision-language models in strategic reasoning under uncertainty, revealing that state-of-the-art models struggle with multiagent coordination and large-scale tasks.

arxiv arXiv cs.AI · 7d ago

Quantum GAN Augmentation Shows No Benefit in Brain MRI

A controlled benchmark found no significant performance gain from quantum generative models in brain MRI augmentation. Synthetic samples produced by quantum and classical GANs were statistically indistinguishable, with both showing mode collapse and off-distribution samples, especially at low data fractions. The study concludes that quantum augmentation does not provide meaningful data expansion and acts more as regularization.

arxiv arXiv cs.AI · 7d ago

ThinkDeception: Interpretable Multimodal Deception Detection Framework

ThinkDeception introduces a progressive reinforcement learning framework that enables interpretable multimodal deception detection. It leverages a step-by-step annotated Chain of Thought dataset and proposes Visual-Audio Consistency Group Relative Policy Optimization with a dynamic curriculum, enhancing reasoning quality and outperforming existing methods on mainstream benchmarks.

arxiv arXiv cs.AI · 7d ago

Equivariant Graph Neural Networks Improve Optical Spectra Prediction

Equivariant graph neural networks outperform existing models in predicting optical spectra for materials screening. The adapted GotenNet achieves superior performance, especially in the 0-8 eV range and for static real permittivity prediction, critical for thin-film optics.

media r/LocalLLaMA · 7d ago

Lemonade v10.8 Releases Auto Memory Management, Cloud Offload, and MCP Tool Support

Lemonade v10.8 introduces dynamic VRAM management that auto-unloads idle models and downsizes the KV-cache to reclaim GPU memory. It adds cloud offload support for OpenAI-compatible providers, enabling local-first model serving with optional cloud routing. A new MCP gateway exposes local models as tools via POST /mcp for use in MCP-aware applications.
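As a rough sketch of what a client call to such a gateway could look like: MCP messages are JSON-RPC 2.0, and `tools/list` is the protocol's method for enumerating available tools. The base URL and port below are hypothetical, the request is only constructed (not sent), and a real session would normally begin with the protocol's initialization handshake, so check the Lemonade release notes for the actual details.

```python
import json
import urllib.request

# Hypothetical address; the real host and port depend on the local setup.
BASE_URL = "http://localhost:8000"


def build_mcp_request(method, params=None, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 POST to the /mcp endpoint."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return urllib.request.Request(
        BASE_URL + "/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Ask the gateway which tools (here, local models) it exposes.
request = build_mcp_request("tools/list")
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the snippet runs without a server.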

arxiv arXiv cs.CL · 8d ago

Source Language Effects in Cross-Lingual In-Context Learning

A study finds that assumptions about cross-lingual transfer derived from fine-tuning do not carry over to in-context learning (ICL). Source language selection in ICL requires new heuristics, especially for generative tasks, where language confusion is a key challenge.