Multimodal
arXiv cs.LG · 6d ago

VibrantForests framework maps forest structure at 10-meter resolution

The VibrantForests framework uses satellite-based models trained on lidar samples to generate annual, wall-to-wall maps of canopy cover, height, biomass, basal area, and quadratic mean diameter at 10-meter resolution across the contiguous U.S. It improves accuracy by reducing overestimation in sparse forests and underestimation in dense forests, extending the range of reliable predictions beyond that of traditional passive-sensor models.

arXiv cs.LG · 6d ago

De-biased VLM-as-3D-Judge Protocol for Furniture Generation

A de-biased VLM-based judge protocol specializes TRELLIS on furniture generation using lightweight adaptation. The protocol addresses failure modes like image overload and geometry-hiding, with calibration showing 0.83–1.0 win rates and base-vs-base symmetry at 0.5. Among six adaptation methods, conditioner repair under severe degradation achieves parity with the base model, while no method exceeds a 65% win-rate target.

arXiv cs.CL · 6d ago

NEST: Dataset for Narrative Event Structures in Long Videos

NEST introduces a dataset of 1005 full-length movies, each annotated with 102 multimodal narrative events grounded in visual, dialogue, and audio content. The dataset captures event relationships such as temporal ordering, hierarchy, and long-range dependencies, with benchmark tasks showing low performance in event detection and localization, and higher performance in event relation extraction after fine-tuning.

arXiv cs.CL · 6d ago

CzechDocs: Parallel Dataset for Minority Language Document Translation

CzechDocs is a multiway parallel dataset of formatted documents (HTML, DOCX, and PDF) covering Czech and minority languages such as Ukrainian, English, Vietnamese, and Russian. It supports evaluation of machine translation systems that preserve document formatting, and a validation subset and evaluation toolkit have been publicly released. A held-out test split will be used for a future shared task on document-level translation with formatting preservation.

r/LocalLLaMA · 6d ago

LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M Released

LFM2.5-Embedding-350M is a dense bi-encoder for fast multilingual retrieval that stores one vector per document; it achieves best-in-class accuracy for its size, with inference speed comparable to smaller models. LFM2.5-ColBERT-350M is a late-interaction retriever that stores one vector per token; it achieves best-in-class multilingual accuracy and supports high-precision cross-lingual retrieval. Both models are designed as drop-in replacements in existing RAG pipelines.
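
The difference between one vector per document and one vector per token can be sketched in a few lines. This is a minimal illustration only, not the LFM2.5 models' actual code or API: the toy 2-dimensional vectors are made up, mean pooling stands in for however a real bi-encoder produces its single vector, and the helper names (`dense_score`, `maxsim_score`, `mean_pool`) are hypothetical. The late-interaction score follows the ColBERT-style MaxSim rule: each query token is matched to its most similar document token, and the maxima are summed.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_pool(token_vecs):
    """Collapse token vectors into one vector (an assumed pooling choice)."""
    return [sum(col) / len(token_vecs) for col in zip(*token_vecs)]


def dense_score(query_vec, doc_vec):
    """Bi-encoder scoring: one vector per text, compared by cosine similarity."""
    return dot(normalize(query_vec), normalize(doc_vec))


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: sum over query tokens of the best-matching doc token."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy token embeddings (hypothetical values chosen for illustration).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains both query tokens
doc_b_tokens = [[1.0, 1.0], [1.0, 1.0]]              # only a blend of the two

# Dense view: both documents pool to the same direction, so they tie.
dense_a = dense_score(mean_pool(query_tokens), mean_pool(doc_a_tokens))
dense_b = dense_score(mean_pool(query_tokens), mean_pool(doc_b_tokens))

# Late-interaction view: doc A matches each query token exactly and scores higher.
late_a = maxsim_score(query_tokens, doc_a_tokens)
late_b = maxsim_score(query_tokens, doc_b_tokens)

print(f"dense: A={dense_a:.3f} B={dense_b:.3f}")
print(f"late:  A={late_a:.3f} B={late_b:.3f}")
```

In this toy case the pooled vectors cannot separate the two documents, while the per-token scores can. The cost is storage: the late-interaction index keeps one vector per token rather than one per document.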

arXiv cs.AI · 7d ago

Clinician-Centered Pipeline for Ultrasound AI Annotation and Evaluation

A new pipeline enables clinicians to perform remote annotation and blinded evaluation of ultrasound AI models without local data downloads. It supports multi-rater participation, result aggregation, and automated statistical analysis, validated in a fetal ultrasound segmentation study with six raters of varying expertise. Results show moderate to strong agreement and a preference for later active learning models in blinded rankings.

arXiv cs.LG · 7d ago

Inductive Biases in ML Emulation of Sudden Stratospheric Warmings

A study evaluates how architectural inductive biases affect machine learning emulators' ability to capture sudden stratospheric warming dynamics in idealized simulations. Results show that three-dimensional vertical coupling is a key bias, with model performance diverging significantly during active SSW-like variability. However, low forecast error does not ensure accurate wave-mean-flow interactions, as coherent errors persist in stratospheric wave-driving structure.