Evaluation & benchmarks
arXiv cs.LG · 23h ago

TeaNet Improves Few-Shot Learning in Vibrational Spectroscopy

TeaNet, a task-enhanced augmentation network, reconstructs randomly masked spectra to generate augmented samples that preserve original spectral features while introducing domain-specific variations. This approach enables deep neural networks to identify discriminant wavenumbers more effectively, outperforming CNNs by 17% in challenging synthetic scenarios and offering improved interpretability in few-shot learning tasks.
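The masking step behind this kind of reconstruction-based augmentation can be sketched as follows. This is a generic illustration, not TeaNet's implementation: the summary does not state the mask ratio, the fill value, or the reconstruction model, so `mask_ratio`, `mask_value`, and both function names are assumptions made for the example.

```python
import random


def mask_spectrum(spectrum, mask_ratio=0.3, mask_value=0.0, seed=None):
    """Return (masked_copy, masked_indices) with a random subset of points hidden.

    mask_ratio and mask_value are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    n_masked = int(round(mask_ratio * len(spectrum)))
    masked_indices = sorted(rng.sample(range(len(spectrum)), n_masked))
    masked = list(spectrum)
    for i in masked_indices:
        masked[i] = mask_value
    return masked, masked_indices


def reconstruction_error(original, reconstructed, masked_indices):
    """Mean squared error computed only over the masked positions."""
    if not masked_indices:
        return 0.0
    total = sum((original[i] - reconstructed[i]) ** 2 for i in masked_indices)
    return total / len(masked_indices)


# Toy intensities at ten wavenumbers (made-up values).
spectrum = [0.1, 0.4, 0.9, 0.5, 0.2, 0.1, 0.3, 0.8, 0.6, 0.2]
masked, idx = mask_spectrum(spectrum, mask_ratio=0.3, seed=0)
```

A reconstruction network would take `masked` as input and be trained to minimise `reconstruction_error` against `spectrum`; its outputs could then serve as augmented samples.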

arXiv cs.LG · 23h ago

TASER: Task-Differentiated Skill Expansion for Heterogeneous Continual Learning

TASER introduces a framework that dynamically expands and routes atomic skills for continual learning across highly heterogeneous tasks. It reduces catastrophic forgetting and improves plasticity by ensuring semantic distinctness and efficient capacity allocation through skill detection and routing mechanisms. Evaluated on HeteroCLBench, a benchmark with 19 diverse tasks across 9 cognitive dimensions, TASER outperforms existing baselines.

arXiv cs.LG · 1d ago

DataClaw0: Agentic Tailoring of Multimodal Data from Raw Streams

DataClaw0 introduces an agentic paradigm for actively refining raw multimodal data to align with user and downstream intents. It uses a two-stage pipeline grounded in factual anchors to generate a large-scale dataset across five domains and combines supervised fine-tuning with GRPO to achieve strong alignment with complex refinement tasks. Evaluated on video generation, VQA, and GUI navigation, DataClaw0 produces high-information-density tailored data, enabling efficient model adaptation with minimal training data.

arXiv cs.LG · 1d ago

VLA-FAIL: Lightweight Failure Detection for Vision-Language-Action Models

VLA-FAIL introduces a lightweight failure detection framework for vision-language-action models that uses last-layer Mahalanobis distance and action chunk consistency, without requiring failure data or expensive action sampling. The framework combines these detectors to achieve reliable, early failure detection across diverse tasks, outperforming baseline methods in both accuracy and efficiency.
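For readers unfamiliar with the Mahalanobis part, a minimal sketch is shown below: a mean and covariance are fitted on feature vectors from successful runs only, and a new feature vector is scored by its distance from that distribution. The toy 2-D features, the `ridge` regulariser, and all function names are assumptions for illustration; how VLA-FAIL extracts features, sets thresholds, or combines this score with action chunk consistency is not described in the summary.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mean, ridge=1e-6):
    """Sample covariance matrix; a small ridge term keeps it invertible."""
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), computed via a linear solve."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, diff)
    return math.sqrt(max(sum(d * zi for d, zi in zip(diff, z)), 0.0))


# Made-up 2-D "last-layer" features from successful runs only.
success_features = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [0.8, 1.8]]
mu = mean_vector(success_features)
sigma = covariance(success_features, mu)

near_score = mahalanobis([1.05, 2.0], mu, sigma)  # resembles the successful runs
far_score = mahalanobis([3.0, 0.0], mu, sigma)    # far from the fitted distribution
```

A larger score means the feature vector looks less like anything seen in successful runs, which is why no failure examples are needed to fit the detector.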

arXiv cs.LG · 1d ago

Deep Learning Fuses Satellite Data with Meteorological Features for Soil Moisture Estimation

A study validates a Cross-Correlation Function method to identify optimal temporal and depth lags between meteorological variables and soil moisture. Using satellite and meteorological data across seven agricultural plots in southeastern Spain, deep learning models achieved significant improvements: a per-pixel CNN reached R² = 0.877, while a CNN-LSTM hybrid achieved the highest overall performance with R² = 0.930. Subsurface depth information and meteorological features substantially enhanced estimation accuracy.
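The temporal half of a cross-correlation lag search can be sketched as follows. This is an illustration on synthetic data, not the study's code: the series, the variable names, and the choice of Pearson correlation at each lag are assumptions, and the depth dimension mentioned in the summary is omitted.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_lag(driver, response, max_lag):
    """Find the lag k at which driver[t] correlates most strongly with response[t + k].

    Both series must have the same length and sampling interval.
    """
    scores = {}
    for k in range(max_lag + 1):
        scores[k] = pearson(driver[: len(driver) - k], response[k:])
    best = max(scores, key=lambda k: abs(scores[k]))
    return best, scores


# Synthetic example: soil moisture echoes rainfall two time steps later.
rainfall = [0, 5, 0, 0, 8, 0, 0, 0, 3, 0, 0, 6, 0, 0, 0, 2]
soil_moisture = [0, 0] + rainfall[:-2]

lag, scores = best_lag(rainfall, soil_moisture, max_lag=5)
```

The lag selected this way could then be used to shift a meteorological feature before feeding it to a model, so that each soil moisture value is paired with the conditions that most plausibly drove it.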