Research paper
arXiv cs.CL · 1d ago

UOL@IDEM Submits L1-Aware Vocabulary Prediction Model

UOL@IDEM presents a closed-track submission to BEA 2026 that models vocabulary difficulty prediction as a regression task for Spanish, German, and Chinese. The system combines multilingual contextual embeddings with engineered features such as frequency and cognate similarity, and achieves lower RMSE than the baselines. Feature analysis identifies frequency as the most stable predictor and contextual predictability as a key L1-sensitive signal.
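For readers unfamiliar with the metric, RMSE (root mean squared error) is the square root of the average squared gap between predicted and true difficulty scores, so lower is better. The sketch below is a generic illustration, not the submission's evaluation code; the function name `rmse` and the numbers in the usage line are made up for the example.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared per-item errors, then take the square root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (not results from the paper).
example_score = rmse([0.2, 0.5, 0.9], [0.1, 0.5, 1.0])
```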

arXiv cs.CL · 1d ago

AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation

AdversaBench introduces an end-to-end red-teaming pipeline that generates adversarial prompts via five structured operators, evaluates target models, and confirms failures through a three-judge panel with a meta-judge tiebreaker. In experiments on 45 seed prompts spanning reasoning, instruction-following, and tool use, every seed produces a confirmed failure. The analysis of operator effectiveness, iterations to failure, judge agreement, and cross-model transferability reveals key patterns in LLM vulnerability.
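The confirmation step can be pictured as a majority vote with a fallback. The sketch below is one possible reading, not the paper's implementation: the label set ("fail", "pass", "unsure"), the function name `confirm_failure`, and the rule that the meta-judge is consulted only when neither "fail" nor "pass" has a strict majority are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical label set; the summary does not specify the judges' output format.
Verdict = str  # one of "fail", "pass", "unsure"


def confirm_failure(
    verdicts: Sequence[Verdict],
    meta_judge: Callable[[Sequence[Verdict]], Verdict],
) -> bool:
    """Return True if the panel confirms that the target model failed."""
    counts = Counter(verdicts)
    # A strict majority for either definite label settles the case.
    for label in ("fail", "pass"):
        if counts[label] * 2 > len(verdicts):
            return label == "fail"
    # Otherwise (assumed behaviour) defer to the meta-judge.
    return meta_judge(verdicts) == "fail"
```

With three judges, the fallback is only reached when at least one judge abstains or the labels split three ways, which is why the sketch includes an "unsure" label.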

arXiv cs.AI · 1d ago

MedLayXPlain: Benchmarking Expert-Lay Gap in Medical Vision-Language Models

MedLayXPlain introduces the first large-scale benchmark for medical lay language generation, featuring 122,789 region-grounded samples across eight imaging modalities. It evaluates medical vision-language models on expert-lay alignment using a hierarchical ontology system and a lightweight evaluator, revealing a systematic gap: expert-level performance in captioning coexists with significant degradation in lay language, while general-purpose models lack clinical precision.

arXiv cs.AI · 1d ago

QBioFusion-QSAR: Quantum Kernel Learning for Small-Data Ligand Classification

QBioFusion-QSAR integrates a quantum fidelity kernel with Morgan/Tanimoto fingerprints to improve ligand classification. On the PsychLight-A benchmark, the combined model (QMKL) achieved higher accuracy and MCC than Morgan/Tanimoto alone, with the improvement attributed to better predictions for activity-cliff molecules such as N-Me-5-HT and N-Me-tryptamine. An auditable analysis confirms that the quantum-kernel contributions are localized in small-data settings.
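To make the ingredients concrete, the sketch below computes a Tanimoto similarity between two fingerprints, a fidelity kernel between two state vectors, and a weighted blend of the two. It is a toy under stated assumptions: fingerprints are represented as sets of on-bit indices, quantum states as already-normalized lists of complex amplitudes, and the convex combination with a fixed `weight` is a hypothetical stand-in, since the summary does not say how the paper combines the kernels.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(bits_a & bits_b) / union


def fidelity(state_u, state_v):
    """Fidelity |<u|v>|^2 between two normalized state vectors (lists of amplitudes)."""
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(state_u, state_v))
    return abs(overlap) ** 2


def combine_kernels(k_classical, k_quantum, weight):
    """Assumed convex combination; weight is the share given to the quantum kernel."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * k_quantum + (1.0 - weight) * k_classical


# Toy inputs, not taken from the benchmark.
k_t = tanimoto({1, 2, 3}, {2, 3, 4})
k_q = fidelity([1, 0], [1, 0])
k_mix = combine_kernels(k_t, k_q, weight=0.25)
```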

arXiv cs.AI · 1d ago

SOHET: Self-Supervised Transformer for Heterogeneous Event Streams

SOHET introduces a hierarchical transformer architecture with event-type-specific tabular encoders and self-supervised pre-training objectives. It outperforms existing methods by 5.8% on Booking.com's fraud detection task, and pre-training yields faster convergence plus an additional 2.4% gain. On the EBES benchmark, bidirectional SOHET matches or exceeds the best published results on six of eight tasks.

arXiv cs.AI · 1d ago

Graph-of-Differences for Anatomy-Structured MedReID

Graph-of-Differences (GoD) introduces anatomy-graph representations to enable medical image re-identification with explicit structural grounding. It computes differences across named anatomical regions and aligns them with global backbone differences, providing clinically auditable, structure-level explanations. GoD improves Rank-1 accuracy by 7.1 percentage points on fundus images and 3.1 percentage points on chest X-rays (CXR), and also performs better in zero-shot transfer.
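Rank-1 accuracy, the metric quoted above, is the fraction of query images whose single nearest gallery match has the same identity. The sketch below shows the generic metric only, not GoD itself; it assumes images are already embedded as non-zero vectors and uses cosine similarity as the matching score, which is a common choice but not confirmed by the summary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank1_accuracy(queries, gallery):
    """queries and gallery are lists of (identity, embedding) pairs."""
    hits = 0
    for query_id, query_vec in queries:
        # Take the gallery entry with the highest similarity to the query.
        best_id, _ = max(gallery, key=lambda item: cosine(query_vec, item[1]))
        hits += best_id == query_id
    return hits / len(queries)


# Toy embeddings for illustration only.
gallery = [("patient_a", [1.0, 0.0]), ("patient_b", [0.0, 1.0])]
queries = [("patient_a", [0.9, 0.1]), ("patient_b", [0.8, 0.2])]
score = rank1_accuracy(queries, gallery)
```

In this toy case the second query is closer to the wrong gallery entry, so only one of the two queries is matched correctly.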

arXiv cs.AI · 1d ago

Functional Orthogonality Ensures Identifiability in Unsupervised Disentanglement

The paper proves that latent concepts can be identified in unsupervised learning through functional orthogonality, an orthogonality constraint on the generative mapping's Jacobian. This condition enables identifiability in general nonlinear models without needing statistical independence or causal assumptions, as long as the latent domain supports all factor combinations. Experiments with normalizing flows confirm reliable recovery of true factors, offering a viable foundation for disentangled representation learning.
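As a rough illustration of what an orthogonality constraint on a Jacobian can look like, consider a generative mapping $g$ from latents $z = (z_1, \dots, z_d)$ to observations $x$. The formula below is one pointwise reading and an assumption on our part; the symbols $g$, $z$, $d$, and $J_g$ are introduced here for the example, and the paper's exact definition of functional orthogonality may differ (for instance, it could be stated in an averaged or integrated form).

```latex
% Assumed pointwise form: the columns of the Jacobian of g are mutually orthogonal,
% so changing one latent factor moves x in a direction orthogonal to the others.
\[
  J_g(z) = \frac{\partial g}{\partial z}(z), \qquad
  \left\langle \frac{\partial g}{\partial z_i}(z),\ \frac{\partial g}{\partial z_j}(z) \right\rangle = 0
  \quad \text{for all } i \neq j .
\]
% Equivalently, J_g(z)^{\top} J_g(z) is a diagonal matrix at every z in the latent domain.
```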