Safety & alignment
arXiv cs.LG · 7d ago

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

A new theory models how semantic paraphrases can fool financial sentiment classifiers by analyzing the worst-case displacement of target model representations. The attackability index λ*(x) is derived from the largest generalised eigenvalue of a matrix pencil (A,B), offering closed-form predictions and robustness certificates for affine readouts. The framework connects continuous perturbation theory to discrete paraphrase search, with empirical validation on real financial text classifiers.
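
The summary does not say how the pencil (A, B) is constructed, so the following is only a minimal sketch of the underlying linear algebra, assuming A is symmetric and B is symmetric positive definite. Under that assumption the largest generalised eigenvalue equals the maximum of the ratio (v^T A v) / (v^T B v) over nonzero v. The function name and the toy matrices are hypothetical and are not taken from the paper.

```python
import numpy as np

def largest_generalised_eigenvalue(A, B):
    """Largest lam (and a vector v) with A v = lam * B v.

    Assumes A is symmetric and B is symmetric positive definite.
    """
    L = np.linalg.cholesky(B)            # B = L @ L.T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T              # reduces to the standard problem C w = lam w
    C = 0.5 * (C + C.T)                  # symmetrise against round-off
    eigvals, eigvecs = np.linalg.eigh(C) # eigenvalues in ascending order
    lam = eigvals[-1]
    v = L_inv.T @ eigvecs[:, -1]         # map back: v = L^{-T} w
    return lam, v

# Toy pencil (made-up numbers): the ratios along the axes are 2/1 and 1/4.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 4.0]])
lam, v = largest_generalised_eigenvalue(A, B)
rayleigh = (v @ A @ v) / (v @ B @ v)     # matches lam at the maximiser
```

The Cholesky reduction is used here only because it needs nothing beyond NumPy; a dedicated generalised symmetric eigensolver would serve equally well.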

arXiv cs.LG · 7d ago

Conceptual Innovation in Medical Imaging AI

A new perspective argues that medical imaging AI research should prioritize conceptual innovation—reframing problems, evaluation metrics, and clinical relevance—over algorithmic improvements alone. The article highlights that current academic incentives undervalue conceptual contributions, leading to misaligned objectives and limited real-world impact, and offers recommendations for researchers, mentors, and journals to better support such innovation.

arXiv cs.LG · 7d ago

MC Dropout Uncertainty Alignment Insufficient for Clinical Safety in Glioma Segmentation

A study of 126 BraTS21 patients finds that MC Dropout achieves strong uncertainty-error alignment yet fails to flag critical calibration failures in enhancing tumour regions. In these clinically vital areas the UNet-Res model shows near-zero entropy and high ECE alongside a Dice score of only 0.714, a severe miscalibration that standard metrics like Dice and AUROC do not expose. These results highlight that uncertainty alignment alone is insufficient for clinical safety and that region-specific calibration must be evaluated alongside standard metrics.
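
To illustrate why region-specific calibration can diverge from a pooled figure, here is a minimal sketch of the standard binned expected calibration error (ECE), computed once over all voxels and once per region. The region names, helper names, and toy numbers are assumptions for illustration only; they are not the study's data or its evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

def ece_by_region(confidences, correct, regions, n_bins=10):
    """Compute ECE separately for each region label."""
    result = {}
    for name in sorted(set(regions)):
        keep = [i for i, r in enumerate(regions) if r == name]
        result[name] = expected_calibration_error(
            [confidences[i] for i in keep], [correct[i] for i in keep], n_bins
        )
    return result

# Toy voxels (made up): every prediction is fully confident, i.e. zero entropy.
regions = ["background"] * 8 + ["enhancing"] * 4
confidences = [1.0] * 12
correct = [True] * 8 + [True, True, False, False]

overall = expected_calibration_error(confidences, correct)
per_region = ece_by_region(confidences, correct, regions)
```

In this toy case the pooled ECE is about 0.17, while the hypothetical "enhancing" region alone has an ECE of 0.5 and the background has 0.0, so the pooled number understates the problem in the smaller region.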

arXiv cs.LG · 7d ago

Wasserstein Policy Learning for Distributional Outcomes

This paper introduces offline policy learning for distribution-valued outcomes, where rewards are derived from utility functionals applied to Wasserstein barycenters. It establishes statistical guarantees for inverse propensity weighting (IPW) and doubly robust (DR) estimators, proving finite-sample regret bounds with leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$, and provides a minimax lower bound confirming the sharpness of this rate.
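
For readers unfamiliar with the two estimators, the sketch below shows the classical scalar-reward IPW and DR estimates of a policy's value from logged data. This is a simplification: the paper works with distribution-valued outcomes and Wasserstein barycenters, which are not modelled here. All function names, argument names, and the toy log are hypothetical.

```python
def ipw_value(actions, rewards, propensities, policy_actions):
    """Inverse propensity weighted estimate of the target policy's mean reward."""
    total = 0.0
    for a, r, p, pa in zip(actions, rewards, propensities, policy_actions):
        weight = 1.0 / p if a == pa else 0.0   # keep only samples where the logged action matches
        total += weight * r
    return total / len(rewards)

def dr_value(actions, rewards, propensities, policy_actions,
             mu_hat_policy, mu_hat_logged):
    """Doubly robust estimate: outcome-model prediction plus an IPW-weighted residual.

    mu_hat_policy[i]: model's predicted reward for the policy's action on sample i.
    mu_hat_logged[i]: model's predicted reward for the logged action on sample i.
    """
    total = 0.0
    for a, r, p, pa, m_pol, m_log in zip(actions, rewards, propensities,
                                         policy_actions, mu_hat_policy, mu_hat_logged):
        weight = 1.0 / p if a == pa else 0.0
        total += m_pol + weight * (r - m_log)
    return total / len(rewards)

# Toy log (made up): two actions chosen with probability 0.5 each;
# action 0 yields reward 1 and action 1 yields reward 3.
actions = [0, 1, 0, 1]
rewards = [1.0, 3.0, 1.0, 3.0]
propensities = [0.5, 0.5, 0.5, 0.5]
policy_actions = [1, 1, 1, 1]                  # target policy: always choose action 1

ipw = ipw_value(actions, rewards, propensities, policy_actions)

# DR with a deliberately useless outcome model (predicts 0 everywhere).
dr_bad_model = dr_value(actions, rewards, propensities, policy_actions,
                        [0.0] * 4, [0.0] * 4)
# DR with a perfect outcome model for this toy log.
dr_good_model = dr_value(actions, rewards, propensities, policy_actions,
                         [3.0] * 4, rewards)
```

On this toy log all three estimates equal 3.0, the true value of always choosing action 1. The DR estimate stays correct with the useless outcome model because the propensities are correct, which is the sense in which it is doubly robust.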