Open weights
arXiv cs.AI · 6d ago

UFP4: Uniform 4-Bit Training Overcomes Shrinkage Bias in LLM Pretraining

A study identifies a shrinkage bias in E2M1-based FP4 formats, caused by the geometric asymmetry of the E2M1 grid, that leads to multiplicative error accumulation and training instability. The proposed UFP4 recipe uses uniform E1M2/INT4 grids and applies a Random Hadamard Transform to all GEMMs, achieving lower loss degradation than E2M1 baselines in large-scale LLM pretraining. The authors recommend E1M2/INT4 as a first-class training primitive for future accelerators.
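
To make the contrast between the two grids concrete, the following minimal sketch enumerates the non-negative values each 4-bit format can represent and prints the spacing between neighbours. It is an illustration, not the authors' code: the helper names are hypothetical, and the exponent bias of 1 for both formats is an assumption (it matches the common E2M1 definition; the summary does not state the bias used for E1M2). The sketch only shows grid geometry and does not reproduce the shrinkage-bias analysis.

```python
def e2m1_magnitudes():
    """Non-negative E2M1 values: 2 exponent bits, 1 mantissa bit, assumed bias 1."""
    values = []
    for exp in range(4):
        for man in range(2):
            if exp == 0:
                values.append(man * 0.5)  # subnormal range
            else:
                values.append((1 + man * 0.5) * 2.0 ** (exp - 1))
    return sorted(values)


def e1m2_magnitudes():
    """Non-negative E1M2 values: 1 exponent bit, 2 mantissa bits, assumed bias 1."""
    values = []
    for exp in range(2):
        for man in range(4):
            if exp == 0:
                values.append(man * 0.25)  # subnormal range
            else:
                values.append(1 + man * 0.25)  # single normal binade
    return sorted(values)


def gaps(values):
    """Spacing between consecutive grid points."""
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print("E2M1 grid:", e2m1_magnitudes())
    print("E2M1 gaps:", gaps(e2m1_magnitudes()))
    print("E1M2 grid:", e1m2_magnitudes())
    print("E1M2 gaps:", gaps(e1m2_magnitudes()))
```

Under these assumptions the E2M1 gaps widen toward larger magnitudes, while every E1M2 gap is the same size, which is the sense in which E1M2 behaves like a uniform, INT4-style grid.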

arXiv cs.AI · 6d ago

Attention-Guided Deep Learning for Interpretable Sperm Morphology Classification

A new deep learning framework combines EfficientNet-B0 with CBAM attention to improve accuracy and interpretability in sperm morphology classification. Evaluated on the SMIDS and HuSHeM datasets, it achieves 90.2% and 93.9% accuracy with macro F1 scores of 0.913 and 0.948, respectively, outperforming baseline models. Grad-CAM++ visualizations enable transparent feature analysis, supporting clinical adoption in fertility clinics.
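
For readers unfamiliar with the macro F1 metric quoted above, here is a minimal sketch of how it is computed: the F1 score is calculated per class and then averaged with equal weight, regardless of class size. The function name and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["normal", "normal", "abnormal", "abnormal", "other", "other"]
    y_pred = ["normal", "abnormal", "abnormal", "abnormal", "other", "other"]
    print(round(macro_f1(y_true, y_pred), 4))
```

Because each class contributes equally, macro F1 can differ noticeably from plain accuracy when classes are imbalanced.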

arXiv cs.AI · 6d ago

How Transparent is DiffusionGemma?

DiffusionGemma has poor variable transparency because of its high opaque serial depth, but an interpretable token bottleneck mitigates this, reducing serial depth to 1.1× that of Gemma 4. Algorithmic transparency is more challenging in diffusion models because token predictions change dynamically, with early evidence of non-chronological reasoning, token smearing, and intermediate-context reasoning. Overall, DiffusionGemma is found to be similarly monitorable to Gemma 4.

arXiv cs.LG · 6d ago

How Transparent is DiffusionGemma?

DiffusionGemma has poor variable transparency because of its high opaque serial depth, but an interpretable token bottleneck mitigates this, reducing serial depth to 1.1× that of Gemma 4. Algorithmic transparency is more challenging in diffusion models because tokens change dynamically, though case studies reveal novel phenomena such as non-chronological reasoning and intermediate-context reasoning. Overall, DiffusionGemma is found to be similarly monitorable to Gemma 4.

arXiv cs.AI · 6d ago

Essay Quality Representations in LLMs Found to Be Linearly Accessible

A study finds that essay quality information in large language models is linearly accessible in their hidden representations. These representations emerge layer by layer, remain stable across prompt variations, and transfer partially across different essay prompts, with longer essays relying more on deeper model layers. The research identifies specific 'essay scoring neurons' whose activation strongly correlates with scores and can be influenced by targeted interventions.