arXiv cs.CL · 4h ago

Joint Transcription and Decryption of Images of Encrypted Handwritten Documents: A Comparison with the Traditional Pipeline

Researchers propose Direct Image Decryption, an end-to-end approach that maps encrypted manuscript images directly to plaintext, bypassing the intermediate transcription stage used in traditional pipelines. Using the Copiale cipher as a case study, the authors compare this joint architecture against the conventional two-stage method of transcription followed by decryption.

arXiv cs.CL · 4h ago

Mitigating Position Bias in Transformers via Layer-Specific Positional Embedding Scaling

Researchers introduce layer-specific positional embedding scaling (LPES) to address the "lost-in-the-middle" problem in large language models, where critical information in the middle of long-context inputs is often underrepresented. The method assigns a distinct scaling factor to each transformer layer to achieve a more balanced attention distribution, without requiring parameter fine-tuning or increasing inference latency.

arXiv cs.CL · 5h ago

Triadic Werewolf: A Jester Role for Multi-Hop Theory of Mind in LLMs

Researchers extend the Werewolf game with a Jester role, creating a triadic social-deduction environment that requires reasoning across three opposing utility functions and challenges large language models' theory-of-mind capabilities. In evaluations on GPT-4.1, DeepSeek-V3.1, and Llama-3.3-70B, the Jester won 60-70% of games, and GPT-4.1 wolves voted the Jester out on day 1 in 60-70% of cases, a self-defeating action driven by language priors.

arXiv cs.CL · 6h ago

An Empirical Analysis of Factual Errors in Human-Written Text and its Application

This study addresses the neglected problem of detecting factual errors in human-written text. It distills a taxonomy of errors from newspaper article corrections, revealing categories, such as kanji misconversions, that are absent from current hallucination benchmarks. The authors evaluate vanilla large language models on synthesized test cases and real corrections to assess their performance on this task.