Code generation
media r/LocalLLaMA · 6d ago

Real-world token cost savings from rtk, headroom, and caveman

A real workload analysis shows headroom, rtk, and caveman reduce token costs by 2.8%, 0.5%, and 0.4% respectively, totaling 3.7% of baseline spending. However, savings are limited by payload diversity, with most traffic being plain text or source code, and the tools only compress structured outputs. Most cost reduction occurs on the cheapest token stream—cache reads—while the tools do not affect prompt caching or output costs, and coverage gaps exist, especially for rtk.

arxiv arXiv cs.AI · 7d ago

Trade-offs in Medical LLM Adaptation: French QA Study

A study compares continual pretraining (CPT), supervised fine-tuning (SFT), and their combination for French medical QA. CPT+SFT performs best in multiple-choice QA, though gains over SFT are small and often insignificant, making SFT a cost-effective default. For open-ended QA, CPT improves metrics while SFT degrades quality, with instruction tuning and CPT+SFT favored by LLM-based evaluations. Cross-lingual results show effective transfer from French to English benchmarks.