A proposal suggests splitting model architecture into a stable base model and lightweight, swappable worker models. The base model handles core reasoning and acts as a platform, while worker models provide domain-specific knowledge through runtime hot-plugging, similar to LoRA but for knowledge rather than behavior.
Proposal for splitting base models to avoid retraining
from English