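The article asks whether recent inference speedups from techniques such as dSpark, dflash, MTP (multi-token prediction), and QAT (quantization-aware training) are enough to make model spillover to disk more tolerable, that is, running a model too large for available memory so that part of it must be read from disk.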
- The author notes that spillover typically causes a drop from 4-5 tokens per second to 0.5 tokens per second.
- It asks whether these speedups raise throughput enough to keep generation at a barely acceptable rate while the model is spilling to disk.
- It requests first-hand reports on whether combining dSpark with disk spillover is viable in practice.
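To put the reported numbers in perspective, here is a back-of-envelope sketch. It rests on a simplifying assumption that is ours, not the post's: each speedup technique acts as a single multiplicative factor on the spilled decode rate. Real gains from speculative methods or quantization during disk-bound decoding may not compose this way. The function names are hypothetical. Under this model, climbing from 0.5 tokens per second back to 4-5 tokens per second would take roughly an 8-10x combined speedup.

```python
# Hypothetical back-of-envelope model (not from the post): assumes a speedup
# technique multiplies the spilled-to-disk decode rate by a constant factor.

def spilled_rate(baseline_spilled_tps: float, speedup: float) -> float:
    """Decode rate during spillover after applying a multiplicative speedup."""
    return baseline_spilled_tps * speedup


def required_speedup(baseline_spilled_tps: float, target_tps: float) -> float:
    """Multiplicative speedup needed to lift the spilled rate to target_tps."""
    return target_tps / baseline_spilled_tps


if __name__ == "__main__":
    spilled = 0.5  # tokens/s during spillover, as reported in the post
    for target in (4.0, 5.0):  # the post's in-memory range of 4-5 tokens/s
        print(f"target {target} tok/s needs ~{required_speedup(spilled, target):.0f}x")
    # Illustrative (hypothetical) speedup factors, not measured values.
    for s in (2.0, 3.0):
        print(f"{s:.0f}x speedup -> {spilled_rate(spilled, s):.1f} tok/s while spilled")
```

Under the same assumption, a hypothetical 2-3x gain would lift spilled decoding to only about 1.0-1.5 tokens per second, which is why the question of "barely acceptable" performance hinges on how large the real-world speedups turn out to be.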
The article reaches no conclusion: it is an open question asking the community for feedback and current performance numbers.