Question on whether dSpark, dflash, MTP, and QAT mitigate inference speed loss from model spillover to disk

The article asks whether recent inference speedups from techniques such as dSpark, dflash, MTP (multi-token prediction), and QAT (quantization-aware training) are large enough to make model spillover to disk tolerable.

The author notes that spillover typically cuts throughput from 4-5 tokens per second to 0.5 tokens per second, a slowdown of roughly 8-10x.
The article asks whether these speed boosters raise throughput enough to keep performance at least barely acceptable during spillover.
It asks for users' firsthand experience of whether combining dSpark with disk spillover is viable.
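
As a rough back-of-the-envelope sketch, the throughput figures above imply the following arithmetic. The 0.5 and 4-5 tokens-per-second values come from the post. The function names, the example speedup factors, and the assumption that a speed booster acts as a simple multiplier on spillover throughput are illustrative only; the article reports no measured speedup for any of the named techniques.

```python
# Figures quoted in the post.
SPILL_TPS = 0.5               # tokens/s once the model spills to disk
IN_MEMORY_TPS = (4.0, 5.0)    # tokens/s when the model fits in memory


def required_speedup(spill_tps: float, target_tps: float) -> float:
    """Multiplier needed to lift spillover throughput to target_tps."""
    if spill_tps <= 0:
        raise ValueError("spill_tps must be positive")
    return target_tps / spill_tps


def boosted_tps(spill_tps: float, speedup: float) -> float:
    """Throughput after a booster, ASSUMING it scales spillover speed linearly."""
    return spill_tps * speedup


if __name__ == "__main__":
    # How large a gain would fully recover in-memory speed?
    for target in IN_MEMORY_TPS:
        factor = required_speedup(SPILL_TPS, target)
        print(f"target {target:.1f} tok/s needs a {factor:.1f}x speedup")

    # Hypothetical speedup factors, not measurements from the post.
    for hypothetical in (1.5, 2.0, 3.0):
        tps = boosted_tps(SPILL_TPS, hypothetical)
        print(f"{hypothetical:.1f}x booster gives {tps:.2f} tok/s")
```

Under this simplification, a booster would need roughly an 8-10x gain to fully recover in-memory speed. Whether any of the named techniques delivers a meaningful fraction of that while the workload is disk-bound is the open question the post raises.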

The article reaches no conclusion: it is an open question asking the community for current performance figures.