The article asks whether recent inference speedups from techniques such as dSpark, dflash, MTP, and QAT are enough to make spilling a model over to disk more tolerable.

  • The author notes that spillover typically drops throughput from 4-5 tokens per second to 0.5 tokens per second, roughly an eight- to tenfold slowdown.
  • The author asks whether these speedups raise throughput enough to keep performance barely acceptable during spillover.
  • The author seeks reports from users who have combined dSpark with disk spillover and can say whether the setup is viable.
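
To make the arithmetic behind the question concrete, the sketch below computes how large a speedup would be needed to lift the 0.5 tokens-per-second spillover rate back to the 4-5 tokens-per-second range quoted above. The function names and the candidate multipliers are hypothetical and do not come from the post. The calculation also assumes that a speedup applies uniformly to disk-bound inference, which may not hold in practice.

```python
def required_speedup(spill_tps: float, target_tps: float) -> float:
    """Multiplier needed to raise spillover throughput to a target rate."""
    if spill_tps <= 0:
        raise ValueError("spill_tps must be positive")
    return target_tps / spill_tps


def boosted_tps(spill_tps: float, speedup: float) -> float:
    """Throughput after a speedup, ASSUMING it scales disk-bound inference uniformly."""
    return spill_tps * speedup


SPILL_TPS = 0.5               # spillover rate quoted in the post
IN_MEMORY_TPS = (4.0, 5.0)    # in-memory range quoted in the post

for target in IN_MEMORY_TPS:
    factor = required_speedup(SPILL_TPS, target)
    print(f"Reaching {target} tok/s from {SPILL_TPS} tok/s needs a {factor:g}x speedup")

# Hypothetical multipliers for illustration only; these are not measured results.
for speedup in (1.5, 2.0, 3.0):
    print(f"{speedup:g}x speedup -> {boosted_tps(SPILL_TPS, speedup):g} tok/s")
```

Under these assumptions, a speedup well short of eightfold would still leave spillover throughput below the in-memory baseline, which is why the question turns on what counts as barely acceptable.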

The article reaches no conclusion; it is an open question asking the community for feedback on current performance benchmarks.