A developer has released an offline, single-file HTML tool that estimates which local large language models will fit on a specific GPU configuration and predicts their token generation speed. The tool is designed to answer the common question of whether a custom PC build can run desired models effectively, without requiring a backend or user account.
- The capability estimator calculates each model's resident size, whether it fits in VRAM, and estimated decode and prefill speeds from memory bandwidth, calibrated against measurements taken on NVIDIA RTX 3090s.
- Each price record carries a provenance indicator (sourced, estimate, or stale) so that guessed or outdated prices are not silently treated as current, and tax and shipping are calculated live.
- Users can paste product URLs to fetch prices via a CORS proxy, with weekly auto-refreshes handled by a GitHub Action.
- For Mixture of Experts (MoE) models, the tool estimates decode speed from active parameters rather than total parameters, since only the active experts' weights are read for each token.
- Reference builds are included, such as a $2.2k single-3090 starter and a 4x RTX PRO 6000 rig, to help users visualize potential configurations.
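
The fit check and the bandwidth-based decode estimate described in the list above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `Model` and `Gpu` classes, the 2 GB overhead, the 4.5 bits per weight, and the 0.6 efficiency factor are all assumptions chosen for the example, and the two model entries are hypothetical. Only the RTX 3090 figures (24 GB of VRAM, 936 GB/s of memory bandwidth) are published specifications.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters read per token (equals total for dense models)
    bits_per_weight: float  # quantization width; 4.5 is an assumed 4-bit-class value


@dataclass
class Gpu:
    name: str
    vram_gb: float
    bandwidth_gb_s: float


def resident_gb(model: Model, overhead_gb: float = 2.0) -> float:
    """All weights must be resident, plus an assumed allowance for KV cache and buffers."""
    return model.total_params_b * model.bits_per_weight / 8 + overhead_gb


def fits(model: Model, gpu: Gpu, gpu_count: int = 1) -> bool:
    """True if the resident size is within the combined VRAM of the GPUs."""
    return resident_gb(model) <= gpu.vram_gb * gpu_count


def decode_tok_s(model: Model, gpu: Gpu, efficiency: float = 0.6) -> float:
    """Bandwidth-bound estimate: each token reads the active weights once.

    `efficiency` stands in for a calibration factor; 0.6 is a placeholder,
    not a value taken from the tool.
    """
    gb_read_per_token = model.active_params_b * model.bits_per_weight / 8
    return efficiency * gpu.bandwidth_gb_s / gb_read_per_token


rtx_3090 = Gpu("RTX 3090", vram_gb=24.0, bandwidth_gb_s=936.0)
dense = Model("dense-8b (hypothetical)", 8.0, 8.0, 4.5)
moe = Model("moe-30b-a3b (hypothetical)", 30.0, 3.0, 4.5)

for m in (dense, moe):
    print(m.name, fits(m, rtx_3090), round(decode_tok_s(m, rtx_3090), 1))
```

Under these assumptions the MoE model needs VRAM for all 30B parameters but decodes as if it were a 3B model, which is why using total parameters for speed would underestimate it by a factor of ten.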
This tool helps local AI enthusiasts accurately spec hardware for their desired model workloads by providing calibrated performance estimates and transparent pricing data without relying on external servers.
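
One way the price provenance labels could be modeled is sketched below. The `PriceRecord` fields, the 7-day staleness window (picked to match a weekly refresh), and the tax and shipping formula are assumptions made for illustration; the tool's real data model is not described in the source, and the URL is a placeholder.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PriceRecord:
    item: str
    price_usd: float
    source_url: Optional[str]   # None when the price was entered as a guess
    fetched_on: Optional[date]  # None when the price was never fetched


def provenance(record: PriceRecord, today: date, max_age_days: int = 7) -> str:
    """Label a price as 'sourced', 'estimate', or 'stale' (assumed rules)."""
    if record.source_url is None or record.fetched_on is None:
        return "estimate"
    if today - record.fetched_on > timedelta(days=max_age_days):
        return "stale"
    return "sourced"


def landed_cost(record: PriceRecord, tax_rate: float, shipping_usd: float) -> float:
    """Price plus tax and shipping, rounded to cents."""
    return round(record.price_usd * (1 + tax_rate) + shipping_usd, 2)


today = date(2025, 1, 15)
fresh = PriceRecord("GPU", 700.0, "https://example.com/gpu", date(2025, 1, 12))
old = PriceRecord("PSU", 150.0, "https://example.com/psu", date(2024, 12, 1))
guess = PriceRecord("Case", 90.0, None, None)

for r in (fresh, old, guess):
    print(r.item, provenance(r, today), landed_cost(r, tax_rate=0.08, shipping_usd=25.0))
```

Keeping the label as a function of the record, rather than a stored field, means a price becomes stale on its own once the refresh stops running.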