My micro-benchmark: how good are LLMs at simulating wetting behaviour?
The author benchmarks how well LLMs simulate wetting behaviour with Surface Evolver, a 1992 tool for modelling liquid surfaces. Each model is scored objectively: the datafiles it generates are compared against reference implementations, and the results report a pass count and token cost for each model.
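As a rough illustration of that kind of scoring, here is a minimal Python sketch of tallying pass counts and token costs per model. It makes several assumptions that do not come from the benchmark itself: that a generated datafile is judged by a single final energy value, that this value is compared with the reference within a relative tolerance, and that a datafile which fails to run counts as a failure. The names `Attempt`, `passes`, `summarize` and the tolerance `rel_tol` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Attempt:
    """One model's attempt at one task (hypothetical record layout)."""
    model: str               # model label
    task: str                # task identifier
    energy: Optional[float]  # assumed comparison metric; None if the datafile did not run
    tokens: int              # tokens spent on this attempt


def passes(candidate: Optional[float], reference: float, rel_tol: float = 1e-3) -> bool:
    """Assumed pass rule: candidate energy matches the reference within rel_tol."""
    if candidate is None:
        return False
    return math.isclose(candidate, reference, rel_tol=rel_tol)


def summarize(attempts: List[Attempt], reference: Dict[str, float]) -> Dict[str, Dict[str, int]]:
    """Aggregate passed/total attempts and total tokens for each model."""
    summary: Dict[str, Dict[str, int]] = {}
    for attempt in attempts:
        row = summary.setdefault(attempt.model, {"passed": 0, "total": 0, "tokens": 0})
        row["total"] += 1
        row["tokens"] += attempt.tokens
        if passes(attempt.energy, reference[attempt.task]):
            row["passed"] += 1
    return summary
```

The actual benchmark may compare datafiles on different quantities or with a different tolerance; the sketch only shows the shape of the tally.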