Benchmark for tiny LLMs on natural language file search

This benchmark evaluates small LLMs (0.3B–3B parameters) on parsing natural language file search queries into structured JSON. The test queries cover four categories: file type, temporal context, specificity, and combined queries. In the results, models with 0.8B–1.5B parameters outperform those under 0.5B. Planned next steps are expanding the test set and exploring fine-tuning to improve performance.
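
To make the task concrete, here is a minimal sketch of how a single model response might be scored. The schema (`file_type`, `time_range`, `keywords`), the example query, and the exact-match scoring rule are all assumptions made for illustration; the benchmark's actual output format and metric are not described here.

```python
import json
from typing import Optional

# Hypothetical schema: these field names are illustrative assumptions,
# not the benchmark's actual output format.
FIELDS = ("file_type", "time_range", "keywords")


def parse_model_output(raw: str) -> Optional[dict]:
    """Parse the span from the first '{' to the last '}' as a JSON object.

    Small models often wrap JSON in extra chatter, so surrounding text is
    ignored. Returns None if no valid JSON object is found.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def score(raw: str, expected: dict) -> float:
    """Fraction of schema fields that exactly match the expected value.

    Invalid or missing JSON scores 0.0.
    """
    parsed = parse_model_output(raw)
    if parsed is None:
        return 0.0
    hits = sum(1 for field in FIELDS if parsed.get(field) == expected.get(field))
    return hits / len(FIELDS)


# Hypothetical query: "find the invoice PDFs from last week"
expected = {"file_type": "pdf", "time_range": "last_week", "keywords": ["invoice"]}
raw = 'Sure! {"file_type": "pdf", "time_range": "last_week", "keywords": ["invoices"]}'
result = score(raw, expected)  # two of the three fields match
```

Exact matching is the strictest possible choice; a real harness might normalize values (for example, singular versus plural keywords) before comparing.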