A text-only large language model (LLM) can be adapted into a Vision--Language--Action controller by rendering its visual observations as ASCII text. Because the scene reaches the model as ordinary characters, the LLM can interpret the visual state, follow natural-language instructions, and generate executable actions, both in simulation and on physical manipulators.
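To make the idea concrete, the following is a minimal sketch of one possible observation-to-text-to-action loop. It is not the method described here: it assumes a hypothetical top-down 2D grid of the workspace, a one-character symbol per object, and a hypothetical `PICK <row> <col>` / `PLACE <row> <col>` reply format. The names `SceneObject`, `render_ascii`, `build_prompt`, and `parse_actions` are illustrative only, and the actual LLM call is left out; a fixed string stands in for the model's reply.

```python
import re
from dataclasses import dataclass

@dataclass
class SceneObject:
    # Hypothetical object description: a name, its grid symbol, and its cell.
    name: str
    symbol: str  # single character drawn on the ASCII grid
    row: int
    col: int

def render_ascii(objects, rows, cols, empty="."):
    """Draw a top-down grid where each object occupies one character cell."""
    grid = [[empty] * cols for _ in range(rows)]
    for obj in objects:
        if not (0 <= obj.row < rows and 0 <= obj.col < cols):
            raise ValueError(f"{obj.name} lies outside the {rows}x{cols} grid")
        grid[obj.row][obj.col] = obj.symbol
    return "\n".join("".join(line) for line in grid)

def build_prompt(objects, rows, cols, instruction):
    """Combine the ASCII observation, a symbol legend, and the instruction."""
    legend = "\n".join(f"{o.symbol} = {o.name}" for o in objects)
    return (
        "Top-down view of the workspace (row 0 is the top):\n"
        f"{render_ascii(objects, rows, cols)}\n"
        f"Legend:\n{legend}\n"
        f"Instruction: {instruction}\n"
        "Reply with: PICK <row> <col> then PLACE <row> <col>."
    )

# Assumed action grammar: a verb followed by two non-negative grid indices.
ACTION_RE = re.compile(r"\b(PICK|PLACE)\s+(\d+)\s+(\d+)")

def parse_actions(reply, rows, cols):
    """Turn the model's text reply into (verb, row, col) tuples, rejecting
    any coordinates that fall outside the grid."""
    actions = []
    for verb, r, c in ACTION_RE.findall(reply):
        r, c = int(r), int(c)
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"{verb} target ({r}, {c}) is off the grid")
        actions.append((verb, r, c))
    return actions

if __name__ == "__main__":
    scene = [SceneObject("red cube", "C", 1, 2), SceneObject("bowl", "B", 3, 0)]
    print(build_prompt(scene, 4, 5, "Put the red cube in the bowl."))
    # Stand-in for an LLM reply; a real controller would query the model here.
    print(parse_actions("PICK 1 2\nPLACE 3 0", 4, 5))
```

Validating the parsed coordinates against the grid before execution is one simple safeguard: a malformed or out-of-range reply is rejected as text instead of being sent to a simulator or a physical arm.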