A study finds that artificial agents learn visual word meanings best when concepts are perceptually close. Perceptual distance is a substantial predictor of acquisition accuracy (partial R² = 0.245), explaining about a quarter of the variance that the other predictors leave unexplained. Bidirectional evaluations show that retrieval performance depends on exemplar-based memory rather than prototype matching. They also show that frozen visual embeddings make grounding possible but limit further learning when the representations themselves cannot change.
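The summary does not describe how the agents implement retrieval. The following is a minimal Python sketch of the two strategies being contrasted, assuming each word is grounded by storing fixed (frozen) embedding vectors for its labeled exemplars. Prototype matching compares a query to the average vector of each word's exemplars. Exemplar matching compares it to every stored exemplar, here as 1-nearest-neighbour. The function names, the Euclidean distance, the toy 2-D vectors, and the labels `cup` and `bowl` are all hypothetical illustrations, not the study's setup.

```python
import math
from collections import defaultdict
from typing import List, Tuple

# Hypothetical memory format: (word label, frozen embedding vector) pairs.
Memory = List[Tuple[str, Tuple[float, ...]]]


def prototype_classify(memory: Memory, query: Tuple[float, ...]) -> str:
    """Prototype matching: average each word's exemplars, return the closest mean."""
    groups = defaultdict(list)
    for label, vec in memory:
        groups[label].append(vec)
    # Component-wise mean of each word's exemplar vectors (the "prototype").
    prototypes = {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in groups.items()
    }
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))


def exemplar_classify(memory: Memory, query: Tuple[float, ...]) -> str:
    """Exemplar matching: return the label of the single nearest stored exemplar."""
    label, _ = min(memory, key=lambda item: math.dist(item[1], query))
    return label


if __name__ == "__main__":
    # Toy embeddings: "cup" has two far-apart exemplars, so its mean lies between them.
    memory: Memory = [
        ("cup", (0.0, 0.0)),
        ("cup", (10.0, 0.0)),
        ("bowl", (2.0, 0.0)),
        ("bowl", (3.0, 0.0)),
    ]
    query = (0.5, 0.0)  # sits right next to a stored "cup" exemplar
    print("prototype:", prototype_classify(memory, query))
    print("exemplar:", exemplar_classify(memory, query))
```

In this toy case the two strategies disagree. The query lies next to a stored `cup` exemplar, so exemplar matching returns `cup`. The `cup` prototype, however, is pulled to (5, 0) by its distant second exemplar, so prototype matching returns `bowl`. Because the embeddings are frozen in both functions, the only thing that can change with more training data is the contents of `memory`. This mirrors, in a simplified way, the summary's point about learning being limited without representational change.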