A new method uses large language models as a panel of fallible raters to measure political positions in regions where data are sparse. Adding written axis definitions improves both the consistency of scores and the agreement among raters. A Krippendorff's alpha of 0.86 indicates high reliability across models and labs. Where raters do disagree, the disagreements point to interpretive issues, which suggests the method is detecting referent problems rather than measurement error.
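
For readers unfamiliar with the reliability statistic cited above, the following is a minimal sketch of how Krippendorff's alpha can be computed for a panel of raters. It assumes the interval-metric variant (squared differences between scores); the summary does not say which variant or data layout was actually used. The function name `krippendorff_alpha_interval` and the toy scores are hypothetical and purely illustrative.

```python
from itertools import permutations


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha with the interval metric (squared differences).

    ratings: one list per rated unit (e.g. one party or actor), holding the
    score from each rater; use None where a rater gave no score.
    Assumption: interval-level scores; other metrics need a different distance.
    """
    # Keep only units with at least two scores, since a lone score
    # cannot be compared with anything.
    units = [[v for v in unit if v is not None] for unit in ratings]
    units = [u for u in units if len(u) >= 2]

    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable scores")

    # Observed disagreement: squared differences within each unit,
    # with each unit's ordered pairs weighted by 1 / (m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: squared differences across all pairable scores,
    # ignoring which unit they came from.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all scores are identical")

    return 1.0 - d_o / d_e


# Illustrative, made-up scores: four units rated by three hypothetical raters.
toy_scores = [
    [2.0, 2.5, 2.0],
    [7.0, 6.5, None],  # one rater gave no score for this unit
    [4.0, 4.0, 5.0],
    [9.0, 8.5, 9.0],
]
print(round(krippendorff_alpha_interval(toy_scores), 3))
```

Alpha equals 1 when raters agree perfectly and falls toward 0 as within-unit disagreement approaches what chance alone would produce, which is why a value of 0.86 reads as high reliability.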