PETRA transforms public web text into a curated petroleum engineering corpus with synthetic supervision for dense retrieval and reranking. It raises in-domain nDCG from 0.703 to 0.763, improves Earth Science benchmark performance by 44%, and improves performance on a six-task reasoning panel by 23%.
PETRA: Dataset and Pipeline for Petroleum Engineering Text Adaptation