Train language models to reason through structured knowledge graphs before answering questions. This pipeline uses ORPO (preference learning) followed by Graph-GRPO (graph reinforcement learning) to ...