Train language models to reason through structured knowledge graphs before answering questions. This pipeline uses ORPO (preference learning) followed by Graph-GRPO (graph reinforcement learning) to ...
This project implements a pipeline based on a Vision-Language Model (VLM) for safer, better-aligned text-to-image (T2I) generation through iterative refinement of user prompts. The core idea is to evaluate both ...