Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Posted May 28, 2025 by Takara Taniguchi
Categories: ai, algorithms, machinelearning, nlp