Extending Direct Nash Optimization for Regularized Preferences

This section presents an extension of the Direct Nash Optimization (DNO) framework to regularized preferences. A key difference between SPO and Nash-MD is that the latter uses smoothed policies, which is what yields its last-iterate convergence guarantee. The section introduces a regularized version of DNO (Algorithm 3), designed to converge to the Nash equilibrium of the KL-regularized preference. The algorithm proceeds iteratively, updating the policy through a reward function and the corresponding partition function and refining it at each iteration, which handles regularized preferences while preserving stable convergence.
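To make the iterative update concrete, below is a minimal numerical sketch of a soft-policy-iteration-style step over a finite set of candidate responses. The function name, the exponential form of the update, and the toy numbers are illustrative assumptions; the exact update in Algorithm 3 additionally involves KL-regularization toward a reference policy and may take a different form.

```python
import numpy as np

def soft_policy_iteration_step(pi_t, reward_t, eta):
    """One soft-policy-iteration-style update over a finite response set.

    Illustrative sketch only: assumes the generic form
    pi_{t+1}(y) ∝ pi_t(y) * exp(reward_t(y) / eta), normalized by a
    partition function Z_t. The exact update of Algorithm 3 in the paper
    may differ in how the reference policy and the regularization
    coefficient enter.
    """
    unnormalized = pi_t * np.exp(reward_t / eta)   # reweight by the reward
    Z_t = unnormalized.sum()                       # partition function
    return unnormalized / Z_t                      # refined policy pi_{t+1}

# Toy usage: three candidate responses for a single prompt.
pi_t = np.array([0.5, 0.3, 0.2])
reward_t = np.array([0.6, 0.5, 0.1])   # e.g., win rates against pi_t
print(soft_policy_iteration_step(pi_t, reward_t, eta=0.5))
```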


This content originally appeared on HackerNoon and was authored by Language Models (dot tech)

:::info Authors:

(1) Corby Rosset, Microsoft Research and Correspondence to corbyrosset@microsoft.com;

(2) Ching-An Cheng, Microsoft Research;

(3) Arindam Mitra, Microsoft Research;

(4) Michael Santacroce, Microsoft Research;

(5) Ahmed Awadallah, Microsoft Research and Correspondence to hassanam@microsoft.com;

(6) Tengyang Xie, Microsoft Research and Correspondence to tengyangxie@microsoft.com.

:::

Abstract and 1 Introduction

2 Preliminaries

2.1 RLHF Based on Reward Models

2.2 RLHF with General Preferences

3 Direct Nash Optimization and 3.1 Derivation of Algorithm 1

3.2 Theoretical Analysis

4 Practical Algorithm – Iterative Contrastive Self-Improvement

5 Experiments and 5.1 Experimental Setup

5.2 Results and Analysis

6 Related Work

7 Conclusion and References

Appendix

A Extension to Regularized Preferences

B Detailed Proofs

C Additional Experimental Details

A Extension to Regularized Preferences

In this section, we discuss how to extend the DNO framework to the case of regularized preferences (defined in Eq. (5)), which was first introduced and solved by Munos et al. (2023) via the Nash-MD algorithm introduced earlier.
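For reference, the following is a LaTeX sketch of a KL-regularized preference in the form used by Munos et al. (2023); the symbols μ (reference policy) and τ (regularization coefficient), and the exact placement of the KL terms, are assumptions about the paper's notation for Eq. (5).

```latex
% Sketch of the KL-regularized preference, following Munos et al. (2023).
% The reference policy \mu and coefficient \tau are assumed notation.
\begin{equation}
  \mathcal{P}_\tau(\pi \succ \pi')
    := \mathcal{P}(\pi \succ \pi')
       - \tau\,\mathrm{KL}(\pi \,\|\, \mu)
       + \tau\,\mathrm{KL}(\pi' \,\|\, \mu)
\end{equation}
% where KL(pi || mu) = E_x E_{y ~ pi(.|x)} [ log( pi(y|x) / mu(y|x) ) ].
```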


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::
