Fine-Tuning GPT-2 for IMDb Sentiment Analysis

This section details the experimental setup for the IMDb sentiment analysis experiments, which use GPT-2 as the base language model and a RoBERTa sentiment classifier as the ground-truth reward. The pipeline consists of supervised fine-tuning, generation of preference pairs, and training of an RLHF reward model. Models larger than the TRL library defaults were used because the defaults generated low-quality text and somewhat inaccurate rewards.



:::info Authors:

(1) Rafael Rafailov, Stanford University and Equal contribution; more junior authors listed earlier;

(2) Archit Sharma, Stanford University and Equal contribution; more junior authors listed earlier;

(3) Eric Mitchell, Stanford University and Equal contribution; more junior authors listed earlier;

(4) Stefano Ermon, Stanford University and CZ Biohub;

(5) Christopher D. Manning, Stanford University;

(6) Chelsea Finn, Stanford University.

:::

Abstract and 1. Introduction

2 Related Work

3 Preliminaries

4 Direct Preference Optimization

5 Theoretical Analysis of DPO

6 Experiments

7 Discussion, Acknowledgements, and References

Author Contributions

A Mathematical Derivations

A.1 Deriving the Optimum of the KL-Constrained Reward Maximization Objective

A.2 Deriving the DPO Objective Under the Bradley-Terry Model

A.3 Deriving the DPO Objective Under the Plackett-Luce Model

A.4 Deriving the Gradient of the DPO Objective and A.5 Proof of Lemma 1 and 2

A.6 Proof of Theorem 1

B DPO Implementation Details and Hyperparameters

C Further Details on the Experimental Set-Up and C.1 IMDb Sentiment Experiment and Baseline Details

C.2 GPT-4 prompts for computing summarization and dialogue win rates

C.3 Unlikelihood baseline

D Additional Empirical Results

D.1 Performance of Best of N baseline for Various N and D.2 Sample Responses and GPT-4 Judgments

D.3 Human study details

C Further Details on the Experimental Set-Up

In this section, we include additional details relevant to our experimental design.

C.1 IMDb Sentiment Experiment and Baseline Details

The prompts are prefixes of length 2-8 tokens drawn from the IMDb dataset. We use the pre-trained sentiment classifier siebert/sentiment-roberta-large-english as a ground-truth reward model and gpt2-large as the base model; we chose these larger models because the default ones generated low-quality text and somewhat inaccurate rewards. We first run supervised fine-tuning on a subset of the IMDb data for one epoch. We then use this model to sample 4 completions for each of 25,000 prefixes and create 6 preference pairs per prefix (all pairwise comparisons among the 4 completions) ranked by the ground-truth reward model. The RLHF reward model is initialized from gpt2-large and trained for 3 epochs on the preference dataset, and we take the checkpoint with the highest validation set accuracy. The "TRL" run uses the hyper-parameters from the TRL library; our implementation uses a larger batch of 1024 samples per PPO step.
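To make the preference-pair construction concrete, here is a minimal sketch of the data-generation step described above, assuming the Hugging Face transformers pipeline API for the siebert/sentiment-roberta-large-english classifier. The function names (reward, build_preference_pairs) and the use of the positive-class probability as the scalar reward are illustrative assumptions, not the authors' exact implementation.

```python
from itertools import combinations
from transformers import pipeline

# Ground-truth reward: the pre-trained sentiment classifier named in the paper.
# Treating the positive-class probability as the scalar reward is an assumption
# for illustration; the paper only states the classifier is the reward model.
classifier = pipeline(
    "sentiment-analysis",
    model="siebert/sentiment-roberta-large-english",
)

def reward(text: str) -> float:
    """Score a completion; higher means more positive sentiment."""
    out = classifier(text, truncation=True)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

def build_preference_pairs(prefix: str, completions: list[str]) -> list[dict]:
    """Turn 4 sampled completions into 6 (chosen, rejected) pairs.

    All C(4, 2) = 6 unordered pairs are ranked by the ground-truth reward,
    which is why 4 samples per prefix yield 6 preference pairs.
    """
    scored = [(reward(prefix + c), c) for c in completions]
    pairs = []
    for (r_a, a), (r_b, b) in combinations(scored, 2):
        chosen, rejected = (a, b) if r_a >= r_b else (b, a)
        pairs.append({"prompt": prefix, "chosen": chosen, "rejected": rejected})
    return pairs

# Example: one 3-token prefix with 4 sampled completions -> 6 preference pairs.
pairs = build_preference_pairs(
    "This movie was",
    [" great fun.", " a total bore.", " surprisingly moving.", " just fine."],
)
assert len(pairs) == 6
```

The resulting (prompt, chosen, rejected) triples form the preference dataset on which the RLHF reward model is trained.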


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

:::
