Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss

These experiments with GPT-2 medium on OpenWebText validate the radius hypothesis from our theoretical framework by measuring activation distances in the last layer during next-token prediction.


This content originally appeared on HackerNoon and was authored by Reinforcement Technology Advancements

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

6 Empirical Results

We examine the hypothesis from Section 5 regarding the radius r using a pre-trained GPT-2 medium model. Additionally, we train several GPT-2 small models and vanilla Transformer models to analyze their cross-entropy losses.
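As a rough illustration of what measuring distances between activations can look like, the sketch below computes, for each vector in a set, the Euclidean distance to its nearest neighbor. This is only an assumption about the kind of measurement involved: the excerpt does not state the exact procedure, the function name `nearest_neighbor_distances` is hypothetical, and the random vectors merely stand in for last-layer activations that would in practice be extracted from the model.

```python
import math
import random


def nearest_neighbor_distances(vectors):
    """Return, for each vector, the Euclidean distance to its closest other vector."""
    distances = []
    for i, v in enumerate(vectors):
        best = math.inf
        for j, w in enumerate(vectors):
            if i == j:
                continue  # skip the distance from a vector to itself
            best = min(best, math.dist(v, w))
        distances.append(best)
    return distances


# Hypothetical stand-in data: 5 random 4-dimensional vectors.
# Real activations would come from the model's last layer (not shown here).
random.seed(0)
activations = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]

nn = nearest_neighbor_distances(activations)
print("smallest nearest-neighbor distance:", min(nn))
print("mean nearest-neighbor distance:", sum(nn) / len(nn))
```

The brute-force double loop is quadratic in the number of vectors, which is fine for a small sample; a larger set of activations would likely call for a vectorized or approximate nearest-neighbor search instead.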

6.1 Empirical evaluation of the radius

Figure 3: Cross-entropy loss of the GPT-2 small model trained on (left) 100%, (middle) 1%, and (right) 0.1% of the OpenWebText-9B dataset with a typical training time.

:::info Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

:::


:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

:::

1. available at https://github.com/openai/gpt-2




