GPT-2 Architecture and Training Details: Parameters & Cross-Entropy Loss

Explore the original GPT-2 model's architecture, including its WebText training corpus, BPE tokenizer, hidden dimensions, per-layer parameters, and the cross-entropy loss formulation.


This content originally appeared on HackerNoon and was authored by Reinforcement Technology Advancements

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

Appendix D. Transformer Details: Using GPT-2 as an Example

Another commonly used evaluation metric is perplexity, which is equivalent to the exponentiated cross-entropy.
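The relationship can be sketched in a few lines: cross-entropy is the average negative log-likelihood the model assigns to the correct next tokens, and perplexity is its exponential. The probabilities below are hypothetical values chosen for illustration, not figures from the paper.

```python
import math

# Hypothetical probabilities a model assigns to the correct next token
# at each position of a short sequence.
token_probs = [0.5, 0.25, 0.125]

# Cross-entropy (in nats): average negative log-likelihood.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity: the exponentiated cross-entropy, i.e. the geometric mean
# of the inverse token probabilities.
perplexity = math.exp(cross_entropy)

print(cross_entropy)  # ≈ 1.3863
print(perplexity)     # 4.0, since (2 * 4 * 8) ** (1/3) == 4
```

A perplexity of 4 can be read as the model being, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step.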


:::info Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

:::


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

:::



Reinforcement Technology Advancements | Sciencx (2025-06-24). GPT-2 Architecture and Training Details: Parameters & Cross-Entropy Loss. Retrieved from https://www.scien.cx/2025/06/24/gpt-2-architecture-and-training-details-parameters-cross-entropy-loss/
