Validating Theoretical Loss Bound: Vanilla Transformer Experiments

Explore the training dynamics of vanilla Transformer models on the 2M-token Question-Formation dataset, analyzing how their cross-entropy losses stabilize during training.


This content originally appeared on HackerNoon and was authored by Reinforcement Technology Advancements

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

6.3 Training Vanilla Transformers

We next train vanilla Transformer models using a small amount of high-quality data. The Question-Formation dataset, proposed by McCoy et al. (2020), consists of pairs of English sentences: each sentence in declarative form is paired with its corresponding question form. The dataset contains D = 2M tokens. The sentences are context-free with a vocabulary size of 68 words, and the task is to convert declarative sentences into questions.
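As a rough point of reference for the cross-entropy losses discussed here, the sketch below computes the loss of a model that guesses uniformly over the 68-word vocabulary, which is ln 68, or about 4.22 nats per token. The sentence pair and the helper `cross_entropy` are hypothetical illustrations written in the style of the task; they are not taken from the dataset or from the authors' code.

```python
import math

VOCAB_SIZE = 68  # vocabulary size stated in the text


def cross_entropy(probs_of_targets):
    """Mean negative log-likelihood (in nats) of the correct tokens."""
    return -sum(math.log(p) for p in probs_of_targets) / len(probs_of_targets)


# Hypothetical declarative/question pair (illustrative only, not from the dataset).
declarative = "the dog can see the cat".split()
question = "can the dog see the cat".split()

# A model that guesses uniformly assigns probability 1/68 to every target token.
uniform_probs = [1.0 / VOCAB_SIZE] * len(question)
uniform_loss = cross_entropy(uniform_probs)

print(f"uniform-guess loss: {uniform_loss:.4f} nats per token")
```

A trained model should fall well below this uniform-guess value; the figure serves only as a sanity check on the loss scale, not as the theoretical bound studied in the paper.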


:::info Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

:::


:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

:::





Reinforcement Technology Advancements | Sciencx (2025-06-22T16:00:16+00:00) Validating Theoretical Loss Bound: Vanilla Transformer Experiments. Retrieved from https://www.scien.cx/2025/06/22/validating-theoretical-loss-bound-vanilla-transformer-experiments/

