Theoretical Derivations: Cross-Entropy Loss and Energy Functions in LLMs

Explore rigorous mathematical proofs, including properties of incomplete gamma functions, Stirling’s approximation, and derivations of loss functions and partition functions for our theoretical model.


This content originally appeared on HackerNoon and was authored by Reinforcement Technology Advancements

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

Appendix C. Deferred Proofs from Section 5

C.1 Proof of Proposition 4

C.2


:::info Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

:::


:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

:::


