Related Work: Scaling Laws and Hopfield Models in LLM Research

Explore existing research on neural scaling laws in large language models and the evolution of Hopfield networks as associative memories, including their connection to Transformer attention.


This content originally appeared on HackerNoon and was authored by Reinforcement Technology Advancements

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

2 Related Work

Scaling laws. As discussed in the introduction, there is consistent empirical evidence that model performance improves as both model size and the volume of training data scale up (Kaplan et al., 2020; Khandelwal et al., 2019; Rae et al., 2021; Chowdhery et al., 2023). Extensive experiments have also explored neural scaling laws under various conditions, including constraints on the computational budget (Hoffmann et al., 2022b), constraints on data (Muennighoff et al., 2024), and over-training (Gadre et al., 2024). These analyses decompose the expected risk, leading to the following fit:

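$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},$$

where N is the number of model parameters, D is the number of training tokens, E is the irreducible loss, and the remaining two terms capture the excess risk due to finite model size and finite data.

For Chinchilla models, the fitted parameters are (Hoffmann et al., 2022a)

$$E = 1.69, \quad A = 406.4, \quad B = 410.7, \quad \alpha = 0.34, \quad \beta = 0.28.$$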
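As a rough illustration, the Chinchilla parametric fit L(N, D) = E + A/N^α + B/D^β, with the constants reported by Hoffmann et al. (2022), can be evaluated directly and minimized under a fixed compute budget. The sketch below assumes the common approximation C ≈ 6ND training FLOPs; minimizing L subject to 6ND = C gives N_opt = G (C/6)^{β/(α+β)} with G = (αA/(βB))^{1/(α+β)}. The helper names `predicted_loss` and `compute_optimal` are hypothetical and only serve this example.

```python
# Minimal sketch of the Chinchilla parametric fit L(N, D) = E + A/N^alpha + B/D^beta.
# Assumptions: constants as reported by Hoffmann et al. (2022); training compute
# approximated as C ~ 6 * N * D FLOPs. Helper names are illustrative only.

E, A, B = 1.69, 406.4, 410.7  # irreducible loss and scale coefficients
ALPHA, BETA = 0.34, 0.28      # exponents for model size N and data size D


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Evaluate L(N, D) for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> tuple[float, float]:
    """Closed-form minimiser of L(N, D) subject to 6 * N * D = flops."""
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * (flops / 6.0) ** (BETA / (ALPHA + BETA))
    d_opt = flops / (6.0 * n_opt)  # spend the rest of the budget on tokens
    return n_opt, d_opt


if __name__ == "__main__":
    # A 70B-parameter model trained on 1.4T tokens (Chinchilla's configuration).
    print(f"L(70e9, 1.4e12) = {predicted_loss(70e9, 1.4e12):.3f}")

    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal(budget)
        print(f"C = {budget:.0e}: N_opt = {n:.3g}, D_opt = {d:.3g}, "
              f"loss = {predicted_loss(n, d):.3f}")
```

With these constants the exponents are β/(α+β) ≈ 0.45 for parameters and α/(α+β) ≈ 0.55 for tokens, so under this particular fit the optimal token count grows slightly faster with compute than the parameter count.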

Another line of research concerns the generalization of over-parameterized neural networks (Belkin et al., 2019; Nakkiran et al., 2021; Power et al., 2022). Recent experiments show that over-trained Transformers exhibit inverted U-shaped scaling behavior (Murty et al., 2023), which the empirical scaling laws cannot explain.


Hopfield models. Classical Hopfield networks (Amari, 1972; Hopfield, 1982) were introduced as paradigmatic examples of associative memory. The network's update dynamics are governed by an energy function, and the fixed points of these dynamics correspond to the stored memories. An important indicator is the number of patterns the model can memorize, known as the network's storage capacity. Modifications to the energy function (Krotov and Hopfield, 2016; Demircigil et al., 2017) yield higher storage capacities (see Table 1 in Appendix A). The original model operates on binary variables. The modern continuous Hopfield network (MCHN) (Ramsauer et al., 2020) generalizes the Hopfield model to the continuous domain, which makes it an appealing tool for understanding the attention mechanism in Transformers, whose inputs are likewise real-valued vector embeddings. Given an input (e.g., a prompt), the Hopfield layer retrieves a memory by converging to a local minimum of the energy landscape, and its update rule corresponds closely to the query-key-value mechanism of attention. Krotov (2021) proposes a Hierarchical Associative Memory (HAM) model that describes the entire neural network with a single global energy function, as opposed to separate energy functions for individual layers.
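To make the classical picture concrete, the following is a minimal sketch of a binary Hopfield network, assuming the standard Hebbian (outer-product) storage rule, ±1 units, zero thresholds, and asynchronous sign updates. Under these assumptions the energy E(s) = -½ sᵀWs never increases, and a corrupted pattern relaxes to the stored memory when the number of patterns is well below the storage capacity. The function names and the sizes in `demo` are hypothetical choices for illustration.

```python
import numpy as np


def hebbian_weights(patterns: np.ndarray) -> np.ndarray:
    """Outer-product (Hebbian) storage: W = (1/d) sum_mu x_mu x_mu^T, zero diagonal."""
    d = patterns.shape[1]
    w = patterns.T @ patterns / d
    np.fill_diagonal(w, 0.0)  # no self-connections
    return w


def energy(w: np.ndarray, s: np.ndarray) -> float:
    """Classical Hopfield energy E(s) = -1/2 s^T W s (thresholds set to zero)."""
    return -0.5 * float(s @ w @ s)


def recall(w: np.ndarray, s: np.ndarray, rng: np.random.Generator,
           max_sweeps: int = 20) -> np.ndarray:
    """Asynchronous sign updates until no unit changes (a fixed point)."""
    s = s.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):  # update one unit at a time
            new = 1 if w[i] @ s >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


def demo(d: int = 200, n_patterns: int = 5, flip_frac: float = 0.1, seed: int = 0):
    """Store random +/-1 patterns, corrupt one, and let the dynamics clean it up."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice(np.array([-1, 1]), size=(n_patterns, d))
    w = hebbian_weights(patterns.astype(float))
    noisy = patterns[0].copy()
    flipped = rng.choice(d, size=int(flip_frac * d), replace=False)
    noisy[flipped] *= -1  # flip 10% of the bits of the first memory
    recovered = recall(w, noisy, rng)
    return patterns, noisy, recovered, w


if __name__ == "__main__":
    patterns, noisy, recovered, w = demo()
    print("bits wrong before:", int((noisy != patterns[0]).sum()))
    print("bits wrong after: ", int((recovered != patterns[0]).sum()))
    print(f"energy: {energy(w, noisy):.2f} -> {energy(w, recovered):.2f}")
```

Asynchronous updates are used because, with a symmetric zero-diagonal W, each single-unit flip cannot raise the energy, which is what makes the stored patterns attracting fixed points.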

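The correspondence between the MCHN and attention can be sketched numerically. Following Ramsauer et al. (2020), with stored patterns as the rows of X, one update step is ξ_new = Xᵀ softmax(β X ξ), and it does not increase the energy E(ξ) = -β⁻¹ log Σᵢ exp(β xᵢᵀξ) + ½ ξᵀξ (constant terms dropped here). The sketch assumes β = 1/√d, borrowed from scaled dot-product attention, and omits the learned query/key/value projections, so the stored patterns serve as both keys and values. All function names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mchn_update(stored: np.ndarray, xi: np.ndarray, beta: float) -> np.ndarray:
    """One MCHN step xi_new = X^T softmax(beta * X xi); rows of `stored` are patterns."""
    return stored.T @ softmax(beta * (stored @ xi))


def mchn_energy(stored: np.ndarray, xi: np.ndarray, beta: float) -> float:
    """E(xi) = -(1/beta) * logsumexp(beta * X xi) + 0.5 * ||xi||^2, constants dropped."""
    z = beta * (stored @ xi)
    m = z.max()
    lse = m + np.log(np.exp(z - m).sum())
    return float(-lse / beta + 0.5 * xi @ xi)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, scale: float) -> np.ndarray:
    """Scaled dot-product attention softmax(scale * Q K^T) V, one query per row."""
    return softmax(scale * (q @ k.T), axis=-1) @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 128, 10
    stored = rng.standard_normal((n, d))  # memories, used as keys and values
    beta = 1.0 / np.sqrt(d)               # same scaling as attention
    queries = stored[:3] + 0.3 * rng.standard_normal((3, d))  # noisy cues

    hopfield = np.stack([mchn_update(stored, q, beta) for q in queries])
    print("matches attention:",
          np.allclose(hopfield, attention(queries, stored, stored, beta)))
    for q, target in zip(queries, stored[:3]):
        new = mchn_update(stored, q, beta)
        print(f"distance {np.linalg.norm(q - target):.2f} -> "
              f"{np.linalg.norm(new - target):.2f}, energy "
              f"{mchn_energy(stored, q, beta):.2f} -> {mchn_energy(stored, new, beta):.2f}")
```

Stacking the per-query updates reproduces a single attention call, which is the sense in which one Hopfield retrieval step "is" an attention head under these simplifying assumptions.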

:::info Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

:::


:::info This paper is available on arXiv under CC BY-NC-ND 4.0 DEED license.

:::

Reinforcement Technology Advancements | Sciencx (2025-06-18T15:45:03+00:00) Related Work: Scaling Laws and Hopfield Models in LLM Research. Retrieved from https://www.scien.cx/2025/06/18/related-work-scaling-laws-and-hopfield-models-in-llm-research/