GPT-2 Architecture and Training Details: Parameters & Cross-Entropy Loss. Posted June 24, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
Theoretical Derivations: Cross-Entropy Loss and Energy Functions in LLMs. Posted June 24, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
LogSumExp Function Properties: Lemmas for Energy Functions. Posted June 24, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
Transformer Performance: Hopfield Theory & Cross-Entropy Loss Data. Posted June 24, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
New Regularization-Free Energy Function for Transformer Analysis. Posted June 22, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
Validating Theoretical Loss Bound: Vanilla Transformer Experiments. Posted June 22, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
The Impact of Data Size on Transformer Training: Overfitting & Loss Dynamics. Posted June 21, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss. Posted June 21, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
Related Work: Scaling Laws and Hopfield Models in LLM Research. Posted June 18, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models
Theoretical Framework: Transformer Memorization & Performance Dynamics. Posted June 18, 2025, by Reinforcement Technology Advancements. Categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models