Model Promotion: Using EMA to Balance Learning and Forgetting in IIL
Post date: November 5, 2025. Post author: Instancing. Post categories: ai-models, catastrophic-forgetting, exponential-moving-average, instance-incremental-learning, knowledge-consolidation, model-generalization, overfitting-mitigation, teacher-student-model

Generalization and Robustness: RECKONING Excels on Longer Reasoning Chains Unseen During Training
Post date: October 24, 2025. Post author: The Tech Reckoning is Upon Us! Post categories: algorithms, ft-icr-baseline, gpt-2, in-context-reasoning, knowledge-encoding, model-generalization, multi-hop-reasoning, reckoning-algorithm

GPT-2 Architecture and Training Details: Parameters & Cross-Entropy Loss
Post date: June 24, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

Theoretical Derivations: Cross-Entropy Loss and Energy Functions in LLMs
Post date: June 24, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

LogSumExp Function Properties: Lemmas for Energy Functions
Post date: June 24, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

Transformer Performance: Hopfield Theory & Cross-Entropy Loss Data
Post date: June 24, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

New Regularization-Free Energy Function for Transformer Analysis
Post date: June 22, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

Validating Theoretical Loss Bound: Vanilla Transformer Experiments
Post date: June 22, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

The Impact of Data Size on Transformer Training: Overfitting & Loss Dynamics
Post date: June 21, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss
Post date: June 21, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

Related Work: Scaling Laws and Hopfield Models in LLM Research
Post date: June 18, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models

Theoretical Framework: Transformer Memorization & Performance Dynamics
Post date: June 18, 2025. Post author: Reinforcement Technology Advancements. Post categories: associative-memory, attention-mechanism, cross-entropy-loss, hopfield-networks, model-generalization, model-scaling, neural-network-performance, transformer-models