Model Promotion: Using EMA to Balance Learning and Forgetting in IIL

This content originally appeared on HackerNoon and was authored by Instancing

4.2. Knowledge consolidation

Unlike existing IIL methods, which focus only on the student model, we propose consolidating knowledge from the student into the teacher to better balance learning and forgetting. The consolidation is implemented not through learning but through a model exponential moving average (EMA). Model EMA was initially introduced by Tarvainen et al. [28] to enhance the generalizability of models. In vanilla model EMA, the model is trained from scratch, and EMA is applied after every iteration. The underlying mechanism of model EMA has not been thoroughly explained before. In this work, we leverage model EMA for knowledge consolidation (KC) in the context of the IIL task and explain its mechanism theoretically. Based on our theoretical analysis, we propose a new KC-EMA for knowledge consolidation. Mathematically, the model EMA can be formulated as
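For reference, model EMA in the mean-teacher style is commonly written in the form below. The symbols are assumptions made for this sketch and may differ from the paper's own notation: $\theta_t$ denotes the teacher parameters, $\theta_s$ the student parameters, $\alpha$ the smoothing coefficient, and $n$ the update index.

```latex
\theta_t^{(n)} = \alpha\,\theta_t^{(n-1)} + (1-\alpha)\,\theta_s^{(n)}, \qquad 0 \le \alpha < 1
```

With $\alpha$ close to 1, the teacher changes slowly and retains most of its previous parameters; with smaller $\alpha$, it follows the student more closely.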

Hence, the teacher model can achieve a minimum training loss on both the old task and the new task, which indicates improved generalization on both the old data and the new observations. This has been verified by our experiments in Sec. 5. However, since α < 1, it is noteworthy that the gradient of the teacher model, whether on the old task or the new task, is larger than the initial gradient on the old task or the final gradient of the student model on the new task. That is, the resulting teacher model sacrifices some unilateral performance on either the old data or the new data in order to achieve better generalization on both. From this perspective, the mechanism of vanilla EMA can also be partially explained. In vanilla EMA, where the model starts from scratch and only the new task is considered, we only need to focus on the second term in Equation 13. Since the teacher model has a larger gradient on the training data than the student model, it is less likely to overfit the training data. As a result, the teacher model generalizes better, as Tarvainen et al. [28] observed.
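To make the update rule concrete, here is a minimal sketch of a single vanilla EMA step in plain Python. It is not the paper's KC-EMA and not the authors' code; the function name `ema_update`, the dictionary-of-floats parameter format, and the value of `alpha` are all assumptions chosen for illustration.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher parameters: alpha * teacher + (1 - alpha) * student.

    `teacher` and `student` are hypothetical dicts mapping parameter
    names to floats; a real model would hold tensors instead.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


# Toy example: the teacher moves a fraction (1 - alpha) toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 3.0, "b": 2.0}
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher)  # w is close to 1.2, b is close to 0.2
```

Because each step blends the two parameter sets, the teacher always lies between its previous value and the student, which is the sense in which it trades some performance on either side for a balance of both.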

:::info Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Chengjie Wang, Tencent Youtu Lab.

:::

:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.

:::


