Unveiling Nuances: Multi-Token Prediction’s Impact on Llama 2 Finetuning

Explore finetuning results for Llama 2 with multi-token prediction (n = 4) on coding benchmarks. The results (Table S6) show that this loss does not significantly improve on the baseline, possibly because switching to it disrupts the pretrained initialization.


This content originally appeared on HackerNoon and was authored by Cosmological thinking: time, space and universal causation

Abstract and 1. Introduction

2. Method

3. Experiments on real data

4. Ablations on synthetic data

5. Why does it work? Some speculation

6. Related work

7. Conclusion, Impact statement, Environmental impact, Acknowledgements and References

A. Additional results on self-speculative decoding

B. Alternative architectures

C. Training speeds

D. Finetuning

E. Additional results on model scaling behavior

F. Details on CodeContests finetuning

G. Additional results on natural language benchmarks

H. Additional results on abstractive text summarization

I. Additional results on mathematical reasoning in natural language

J. Additional results on induction learning

K. Additional results on algorithmic reasoning

L. Additional intuitions on multi-token prediction

M. Training hyperparameters

D. Finetuning

Table S6: Finetuning Llama 2 with multi-token prediction does not significantly improve performance. We tried to finetune Llama 2 with 4-token prediction, but this did not yield significant improvements over the baseline. We suspect that the new loss changes the initialization too abruptly and that the model never fully recovers. We still observe some improvements, for example on MBPP Pass@1. All runs use 200B tokens of code.
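The caption does not spell out the 4-token loss itself. As a hedged sketch, assuming the formulation from the paper's Method section (a shared trunk feeding n output heads, where head i predicts the token i positions ahead and the per-head cross-entropies are summed, i.e. L_n = -Σ_t Σ_{i=1..n} log P(x_{t+i} | x_1..x_t)), the objective can be written in plain Python as below. The names `log_softmax` and `multi_token_loss` and the nested-list logits layout are hypothetical illustrations, not the authors' code; with a single head the function reduces to the ordinary next-token cross-entropy.

```python
import math
from typing import List, Sequence

Logits = Sequence[float]  # one unnormalized score per vocabulary entry


def log_softmax(logits: Logits) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multi_token_loss(head_logits: Sequence[Sequence[Logits]],
                     tokens: Sequence[int]) -> float:
    """Sum over heads of each head's mean cross-entropy (hypothetical helper).

    head_logits[i][t] holds head i's logits at position t (assumed layout).
    Head i is trained to predict tokens[t + i + 1], so head 0 is the usual
    next-token head. Positions whose target lies past the sequence end are skipped.
    """
    total = 0.0
    for i, per_position in enumerate(head_logits):
        nlls = []
        for t, logits in enumerate(per_position):
            target_index = t + i + 1
            if target_index >= len(tokens):
                break  # no ground-truth token this far ahead
            nlls.append(-log_softmax(logits)[tokens[target_index]])
        if nlls:
            total += sum(nlls) / len(nlls)
    return total


if __name__ == "__main__":
    vocab_size, seq_len, n_heads = 4, 6, 4  # n = 4 as in Table S6
    tokens = [0, 1, 2, 3, 0, 1]
    uniform = [[[0.0] * vocab_size for _ in range(seq_len)] for _ in range(n_heads)]
    # Uniform logits cost ln(4) per head, so the total is 4 * ln(4).
    print(multi_token_loss(uniform, tokens))  # ≈ 5.545
```

When finetuning a pretrained next-token model, only the first head has a pretrained counterpart; how the extra heads are initialized is not specified in this caption.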

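The caption mentions MBPP Pass@1 without saying how it was estimated. Assuming the unbiased pass@k estimator commonly used for code-generation benchmarks, a problem with n sampled programs of which c pass the tests gets the estimate 1 - C(n-c, k) / C(n, k), which for k = 1 reduces to c/n; the benchmark score is the mean over problems. The helpers `pass_at_k` and `mean_pass_at_k` are hypothetical names for this sketch.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three hypothetical problems with 10 samples each: 3, 0 and 10 correct.
    print(mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=1))
```

For k = 1 the estimator is simply the fraction of correct samples per problem, so Pass@1 can be read as the expected success rate of a single generation.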

:::info Authors:

(1) Fabian Gloeckle, FAIR at Meta, CERMICS Ecole des Ponts ParisTech and Equal contribution;

(2) Badr Youbi Idrissi, FAIR at Meta, LISN Université Paris-Saclay and Equal contribution;

(3) Baptiste Rozière, FAIR at Meta;

(4) David Lopez-Paz, FAIR at Meta and Last author;

(5) Gabriel Synnaeve, FAIR at Meta and Last author.

:::


:::info This paper is available on arXiv under CC BY 4.0 DEED license.

:::





APA

Cosmological thinking: time, space and universal causation | Sciencx (2025-07-22T01:49:19+00:00) Unveiling Nuances: Multi-Token Prediction’s Impact on Llama 2 Finetuning. Retrieved from https://www.scien.cx/2025/07/22/unveiling-nuances-multi-token-predictions-impact-on-llama-2-finetuning/

