Real-World Code Performance: Multi-Token Finetuning on CodeContests

This section outlines the practical evaluation of multi-token pretrained models finetuned on the CodeContests dataset, assessing their real-world code understanding and generation capabilities across a range of sampling temperatures.



Abstract and 1. Introduction

2. Method

3. Experiments on real data

4. Ablations on synthetic data

5. Why does it work? Some speculation

6. Related work

7. Conclusion, Impact statement, Environmental impact, Acknowledgements and References

A. Additional results on self-speculative decoding

B. Alternative architectures

C. Training speeds

D. Finetuning

E. Additional results on model scaling behavior

F. Details on CodeContests finetuning

G. Additional results on natural language benchmarks

H. Additional results on abstractive text summarization

I. Additional results on mathematical reasoning in natural language

J. Additional results on induction learning

K. Additional results on algorithmic reasoning

L. Additional intuitions on multi-token prediction

M. Training hyperparameters

F. Details on CodeContests finetuning

We use the Python subset of the CodeContests (Li et al., 2022) train split with reward annotations (“correct” / “incorrect”) and condition on correct solutions at evaluation time. For evaluation, we generate 1000 samples per problem from the test split at each temperature T ∈ {0.5, 0.6, 0.7, 0.8, 0.9}, and compute the unbiased pass@k estimator from Chen et al. (2021) for each value of k and T. Models pretrained with different losses may have different optimal temperatures for pass@k, so we compute and show k ↦ max_T pass@k(T) in Figure 4; in other words, we grant pass@k access to a temperature oracle. For small values of k, pass@k measures the ability to understand and solve a task, while for large k it additionally rewards diversity in the outputs. According to the results in Figure 4, multi-token prediction pretraining leads to finetuned models that are better on both axes.
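
For reference, here is a minimal sketch of this evaluation in Python. The estimator is the one from Chen et al. (2021), pass@k = 1 − C(n−c, k)/C(n, k) for n generated samples of which c are correct; the `per_temperature` data structure and the sample counts in it are hypothetical placeholders, not numbers from the paper.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator from Chen et al. (2021):
    # pass@k = 1 - C(n - c, k) / C(n, k), computed as a
    # numerically stable product to avoid huge binomials.
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

def oracle_pass_at_k(per_temperature: dict, k: int) -> float:
    # Temperature oracle: for a fixed k, report the best mean pass@k
    # over all sampled temperatures T, as done for Figure 4.
    return max(
        float(np.mean([pass_at_k(n, c, k) for n, c in counts]))
        for counts in per_temperature.values()
    )

# Hypothetical per-problem (n, c) counts at each temperature;
# the actual evaluation draws n = 1000 samples per test problem.
per_temperature = {
    0.5: [(1000, 12), (1000, 0)],
    0.7: [(1000, 25), (1000, 2)],
    0.9: [(1000, 30), (1000, 1)],
}
print(oracle_pass_at_k(per_temperature, k=10))
```

The stable product form works because C(n−c, k)/C(n, k) telescopes to the product of (1 − k/i) for i from n−c+1 to n, which stays in [0, 1] throughout.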


:::info Authors:

(1) Fabian Gloeckle, FAIR at Meta, CERMICS Ecole des Ponts ParisTech and Equal contribution;

(2) Badr Youbi Idrissi, FAIR at Meta, LISN Université Paris-Saclay and Equal contribution;

(3) Baptiste Rozière, FAIR at Meta;

(4) David Lopez-Paz, FAIR at Meta and Last author;

(5) Gabriel Synnaeve, FAIR at Meta and Last author.

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::
