This content originally appeared on DEV Community and was authored by Henri Wang
Yes, exactly.
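In DINO and DINOv2, the DINO loss is applied between the [CLS] tokens of the teacher and student models, after each [CLS] token has passed through its network's projection head (DINOv2 adds a separate patch-level iBOT loss on top of this image-level loss).
In DINO, the teacher's [CLS] output is first centered by subtracting a running mean and then sharpened with a low-temperature softmax; DINOv2 replaces the centering step with Sinkhorn-Knopp normalization.
The student's [CLS] output goes through its own softmax at a higher temperature, and the student is trained to match the teacher's distribution using cross-entropy loss, with no gradient flowing through the teacher.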
Each view of the same image produces one [CLS] embedding, and the goal is to make the student’s [CLS] output match the teacher’s.
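So, the comparison is always between the [CLS] tokens, across different augmentations of the same image: the teacher sees only the global crops, the student sees all crops, and pairs in which both networks received the same view are skipped.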
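The steps above can be sketched in a few lines of plain Python. This is a minimal illustration for a single image, not the official implementation: the function names (`log_softmax`, `dino_cls_loss`, `update_center`), the list-of-floats representation, and the default temperatures and momentum are assumptions chosen for readability. It also assumes the first teacher-count entries of `student_views` are the same global crops the teacher saw, so that same-view pairs can be skipped by index.

```python
import math

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_z for v in scaled]

def dino_cls_loss(student_views, teacher_views, center,
                  student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between teacher and student [CLS] distributions.

    student_views / teacher_views: one list of projected [CLS] outputs
    per view (hypothetical layout; teacher views come first in
    student_views). Temperatures are illustrative defaults.
    """
    total, n_terms = 0.0, 0
    for t_idx, t_out in enumerate(teacher_views):
        # Teacher: center first, then sharpen with a low temperature.
        centered = [v - c for v, c in zip(t_out, center)]
        p_teacher = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
        for s_idx, s_out in enumerate(student_views):
            if s_idx == t_idx:
                continue  # skip pairs where both networks saw the same view
            log_p_student = log_softmax(s_out, student_temp)
            total += -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))
            n_terms += 1
    return total / n_terms

def update_center(center, teacher_views, momentum=0.9):
    """Running mean of teacher outputs, used for centering."""
    n = len(teacher_views)
    mean = [sum(view[k] for view in teacher_views) / n
            for k in range(len(center))]
    return [momentum * c + (1.0 - momentum) * m for c, m in zip(center, mean)]
```

In a real training loop the teacher outputs would be treated as constants (no gradient), and the loss would be averaged over a batch of images rather than computed for one.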

Henri Wang | Sciencx (2025-06-30T08:31:42+00:00) Does DINO loss compare the [CLS] tokens from both teacher and student?. Retrieved from https://www.scien.cx/2025/06/30/does-dino-loss-compare-the-cls-tokens-from-both-teacher-and-student/