OpenCLIP BigG to CLIP L Conversion: What You Need to Know

This content originally appeared on HackerNoon and was authored by Image Recognition

Abstract and 1 Introduction

2 MindEye2 and 2.1 Shared-Subject Functional Alignment

2.2 Backbone, Diffusion Prior, & Submodules

2.3 Image Captioning and 2.4 Fine-tuning Stable Diffusion XL for unCLIP

2.5 Model Inference

3 Results and 3.1 fMRI-to-Image Reconstruction

3.2 Image Captioning

3.3 Image/Brain Retrieval and 3.4 Brain Correlation

3.5 Ablations

4 Related Work

5 Conclusion

6 Acknowledgements and References

A Appendix

A.1 Author Contributions

A.2 Additional Dataset Information

A.3 MindEye2 (not pretrained) vs. MindEye1

A.4 Reconstruction Evaluations Across Varying Amounts of Training Data

A.5 Single-Subject Evaluations

A.6 UnCLIP Evaluation

A.7 OpenCLIP BigG to CLIP L Conversion

A.8 COCO Retrieval

A.9 Reconstruction Evaluations: Additional Information

A.10 Pretraining with Less Subjects

A.11 UMAP Dimensionality Reduction

A.12 ROI-Optimized Stimuli

A.13 Human Preference Experiments

A.7 OpenCLIP BigG to CLIP L Conversion

To map from OpenCLIP ViT-bigG/14 image latents to CLIP ViT-L/14 image latents during MindEye2 inference, we independently trained a linear model using ground truth images from the COCO 2017 train and validation datasets. This conversion was necessary to use the pretrained GIT image captioning model. The PyTorch code used to train this model is depicted in Algorithm 1.

Figure 9: Generating images from their CLIP image embeddings. SDXL unCLIP (middle) outperforms Versatile Diffusion (right) in capturing perceptual details.

Table 9: SDXL unCLIP reconstructions from ground truth OpenCLIP image latents consistently outperform Versatile Diffusion reconstructions from ground truth CLIP image latents.
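Algorithm 1 itself is not reproduced in this excerpt. As a rough sketch of the idea, the snippet below trains a linear layer with MSE loss to map one latent space to another. The dimensions (`BIGG_DIM`, `CLIPL_DIM`) and the random tensors are placeholders standing in for precomputed (bigG, CLIP L) latent pairs of the same COCO images; this is not the paper's actual training code.

```python
import torch
import torch.nn as nn

# Hypothetical small dimensions; the real flattened latents are far larger.
BIGG_DIM, CLIPL_DIM, N_IMAGES = 64, 32, 512

torch.manual_seed(0)
# Stand-ins for precomputed latent pairs of the same ground-truth images.
bigg = torch.randn(N_IMAGES, BIGG_DIM)
true_map = torch.randn(BIGG_DIM, CLIPL_DIM) / BIGG_DIM ** 0.5
clip_l = bigg @ true_map + 0.01 * torch.randn(N_IMAGES, CLIPL_DIM)

model = nn.Linear(BIGG_DIM, CLIPL_DIM)   # the linear conversion model
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

first_loss = None
for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(bigg), clip_l)  # regress bigG latents onto CLIP L
    loss.backward()
    opt.step()
    if first_loss is None:
        first_loss = loss.item()

print(f"MSE: {first_loss:.4f} -> {loss.item():.4f}")
```

At inference, `model` would simply be applied to the predicted bigG latents before passing them to the captioning model.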


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

:::info Authors:

(1) Paul S. Scotti, Stability AI and Medical AI Research Center (MedARC);

(2) Mihir Tripathy, Medical AI Research Center (MedARC), core contributor;

(3) Cesar Kadir Torrico Villanueva, Medical AI Research Center (MedARC), core contributor;

(4) Reese Kneeland, University of Minnesota, core contributor;

(5) Tong Chen, The University of Sydney and Medical AI Research Center (MedARC);

(6) Ashutosh Narang, Medical AI Research Center (MedARC);

(7) Charan Santhirasegaran, Medical AI Research Center (MedARC);

(8) Jonathan Xu, University of Waterloo and Medical AI Research Center (MedARC);

(9) Thomas Naselaris, University of Minnesota;

(10) Kenneth A. Norman, Princeton Neuroscience Institute;

(11) Tanishq Mathew Abraham, Stability AI and Medical AI Research Center (MedARC).

:::




