Visual Prompt Generation: Cross-Attention in Q-Former
Posted November 19, 2025 by Instancing. Categories: bert-model, blip2, cross-attention, deep-learning, learnable-queries, multimodal-llms-(mllms), q-former-architecture, visual-prompt-embeddings
MIVPG on E-commerce: Multi-Image/Multi-Patch Aggregation for Captioning
Posted November 18, 2025 by Instancing. Categories: blip2, computational-efficiency, deep-learning, e-commerce-captioning, mivpg, multimodal-fusion, multimodal-learning, multiple-instance-learning
Evaluating Visual Adapters: MIVPG Performance on Single and Multi-Image Inputs
Posted November 15, 2025 by Instancing. Categories: blip2, deep-learning, frozen-encoder, mivpg, multimodal-experiments, multimodal-learning, multiple-instance-learning, visual-prompt-generator