MIVPG on E-commerce: Multi-Image/Multi-Patch Aggregation for Captioning | Post date: November 18, 2025 | Post author: Instancing | Post categories: blip2, computational-efficiency, deep-learning, e-commerce-captioning, mivpg, multimodal-fusion, multimodal-learning, multiple-instance-learning
Evaluating Visual Adapters: MIVPG Performance on Single and Multi-Image Inputs | Post date: November 15, 2025 | Post author: Instancing | Post categories: blip2, deep-learning, frozen-encoder, mivpg, multimodal-experiments, multimodal-learning, multiple-instance-learning, visual-prompt-generator
MLLM Adapters: Review of VPGs and Multimodal Fusion | Post date: November 12, 2025 | Post author: Instancing | Post categories: deep-learning, image-text-fusion, mllm-architecture, multimodal-learning, perceiver-resampler, q-former, vision-language-models, visual-prompt-generators
When AI Learns to See the Unknown: Wrapping Up the OW‑VISCap Study | Post date: November 4, 2025 | Post author: Instancing | Post categories: ai-reproducibility, computer-vision, contrastive-learning, deep-learning, machine-learning-research, multimodal-learning, open-world-ai, video-segmentation
The Integration of Vision-LLMs into AD Systems: Capabilities and Challenges | Post date: September 27, 2025 | Post author: Text Generation | Post categories: autonomous, autonomous-driving-(ad), llms, multimodal-learning, safety-critical-systems, typographic-attacks, vision-llms, visual-reasoning
What It Takes to Train a Versatile Speech AI System | Post date: June 20, 2025 | Post author: Phonology Technology | Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning
How We Pre-Trained a 300M Parameter Audio Encoder With Random Quantization | Post date: June 19, 2025 | Post author: Phonology Technology | Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning
A Unified Multimodal Approach to Speech Processing with LLMs | Post date: June 19, 2025 | Post author: Phonology Technology | Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning
Multimodal AI for High-Fidelity Video Creation and Editing | Post date: January 13, 2025 | Post author: Teleplay Technology | Post categories: ai-video-editing, high-fidelity-motion, llms, multimodal-learning, text-to-video-ai, unified-model-architecture, video-generation-ai, videopoet
Med-Flamingo: a Multimodal Medical Few-shot Learner – Appendix | Post date: June 19, 2024 | Post author: The FewShot Prompting Publication | Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models
Med-Flamingo: a Multimodal Medical Few-shot Learner – Discussion, Acknowledgments, and References | Post date: June 19, 2024 | Post author: The FewShot Prompting Publication | Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models
Med-Flamingo: a Multimodal Medical Few-shot Learner – Results | Post date: June 19, 2024 | Post author: The FewShot Prompting Publication | Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models
Med-Flamingo: a Multimodal Medical Few-shot Learner – Evaluation | Post date: June 19, 2024 | Post author: The FewShot Prompting Publication | Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models
Med-Flamingo: a Multimodal Medical Few-shot Learner – Med-Flamingo | Post date: June 19, 2024 | Post author: The FewShot Prompting Publication | Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models
Med-Flamingo: a Multimodal Medical Few-shot Learner – Related Works | Post date: June 19, 2024 | Post author: The FewShot Prompting Publication | Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models