What It Takes to Train a Versatile Speech AI System
Post date: June 20, 2025
Post author: Phonology Technology
Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning

How We Pre-Trained a 300M Parameter Audio Encoder With Random Quantization
Post date: June 19, 2025
Post author: Phonology Technology
Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning

A Unified Multimodal Approach to Speech Processing with LLMs
Post date: June 19, 2025
Post author: Phonology Technology
Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning

Multimodal AI for High-Fidelity Video Creation and Editing
Post date: January 13, 2025
Post author: Teleplay Technology
Post categories: ai-video-editing, high-fidelity-motion, llms, multimodal-learning, text-to-video-ai, unified-model-architecture, video-generation-ai, videopoet

Med-Flamingo: a Multimodal Medical Few-shot Learner – Appendix
Post date: June 19, 2024
Post author: The FewShot Prompting Publication
Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models

Med-Flamingo: a Multimodal Medical Few-shot Learner – Discussion, Acknowledgments, and References
Post date: June 19, 2024
Post author: The FewShot Prompting Publication
Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models

Med-Flamingo: a Multimodal Medical Few-shot Learner – Results
Post date: June 19, 2024
Post author: The FewShot Prompting Publication
Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models

Med-Flamingo: a Multimodal Medical Few-shot Learner – Evaluation
Post date: June 19, 2024
Post author: The FewShot Prompting Publication
Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models

Med-Flamingo: a Multimodal Medical Few-shot Learner – Med-Flamingo
Post date: June 19, 2024
Post author: The FewShot Prompting Publication
Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models

Med-Flamingo: a Multimodal Medical Few-shot Learner – Related Works
Post date: June 19, 2024
Post author: The FewShot Prompting Publication
Post categories: clinical-applications, few-shot-learning, generative-vqa, medical-ai, medical-informatics, multimodal-learning, usmle-evaluation, vision-language-models