What It Takes to Train a Versatile Speech AI System
Post date: June 20, 2025
Post author: Phonology Technology
Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning

How We Pre-Trained a 300M Parameter Audio Encoder With Random Quantization
Post date: June 19, 2025
Post author: Phonology Technology
Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning

A Unified Multimodal Approach to Speech Processing with LLMs
Post date: June 19, 2025
Post author: Phonology Technology
Post categories: audio-language-model, automatic-speech-recognition, generalization-capability, instruction-finetuning, multimodal-learning, multitask-learning, speech-processing, zero-shot-learning

Advancing Multimodal Video Generation with Responsible AI and Stylization
Post date: January 13, 2025
Post author: Teleplay Technology
Post categories: llms, multimodal-ai, self-supervised-learning, super-resolution-ai, text-to-video-evaluation, video-generation-ai, videopoet, zero-shot-learning