If you’re involved in machine learning at all, you can’t have missed the wave of groundbreaking models released in recent months. Two of the most hyped are Whisper, OpenAI’s state-of-the-art speech recognition model, and Stable Diffusion, Stability AI’s text-to-image generation model. In our upcoming webinar, Alaeddine Abdessalem, Software Developer at Jina AI, will show how to combine the two into an end-to-end multimodal application that generates artwork from audio.
Whisper allows extremely accurate voice-to-text transcription in