Webinar: “Speech-to-image generation using Jina”
If you’re involved in machine learning at all, you can’t have missed the wave of groundbreaking models that have come out in recent months. Two of the most hyped are Whisper, OpenAI’s state-of-the-art speech recognition model, and Stable Diffusion, Stability AI’s image generation model. In our upcoming webinar, Alaeddine Abdessalem, Software Developer at Jina AI, will show us how to combine the two into an end-to-end multimodal application that generates artwork from audio.
Whisper provides highly accurate speech-to-text transcription in a variety of languages. The model was trained on roughly 680,000 hours of multilingual data, which makes it robust to accents, background noise, and technical language. You can play with a demonstration of Whisper here. It accurately transcribed me speaking German with an Australian accent, which is pretty impressive!
Stable Diffusion has emerged as one of the most popular image generation models released this year, and it’s not hard to see why – with a simple text prompt, you can quickly create amazing art of anything you can imagine. The model was trained on hundreds of millions of labeled images from sources such as Pinterest, WordPress, Flickr, and Wikimedia Commons. You can try out Stable Diffusion for yourself here.
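Chained together, the two models form a short pipeline: audio goes in, Whisper turns it into a text prompt, and Stable Diffusion turns that prompt into an image. The snippet below is only a minimal sketch of that idea, not the application from the webinar. The function name `speech_to_image` and the two stand-in functions are hypothetical, and the models are passed in as plain callables so the sketch runs without downloading anything.

```python
from typing import Callable, Tuple, TypeVar

Image = TypeVar("Image")


def speech_to_image(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate: Callable[[str], Image],
) -> Tuple[str, Image]:
    """Run speech -> text -> image and return (prompt, image).

    `transcribe` and `generate` are assumptions of this sketch. In a real
    application, `transcribe` could wrap a speech recognition model such as
    Whisper and `generate` could wrap a text-to-image model such as
    Stable Diffusion.
    """
    # Stage 1: turn the recording into a text prompt.
    prompt = transcribe(audio_path).strip()
    if not prompt:
        raise ValueError("transcription was empty, so there is nothing to draw")
    # Stage 2: turn the prompt into an image.
    return prompt, generate(prompt)


# Hypothetical stand-ins so the sketch is runnable without any model.
def fake_transcribe(audio_path: str) -> str:
    return "  a lighthouse at sunset  "


def fake_generate(prompt: str) -> str:
    return f"<image of: {prompt}>"


prompt, image = speech_to_image("recording.wav", fake_transcribe, fake_generate)
print(prompt)
print(image)
```

Keeping the two stages behind plain callables means either model can be swapped out or served separately without touching the glue code.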
If this piques your interest, join us and learn how to build an application using these models and deploy it using Jina Cloud. Alaeddine will show us how straightforward these cutting-edge models can be and how easy it is to productionize an application with them!
Join us on Monday, November 21, at 4:00 pm UTC.
About the speaker
Alaeddine is a Software Engineer from Tunisia interested in cloud-native software and artificial intelligence. He works at Jina AI, where he contributes to the DocArray and Jina projects.