Run The AI
A daily newsletter with updates and thoughts on AI that you can read in 3 minutes or less.
Hi, we're new here!
Here's what we've got for you today:
AI art in its early days, and a look at how far it has come in the last two years.
Stable Diffusion fine-tuned to produce music.
Custom AI-generated bedtime stories.
First attempts at AI art (CLIP-guided AI art)
Progress in the past two years has been crazy. Two years ago, before Stable Diffusion + DreamBooth-generated art arrived, AI art was driven by an OpenAI model called CLIP.
We went back two years to find users' experiments with generating AI art, and what we found was amusing, to say the least.
Everyone was amazed by generated art that, to be honest, looks like a cobbled mess with a questionable correlation to its prompt.
Let's have a look at some of the ones we found:
1. a time traveler in the crowd
2. a man painting a completely red image
3. mist over green hills
Stable Diffusion fine-tuned to produce music
Riffusion is an app for real-time music generation with Stable Diffusion. It is described as a latent text-to-image diffusion model capable of generating spectrogram images given any text input. These spectrograms can be used to create audio snippets.
It isn't uncommon to convert the spectrum of a sound file into an image and back; several software synthesizers are based on this approach. However, feeding these images into Stable Diffusion and modifying them over time is a new and interesting concept.
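To make the idea concrete, here is a minimal sketch (not Riffusion's actual code) of turning an audio clip into a grayscale spectrogram image in Python. The filename "example.wav" and the parameter choices are placeholder assumptions:

```python
import numpy as np
import librosa
from PIL import Image

# Load a short mono clip; "example.wav" is a placeholder filename.
y, sr = librosa.load("example.wav", sr=22050)

# Magnitude spectrogram via the short-time Fourier transform.
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Log-compress and normalize to 0-255 so it can be stored as a grayscale image.
S_db = librosa.amplitude_to_db(S, ref=np.max)
pixels = ((S_db - S_db.min()) / (S_db.max() - S_db.min()) * 255).astype(np.uint8)

# Flip vertically so low frequencies sit at the bottom of the image.
Image.fromarray(pixels[::-1]).save("spectrogram.png")
```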
(The model was built by Seth Forsgren and Hayk Martiros.)
Run The AI: PPPs
Pick up (Learn):
Audio spectrograms have two components: the magnitude and the phase. Most of the information and structure is in the magnitude spectrogram, so neural nets generally only synthesize that. A phase spectrogram looks completely random, and neural nets have a very, very difficult time learning how to generate good phases.

When you go from a spectrogram back to audio, you need both the magnitudes and the phases, but if the neural net only generates the magnitudes, you have a problem. This is where the Griffin-Lim algorithm comes in: it tries to find a set of phases that works with the magnitudes so that you can generate the audio. It generally works pretty well, but it tends to produce a sort of resonant artifact, especially when the magnitude spectrogram is synthesized (and therefore doesn't necessarily have a consistent set of phases).
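As a small illustration of that last point, the sketch below deliberately throws away the phases of a real clip and lets librosa's built-in Griffin-Lim implementation search for new ones. "example.wav" is again a placeholder filename:

```python
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("example.wav", sr=22050)

# Keep only the magnitudes; the phases are discarded, just as they would be
# if a neural net had synthesized this spectrogram.
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Griffin-Lim iteratively searches for a set of phases consistent with
# these magnitudes, then inverts the STFT to get audio back.
y_rec = librosa.griffinlim(S, n_iter=32, hop_length=512)

sf.write("reconstructed.wav", y_rec, sr)  # listen for the resonant artifact
```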
Pilot (play):
Try Riffusion here.
To convert the image to audio, you can use this script (make sure to put the audio.py file in the same folder). A rough sketch of what such a conversion involves is below.
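This is not the linked script itself, just an assumed reconstruction of the idea: read the image back as magnitudes (the pixel-to-dB mapping here is a guess) and run Griffin-Lim to recover audio:

```python
import numpy as np
import librosa
import soundfile as sf
from PIL import Image

# Read the grayscale spectrogram image and undo the vertical flip.
img = np.asarray(Image.open("spectrogram.png").convert("L"), dtype=np.float32)
S_db = img[::-1] / 255.0 * 80.0 - 80.0  # map 0-255 pixels to an assumed [-80, 0] dB range

# dB back to linear magnitudes, then Griffin-Lim to recover phases and audio.
S = librosa.db_to_amplitude(S_db)
y = librosa.griffinlim(S, n_iter=32, hop_length=512)

sf.write("out.wav", y, 22050)
```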
Project:
🤯 Wait what?!
🧙🏻‍♂️ Can you teleport your own kids into the AI-generated stories?
This will be huge!
1. Generate a custom story,
2. Magically get your own kid to be the main character of the story...
3. Infinite custom stories made for your kids with them in them
4. Repeat
— Linus (●ᴗ●) (@LinusEkenstam)
1:12 AM • Dec 16, 2022
That's it from RuntheAI for today.
Thank you for reading and see you soon. Subscribe to stay updated!