
IBM’s AI generates high-quality voices from 5 minutes of talking

Training powerful text-to-speech models requires substantial compute. A recent OpenAI study drives the point home: it found that since 2012, the amount of compute used in the largest AI training runs has grown by more than 300,000 times. In pursuit of less demanding models, researchers at IBM have developed a lightweight, modular method for speech synthesis. They say it can synthesize high-quality speech in real time by learning different aspects of a speaker's voice separately, which makes it possible to adapt to new speaking styles and voices with small amounts of data.
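IBM hasn't released code for this system, but the general idea behind such modular adaptation can be illustrated with a toy sketch: a large backbone is trained once and frozen, while only small speaker-specific pieces (here, a speaker embedding and a prosody head) are fine-tuned on a few minutes of target-speaker data. Everything below is hypothetical for illustration, including the names ToyModularTTS and adapt_to_new_speaker and all of the dimensions; it is not IBM's architecture.

```python
from itertools import cycle

import torch
import torch.nn as nn


class ToyModularTTS(nn.Module):
    """Toy modular TTS: a large, generic backbone (text encoder + mel decoder)
    plus small speaker-specific parts (a speaker embedding and a prosody head)
    that can be fine-tuned cheaply on a few minutes of speech."""

    def __init__(self, vocab_size=64, hidden=256, n_mels=80, spk_dim=32):
        super().__init__()
        # Backbone: trained once on many speakers, then frozen.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden + spk_dim, n_mels)
        # Small speaker-specific modules: cheap to adapt.
        self.speaker_embedding = nn.Parameter(torch.zeros(spk_dim))
        self.prosody_head = nn.Linear(hidden, 2)  # e.g. pitch / energy offsets

    def forward(self, tokens):
        enc, _ = self.encoder(self.embed(tokens))              # (B, T, hidden)
        spk = self.speaker_embedding.expand(enc.size(0), enc.size(1), -1)
        mel = self.decoder(torch.cat([enc, spk], dim=-1))      # (B, T, n_mels)
        prosody = self.prosody_head(enc)                       # (B, T, 2)
        return mel, prosody


def adapt_to_new_speaker(model, batches, steps=200, lr=1e-3):
    """Fine-tune only the small speaker-specific parameters on a handful of
    (token, mel) pairs from the target speaker; the backbone stays frozen.
    Only the mel loss is used here; a real system would also supervise prosody."""
    for p in model.parameters():
        p.requires_grad_(False)
    adapt_params = [model.speaker_embedding, *model.prosody_head.parameters()]
    for p in adapt_params:
        p.requires_grad_(True)

    opt = torch.optim.Adam(adapt_params, lr=lr)
    loss_fn = nn.L1Loss()
    for _, (tokens, target_mel) in zip(range(steps), cycle(batches)):
        mel, _ = model(tokens)
        loss = loss_fn(mel, target_mel)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

In a setup like this, only a few thousand parameters change during adaptation, which is why a few minutes of target-speaker audio can be enough; the heavy, compute-hungry training happens once for the shared backbone.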