While text-to-image and text-to-text models take most of the generative AI headlines, text-to-speech (TTS) continues to make steady progress. I recently explored Tortoise-TTS, an open-source framework for building high-fidelity TTS systems.
I've captured my research and experiments in a Jupyter notebook. You can find it, along with some audio examples I generated, in this GitHub repository.
The notebook covers installation, configuration, experiments with different voice output fidelities, and even a first pass at creating a custom voice. You might find it useful as a starting point for your own adventures in TTS, and it may accelerate your research into some of the promising projects I've listed in the conclusion section.