I have shared many TTS solutions before.
Today, let's look at a recently released open-source project: ChatTTS, a speech synthesis model designed for everyday conversation: https://huggingface.co/2Noise/ChatTTS.
Within just one week, its star count soared to 18k.
Let's listen to the demo first:
English male voice
ChatTTS is a text-to-speech model designed specifically for conversational scenarios, such as large language model assistants. It supports two languages, Chinese and English. The full model was trained on over 100,000 hours of Chinese and English speech data. The open-source version on HuggingFace is a model pre-trained on 40,000 hours of data, without supervised fine-tuning (SFT).
Project Highlights
ChatTTS is optimized specifically for dialogue tasks and can generate natural, expressive speech. It supports multiple speakers, which facilitates interactive conversations.
The model can predict and control fine-grained prosodic features, including laughter, pauses, and interjections.
In terms of prosody, ChatTTS surpasses most open-source TTS models. Pre-trained models are provided to support further research and development.
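To make the fine-grained control a little more concrete, here is a minimal sketch of how pauses and laughter might be placed by hand. It assumes the inline control tokens `[uv_break]` and `[laugh]` from the ChatTTS README at the time of writing, and it assumes the Python package exposes `ChatTTS.Chat()` with `load_models()` (or `load()` in later releases) and `infer()`. The helper names `mark_up` and `synthesize` are hypothetical, and the API may differ in your version, so check the project docs before relying on it.

```python
# Inline control tokens taken from the ChatTTS README at the time of writing.
# Treat them as assumptions and check the project docs for your version.
PAUSE = "[uv_break]"
LAUGH = "[laugh]"


def mark_up(phrases, laugh_at_end=False):
    """Join phrases with pause tokens, optionally ending with a laugh token."""
    text = f" {PAUSE} ".join(p.strip() for p in phrases if p.strip())
    return f"{text} {LAUGH}" if laugh_at_end else text


def synthesize(texts):
    """Hypothetical wrapper around ChatTTS; needs the package and model weights."""
    import ChatTTS  # imported lazily so mark_up() works without the package

    chat = ChatTTS.Chat()
    # Early releases exposed load_models(); later ones renamed it to load().
    load = getattr(chat, "load", None) or chat.load_models
    load()
    # skip_refine_text=True is meant to keep the hand-placed tokens as written.
    return chat.infer(texts, skip_refine_text=True)


if __name__ == "__main__":
    print(mark_up(["What is", "your favorite English food?"], laugh_at_end=True))
```

Only `mark_up` runs without the model. Calling `synthesize([mark_up([...])])` should return the generated waveforms, but it needs the package installed and the weights downloaded first.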
You can run the demo on Google Colab: https://colab.research.google.com/drive/1fJGsNoKxUD62no-Y2mb5onAkhIXbsrI5
Generation is still a bit slow, so please be patient~ This is the final result I got: