ChatTTS - A generative speech model for daily dialogue

I have shared many TTS solutions before.

Today, let's look at a recently released open-source project, ChatTTS, a speech synthesis model designed for daily conversation: https://huggingface.co/2Noise/ChatTTS.

Within just one week, its star count soared to 18k.

Let's first listen to the Demo:

English male voice

English female voice

Chinese male voice

Chinese female voice

ChatTTS is a text-to-speech model designed specifically for conversational scenarios, such as large language model assistants. It supports two languages, Chinese and English. The full model was trained on over 100,000 hours of Chinese and English speech data. The open-source version available on HuggingFace is a model pre-trained on 40,000 hours of data, without supervised fine-tuning (SFT).

Project Highlights

ChatTTS is optimized specifically for dialogue tasks and can generate natural, expressive speech. It supports multiple speakers, which facilitates interactive conversations.
The model can predict and control fine-grained prosodic features, including laughter, pauses, and interjections.
In terms of prosody, ChatTTS surpasses most open-source TTS models. Pre-trained models are provided to support further research and development.
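
To make the point about fine-grained prosodic control more concrete, here is a minimal sketch of how laughter and pause cues might be embedded in the input text. It assumes the inline control tokens `[laugh]` and `[uv_break]` from the project's README; those token names are not given in this post and may differ between releases. The `with_cues` helper is hypothetical and only builds the text string, it does not call the model.

```python
# Assumption: these token names come from the ChatTTS README, not from this
# article, and may change between releases.
LAUGH = "[laugh]"
PAUSE = "[uv_break]"


def with_cues(sentences, laugh_after=()):
    """Join sentences with a pause token and append a laugh token
    after the sentences whose indices are listed in `laugh_after`."""
    parts = []
    for i, sentence in enumerate(sentences):
        text = sentence.strip()
        if i in laugh_after:
            text = f"{text} {LAUGH}"
        parts.append(text)
    return f" {PAUSE} ".join(parts)


script = with_cues(
    ["So I tried the new model", "and it actually laughed at my joke"],
    laugh_after={1},
)
print(script)
```

The resulting string would then be passed to the model as ordinary input text, so the cues travel with the sentence they belong to.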

You can run the demo on Google Colab: https://colab.research.google.com/drive/1fJGsNoKxUD62no-Y2mb5onAkhIXbsrI5
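
Outside Colab, a local run might look roughly like the sketch below. It is not taken from this post: it assumes the `ChatTTS` Python package is installed from the project's repository, that it exposes `ChatTTS.Chat` with an `infer` method, that the model loader is named either `load` or `load_models` depending on the release, and that the output is 24 kHz mono audio as in the README examples. The `save_wav`, `synthesize`, and `main` functions are hypothetical helpers.

```python
import struct
import wave

SAMPLE_RATE = 24000  # assumption: README examples save audio at 24 kHz


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, float(s))) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(frames)


def synthesize(texts):
    """Return one waveform per input text (assumed ChatTTS API, see above)."""
    import ChatTTS  # imported lazily so the rest of the file works without it

    chat = ChatTTS.Chat()
    # Assumption: the loader was renamed between releases, so try both names.
    load = getattr(chat, "load", None) or getattr(chat, "load_models")
    load()
    return chat.infer(texts)


def main():
    wavs = synthesize(["Hello, this is a short test sentence."])
    # Assumption: each result is a NumPy array, possibly shaped (1, N).
    save_wav("output.wav", wavs[0].flatten())
```

Calling `main()` on a machine with the package and model weights available should write `output.wav` to the current directory; the first call also downloads the weights, so expect it to take a while.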

Generation is still a bit slow, so please be patient. This is the final result I got: