Naturalness and quality comparison of TTS APIs for US English
Naturalness and quality comparison of multiple TTS models, including Resemble AI, Google Cloud, AWS, Open AI HD, Eleven Labs, and Play AI.
Comparison Graph
Mean table
Tag | Resemble AI | Google Cloud | AWS | Open AI HD | Eleven Labs | Play AI |
---|---|---|---|---|---|---|
Man | μ: 4.21 / σ: 0.95 / CI95 : 0.05 | μ: 3.99 / σ: 0.96 / CI95 : 0.05 | μ: 3.56 / σ: 1.11 / CI95 : 0.06 | μ: 4.37 / σ: 0.80 / CI95 : 0.04 | μ: 4.50 / σ: 0.72 / CI95 : 0.03 | μ: 3.71 / σ: 1.11 / CI95 : 0.05 |
Narrative | μ: 4.16 / σ: 0.93 / CI95 : 0.05 | μ: 3.94 / σ: 1.01 / CI95 : 0.05 | μ: 3.57 / σ: 1.09 / CI95 : 0.06 | μ: 4.41 / σ: 0.78 / CI95 : 0.04 | μ: 4.56 / σ: 0.70 / CI95 : 0.04 | μ: 3.76 / σ: 1.10 / CI95 : 0.05 |
Woman | μ: 4.20 / σ: 0.87 / CI95 : 0.04 | μ: 3.95 / σ: 0.99 / CI95 : 0.05 | μ: 3.40 / σ: 1.12 / CI95 : 0.05 | μ: 4.46 / σ: 0.77 / CI95 : 0.04 | μ: 4.44 / σ: 0.79 / CI95 : 0.04 | μ: 3.71 / σ: 1.11 / CI95 : 0.05 |
Vivid | μ: 4.25 / σ: 0.89 / CI95 : 0.04 | μ: 3.99 / σ: 0.95 / CI95 : 0.04 | μ: 3.39 / σ: 1.14 / CI95 : 0.05 | μ: 4.41 / σ: 0.80 / CI95 : 0.04 | μ: 4.40 / σ: 0.79 / CI95 : 0.04 | μ: 3.66 / σ: 1.12 / CI95 : 0.05 |
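
Each cell in the table above reports a mean rating (μ), a standard deviation (σ), and a 95% confidence half-width (CI95) on the 5-point scale. The page does not say how CI95 was computed. The sketch below assumes the common normal approximation, 1.96·σ/√n, with the sample standard deviation. The helper names `mos_summary` and `intervals_overlap` are hypothetical and not part of any evaluation tool's API.

```python
import math
import statistics


def mos_summary(ratings):
    """Return (mean, sample std, 95% CI half-width) for a list of 1-5 ratings.

    Assumption: CI95 = 1.96 * std / sqrt(n) (normal approximation).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    std = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    ci95 = 1.96 * std / math.sqrt(n)
    return mean, std, ci95


def intervals_overlap(mean_a, ci_a, mean_b, ci_b):
    """True if the [mean - ci, mean + ci] intervals of two systems overlap."""
    return abs(mean_a - mean_b) <= ci_a + ci_b


# (mean, CI95) pairs copied from the "Woman" row of the table.
openai_hd = (4.46, 0.04)
eleven_labs = (4.44, 0.04)
aws = (3.40, 0.05)

print(intervals_overlap(*openai_hd, *eleven_labs))  # intervals overlap
print(intervals_overlap(*openai_hd, *aws))          # intervals do not overlap
```

Overlapping intervals suggest that the gap between two systems may be within sampling noise. Non-overlap is only a rough heuristic, not a formal significance test.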
Evaluation
Created | Type |
---|---|
(not loaded) | CUSTOM |
Stimulus
Count | Requested responses | Language |
---|---|---|
960 | 15 per audio (28,800 total) | English (United States) |
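
The 28,800 total is consistent with reading the stimulus summary as 960 audio samples with 15 requested responses each, counted once per evaluation question. That reading is an assumption, since the page does not spell out the calculation. The factor of two comes from the "Custom evaluation query (2)" section. A quick check, with illustrative variable names:

```python
# Assumed reading of the stimulus summary; variable names are illustrative.
stimuli = 960
responses_per_audio = 15
questions = 2  # "Custom evaluation query (2)"

requested_total = stimuli * responses_per_audio * questions
print(requested_total)  # 28800
```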
Evaluator
Total responses | Valid/total evaluators |
---|---|
38,724 | 650 / 1,060 (61%) |

Gender ratio
Male | Female | Other |
---|---|---|
334 | 312 | 2 |
Custom evaluation query (2)

Question 1: Listen to the speech sample and rate how natural they sound
Responses
Score | Label |
---|---|
5 | Excellent |
4 | Good |
3 | Fair |
2 | Poor |
1 | Bad |

Question 2: How would you rate the overall quality of the audio for the given transcription?
Responses
Score | Label |
---|---|
5 | Clear and easy to understand. |
4 | Minor background noise, but does not affect comprehension. |
3 | Some distortions or noise, occasionally affecting comprehension. |
2 | Significant distortions or noise, making comprehension difficult. |
1 | Unintelligible or unusable audio. |