
Naturalness and quality comparison of TTS APIs for US English

Naturalness and quality comparison of multiple TTS models, including Resemble AI, Google Cloud, AWS, Open AI HD, Eleven Labs, and Play AI.

Mean

Comparison Graph

Mean table

Tag | Resemble AI | Google Cloud | AWS | Open AI HD | Eleven Labs | Play AI
Man | μ: 4.21 / σ: 0.95 / CI95: 0.05 | μ: 3.99 / σ: 0.96 / CI95: 0.05 | μ: 3.56 / σ: 1.11 / CI95: 0.06 | μ: 4.37 / σ: 0.80 / CI95: 0.04 | μ: 4.50 / σ: 0.72 / CI95: 0.03 | μ: 3.71 / σ: 1.11 / CI95: 0.05
Narrative | μ: 4.16 / σ: 0.93 / CI95: 0.05 | μ: 3.94 / σ: 1.01 / CI95: 0.05 | μ: 3.57 / σ: 1.09 / CI95: 0.06 | μ: 4.41 / σ: 0.78 / CI95: 0.04 | μ: 4.56 / σ: 0.70 / CI95: 0.04 | μ: 3.76 / σ: 1.10 / CI95: 0.05
Woman | μ: 4.20 / σ: 0.87 / CI95: 0.04 | μ: 3.95 / σ: 0.99 / CI95: 0.05 | μ: 3.40 / σ: 1.12 / CI95: 0.05 | μ: 4.46 / σ: 0.77 / CI95: 0.04 | μ: 4.44 / σ: 0.79 / CI95: 0.04 | μ: 3.71 / σ: 1.11 / CI95: 0.05
Vivid | μ: 4.25 / σ: 0.89 / CI95: 0.04 | μ: 3.99 / σ: 0.95 / CI95: 0.04 | μ: 3.39 / σ: 1.14 / CI95: 0.05 | μ: 4.41 / σ: 0.80 / CI95: 0.04 | μ: 4.40 / σ: 0.79 / CI95: 0.04 | μ: 3.66 / σ: 1.12 / CI95: 0.05
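One way to read the table is to rank the models per tag and check whether the top two are separated by more than their combined CI95. The sketch below does this for the Man and Woman rows, using values copied from the table. It assumes CI95 is a symmetric half-width around μ, which the page does not state, and it treats non-overlapping intervals as a rough heuristic rather than a formal significance test. The function names are illustrative.

```python
# (mean, CI95) pairs copied from the "Man" and "Woman" rows of the mean table.
# Assumption: CI95 is the half-width of a symmetric interval around the mean.
scores = {
    "Man": {
        "Resemble AI": (4.21, 0.05),
        "Google Cloud": (3.99, 0.05),
        "AWS": (3.56, 0.06),
        "Open AI HD": (4.37, 0.04),
        "Eleven Labs": (4.50, 0.03),
        "Play AI": (3.71, 0.05),
    },
    "Woman": {
        "Resemble AI": (4.20, 0.04),
        "Google Cloud": (3.95, 0.05),
        "AWS": (3.40, 0.05),
        "Open AI HD": (4.46, 0.04),
        "Eleven Labs": (4.44, 0.04),
        "Play AI": (3.71, 0.05),
    },
}


def rank(tag):
    """Return model names for one tag, sorted by mean score, best first."""
    return sorted(scores[tag], key=lambda m: scores[tag][m][0], reverse=True)


def intervals_overlap(a, b):
    """True if the intervals mean_a +/- ci_a and mean_b +/- ci_b overlap."""
    (mean_a, ci_a), (mean_b, ci_b) = a, b
    return abs(mean_a - mean_b) <= ci_a + ci_b


for tag in scores:
    first, second = rank(tag)[:2]
    overlap = intervals_overlap(scores[tag][first], scores[tag][second])
    print(f"{tag}: {first} vs {second}, intervals overlap: {overlap}")
```

Under these assumptions, the top two models for Man (Eleven Labs and Open AI HD) have non-overlapping intervals, while the top two for Woman (Open AI HD and Eleven Labs) overlap, so the Woman ordering between them should be read with caution.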

Evaluation

Created | Type
(not loaded) | CUSTOM

Stimulus

Count | Requested responses | Language
960 | 15 per audio (28,800 total) | English (United States)

Evaluator

Total responses | Valid/total evaluators
38,724 | 650 / 1,060 (61%)

Gender ratio
Male | Female | Other
334 | 312 | 2
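The stimulus and evaluator figures can be cross-checked with a little arithmetic. The sketch below assumes the 28,800 requested total comes from 960 audio files times 15 responses times the 2 evaluation questions listed on this page. That multiplication reproduces the reported total, but the page does not spell it out, so treat the factor of 2 as an inference.

```python
# Figures copied from the Stimulus and Evaluator sections.
stimuli = 960
responses_per_audio = 15
questions = 2  # assumption: the "(2)" in "Custom evaluation query (2)"

requested_total = stimuli * responses_per_audio * questions
print(requested_total)  # 28800

valid_evaluators = 650
total_evaluators = 1060
valid_pct = round(100 * valid_evaluators / total_evaluators)
print(valid_pct)  # 61

collected = 38724
# Ratio of collected to requested responses; above 1 means more were collected.
print(f"{collected / requested_total:.2f}")  # 1.34
```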

Custom evaluation query (2)

Question 1: Listen to the speech sample and rate how natural they sound
Responses:
5. Excellent
4. Good
3. Fair
2. Poor
1. Bad
Question 2: How would you rate the overall quality of the audio for the given transcription?
Responses:
5. Clear and easy to understand.
4. Minor background noise, but does not affect comprehension.
3. Some distortions or noise, occasionally affecting comprehension.
2. Significant distortions or noise, making comprehension difficult.
1. Unintelligible or unusable audio.
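Ratings on a 1 to 5 scale like the ones above are commonly summarized as a mean (μ), a standard deviation (σ), and a 95% confidence interval. The following is a minimal sketch of one conventional way to compute these, using the sample standard deviation and a normal approximation (z = 1.96). The page does not document how its μ, σ, and CI95 values were computed, so the formula, the `summarize` helper, and the sample ratings are all illustrative assumptions.

```python
import math
import statistics


def summarize(ratings, z=1.96):
    """Return (mean, sample standard deviation, CI95 half-width).

    Assumes a normal approximation: half-width = z * sd / sqrt(n).
    """
    n = len(ratings)
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample SD (divides by n - 1)
    ci95 = z * sd / math.sqrt(n)
    return mean, sd, ci95


# Hypothetical ratings for a single audio sample, not data from this report.
ratings = [5, 4, 4, 3, 5, 4, 2, 5, 4, 4]
mean, sd, ci95 = summarize(ratings)
print(f"mean: {mean:.2f} / sd: {sd:.2f} / CI95: {ci95:.2f}")
```

With only ten ratings the interval is wide. The much narrower CI95 values in the mean table are consistent with each cell aggregating far more responses, although the exact count per cell is not given.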