

Get started

calibrate tts
The interactive UI guides you through the full evaluation process:
  1. Language selection — pick from 10+ supported Indic languages
  2. Provider selection — choose providers (only those supporting your language are shown)
  3. Input CSV — path to CSV file with id and text columns
The input CSV should have this format:
id,text
row_1,hello world
row_2,this is a test
row_3,how are you doing today
Refer to this sample for a template.
  4. Output directory — where results will be saved (defaults to ./out)
  5. API keys — enter the API keys for the selected providers
The evaluation runs providers in parallel (max 2 at a time), showing progress as audio files are generated.
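If you want to generate the input file programmatically, a few lines of the Python standard library produce the expected shape (the filename sample.csv here is just an example, not a required name):

```python
import csv

# Write a minimal input CSV with the required "id" and "text" columns.
rows = [
    {"id": "row_1", "text": "hello world"},
    {"id": "row_2", "text": "this is a test"},
    {"id": "row_3", "text": "how are you doing today"},
]

with open("sample.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "text"])
    writer.writeheader()
    writer.writerows(rows)

# Read it back to confirm the header and row count.
with open("sample.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    assert reader.fieldnames == ["id", "text"]
    assert sum(1 for _ in reader) == 3
```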

Evaluator configuration

By default, a single built-in pronunciation evaluator is used: an audio LLM judge, routed through OpenRouter (set OPENROUTER_API_KEY in your environment), evaluates whether the reference text is pronounced correctly in the synthesized audio. The exact system_prompt from the codebase is listed further down this page. You can customize the judge model and add multiple evaluators by passing an optional config file with --config:
calibrate tts -p openai google -i sample.csv -o ./out --config config.json
Each evaluator’s system_prompt is sent as the system message to its own dedicated audio LLM judge call (one call per evaluator, run in parallel). The user message contains the reference text and the audio sample. The config file supports:
{
  "evaluators": [
    {
      "id": "intelligibility-id",
      "name": "intelligibility",
      "system_prompt": "You are a highly accurate evaluator. You will be given an audio sample and the reference text it is supposed to speak. Mark True if the spoken text is clearly understandable from the audio.",
      "judge_model": "openai/gpt-audio"
    },
    {
      "id": "pronunciation-id",
      "name": "pronunciation",
      "system_prompt": "You are a highly accurate evaluator. You will be given an audio sample and the reference text it is supposed to speak. Mark True only if all words are pronounced correctly with natural-sounding speech.",
      "judge_model": "openai/gpt-audio"
    }
  ]
}
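The per-evaluator fan-out described above (one dedicated judge call per evaluator, run in parallel) can be sketched with asyncio. Note this is an illustrative sketch, not the tool's actual implementation: call_judge is a hypothetical stand-in for the real OpenRouter request.

```python
import asyncio

EVALUATORS = [
    {"name": "intelligibility", "judge_model": "openai/gpt-audio"},
    {"name": "pronunciation", "judge_model": "openai/gpt-audio"},
]

async def call_judge(evaluator: dict, text: str, audio_path: str) -> bool:
    # Hypothetical stand-in: a real implementation would send the evaluator's
    # system_prompt as the system message, with the reference text and the
    # audio sample in the user message, to the evaluator's judge_model.
    await asyncio.sleep(0)  # stands in for network latency
    return True

async def judge_row(text: str, audio_path: str) -> dict:
    # One judge call per evaluator, all awaited concurrently.
    verdicts = await asyncio.gather(
        *(call_judge(ev, text, audio_path) for ev in EVALUATORS)
    )
    return {ev["name"]: v for ev, v in zip(EVALUATORS, verdicts)}

scores = asyncio.run(judge_row("hello world", "out/row_1.wav"))
print(scores)  # one verdict per evaluator name
```

Each evaluator's verdict ends up keyed by its name, which is why names must be unique: they become the result columns.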
| Key | Type | Description |
| --- | --- | --- |
| evaluators | array | List of evaluators. Each one becomes its own audio LLM call per row. |
| evaluators[].id | string | Optional unique id. Output config.json includes the raw evaluators list and an evaluators_map from id to name. |
| evaluators[].name | string | Unique evaluator name. Becomes the column name in the leaderboard. |
| evaluators[].system_prompt | string | Full system prompt used for this evaluator's audio LLM judge call. |
| evaluators[].judge_model | string | OpenRouter model id for this evaluator (default: openai/gpt-audio). Must be an audio-capable model in the OpenRouter catalog, for example OpenAI's audio models or Google's Gemini audio-capable entries; the sample config uses google/gemini-2.5-flash. |
Each evaluator also accepts:
| Key | Type | Description |
| --- | --- | --- |
| type | string | "binary" (default) or "rating" |
| scale_min | integer | Required when type is "rating". Lowest allowed score. |
| scale_max | integer | Required when type is "rating". Highest allowed score. |
Binary evaluators produce per-row pass/fail and a mean pass-rate. Rating evaluators produce an integer score on your scale and a mean score in the leaderboard. When multiple evaluators are defined, each is scored independently — one audio LLM call per evaluator per row, all run in parallel — and appears as a separate column in the results and leaderboard. Refer to the sample config for a template.
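The aggregation described above can be sketched in a few lines, assuming per-row verdicts have already been collected (booleans for binary evaluators, integers for rating ones); the function name and the "naturalness" evaluator are illustrative, not part of the tool:

```python
def leaderboard_column(evaluator: dict, verdicts: list) -> float:
    """Reduce one evaluator's per-row verdicts to its leaderboard value."""
    if evaluator.get("type", "binary") == "binary":
        # Binary evaluators: mean pass-rate over pass/fail verdicts.
        return sum(1 for v in verdicts if v) / len(verdicts)
    # Rating evaluators: scores must sit on the configured scale, then average.
    lo, hi = evaluator["scale_min"], evaluator["scale_max"]
    assert all(lo <= v <= hi for v in verdicts), "score outside configured scale"
    return sum(verdicts) / len(verdicts)

# Binary: 2 of 3 rows passed.
print(leaderboard_column({"name": "pronunciation"}, [True, True, False]))

# Rating: mean score on a 1-5 scale.
print(leaderboard_column(
    {"name": "naturalness", "type": "rating", "scale_min": 1, "scale_max": 5},
    [4, 5, 3],
))
```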
The --config flag is optional. When omitted, a single built-in pronunciation evaluator scores audio intelligibility; the TTS judge still requires an audio-capable model. The default prompt below matches DEFAULT_TTS_EVALUATOR in calibrate/judges.py when no --config is passed:
You are a highly accurate evaluator evaluating the audio output of a TTS model.

You will be given the audio and the text that should have been spoken in the audio.

You need to evaluate if the text is easily understandable from the audio. Check whether the spoken words match the reference text and the audio is clear enough to convey the intended message.

Output

Once all the providers have completed, the CLI displays a leaderboard of key metrics along with bar charts for easier visualization.
TTS leaderboard
You can also view the generated audio and metrics for each row of your dataset including the LLM judge score and reasoning. Use the arrow keys to navigate rows and press Enter or p to play the generated audio.
TTS provider outputs

Learn more about metrics

Detailed explanation of all metrics and how the LLM judge works

Resources

Integrations

See the full list of supported providers and their configuration options