

Get started

calibrate stt
The interactive UI guides you through the full evaluation process:
  1. Language selection — pick from 10+ supported Indic languages
  2. Provider selection — choose providers (only those supporting your language are shown)
  3. Input directory — path to the directory containing your audio files and reference transcripts
The input directory should have this structure:
/path/to/data/
├── stt.csv
└── audios/
    ├── audio_1.wav
    └── audio_2.wav
The stt.csv file contains the reference transcriptions:

| id | text |
| --- | --- |
| audio_1 | Hi |
| audio_2 | Madam, my name is Geeta Shankar |
All audio files should be in WAV format. The evaluation script expects files at audios/<id>.wav where <id> matches the id column in your CSV.
Refer to the sample dataset for a template; a quick validation sketch follows these steps.
  4. Output directory — where results will be saved (defaults to ./out)
  5. API keys — enter the API keys for the selected providers
The evaluation runs providers in parallel (max 2 at a time), showing the transcriptions as they are generated.
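
To catch layout mistakes before a run, you can cross-check stt.csv against the audios/ directory. A minimal sketch in Python, assuming only the stt.csv and audios/<id>.wav convention described above (this helper is illustrative and not part of calibrate):

import csv
from pathlib import Path

def check_dataset(root: str) -> list[str]:
    """Report ids present in stt.csv or audios/ but not in both."""
    root_path = Path(root)
    problems = []

    # Ids referenced by the reference transcripts.
    with open(root_path / "stt.csv", newline="", encoding="utf-8") as f:
        csv_ids = {row["id"] for row in csv.DictReader(f)}

    # Ids present on disk as audios/<id>.wav.
    wav_ids = {p.stem for p in (root_path / "audios").glob("*.wav")}

    for missing in sorted(csv_ids - wav_ids):
        problems.append(f"row '{missing}' in stt.csv has no audios/{missing}.wav")
    for orphan in sorted(wav_ids - csv_ids):
        problems.append(f"audios/{orphan}.wav has no matching row in stt.csv")
    return problems

if __name__ == "__main__":
    for problem in check_dataset("/path/to/data"):
        print(problem)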

Evaluator configuration

By default, a text LLM judge routed through OpenRouter (set OPENROUTER_API_KEY in your environment) evaluates whether each transcription semantically matches the reference text, using the built-in semantic_match evaluator; expand Default evaluator: semantic_match below for the exact system_prompt from the codebase. You can customize the judge model and add multiple evaluators by passing an optional config file with --config:
calibrate stt -p deepgram google -i ./data -o ./out --config config.json
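If you rely only on the default judge, the single extra setup step is the OpenRouter key, for example:

export OPENROUTER_API_KEY=your-openrouter-key
calibrate stt -p deepgram google -i ./data -o ./out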
Each evaluator’s system_prompt is sent as the system message to its own dedicated LLM judge call (one call per evaluator, run in parallel). The user message contains the source/transcription pair. The config file supports:
{
  "evaluators": [
    {
      "id": "semantic-match-id",
      "name": "semantic_match",
      "system_prompt": "You are a highly accurate evaluator. You will be given a source text and a transcription. Mark True if the values represented by both strings match semantically.",
      "judge_model": "openai/gpt-5.4-mini"
    },
    {
      "id": "completeness-id",
      "name": "completeness",
      "system_prompt": "You are a highly accurate evaluator. You will be given a source text and a transcription. Mark True if all information from the source text is present in the transcription.",
      "judge_model": "openai/gpt-5.4-mini"
    }
  ]
}
| Key | Type | Description |
| --- | --- | --- |
| evaluators | array | List of evaluators. Each one becomes its own LLM call per row. |
| evaluators[].id | string | Optional unique id. Output config.json includes the raw evaluators list and an evaluators_map from id to name. |
| evaluators[].name | string | Unique evaluator name. Becomes the column name in the leaderboard. |
| evaluators[].system_prompt | string | Full system prompt used for this evaluator's LLM judge call. |
| evaluators[].judge_model | string | OpenRouter model id for this evaluator (default: openai/gpt-5.4-mini). Use any model in the OpenRouter catalog. |
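For example, with the two evaluators from the config above, the evaluators_map written to the output config.json would pair each id with its name. A sketch of that fragment (the surrounding output shape is an assumption):

{
  "evaluators_map": {
    "semantic-match-id": "semantic_match",
    "completeness-id": "completeness"
  }
}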
Each evaluator also accepts:
| Key | Type | Description |
| --- | --- | --- |
| type | string | "binary" (default) or "rating" |
| scale_min | integer | Required when type is "rating". Lowest allowed score. |
| scale_max | integer | Required when type is "rating". Highest allowed score. |
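For instance, a rating-type evaluator entry could look like the following (the fluency name, prompt, and 1–5 scale are illustrative, not from the codebase):

{
  "evaluators": [
    {
      "id": "fluency-id",
      "name": "fluency",
      "type": "rating",
      "scale_min": 1,
      "scale_max": 5,
      "system_prompt": "You are a highly accurate evaluator. You will be given a source text and a transcription. Rate the fluency of the transcription on a scale of 1 to 5.",
      "judge_model": "openai/gpt-5.4-mini"
    }
  ]
}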
Binary evaluators produce per-row pass/fail and a mean pass-rate. Rating evaluators produce an integer score on your scale and a mean score in the leaderboard. When multiple evaluators are defined, each is scored independently — one LLM call per evaluator per row, all run in parallel — and appears as a separate column in the results and leaderboard. Refer to the sample config for a template.
The --config flag is optional. When omitted, a single built-in semantic_match evaluator scores semantic match between source and transcription.
Matches DEFAULT_STT_EVALUATOR in calibrate/judges.py when no --config is passed.
You are a highly accurate evaluator evaluating the transcription output of an STT model.

You will be given two strings - one is the source string used to produce an audio and the other is the transcription of that audio.

You need to evaluate if the two strings are the same.

# Important Instructions:
- Check whether the values represented by both the strings match. E.g. if one string says 1,2,3 but the other string says "one, two, three" or "one, 2, three", they should be considered the same as their underlying value is the same. However, if the actual values itself are different, e.g. for the name of a person or address or the value of any other key detail - that difference should be noted.
- Ignore differences like a word being split up into more than 1 word by spaces. Look at whether the values mean the same in both the strings.
- Minor differences in values of entities (e.g. proper nouns, numbers) matter and should be considered an error.
- If all the "values" for the strings match, mark it as True. Else, False.
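
To make the mechanics above concrete, here is a minimal Python sketch of how one judge call per evaluator could be issued against OpenRouter's OpenAI-compatible chat completions endpoint, fanning evaluators out in parallel. The message layout follows the description above (the evaluator's system_prompt as the system message, the source/transcription pair as the user message); the exact request construction and prompt formatting in calibrate/judges.py may differ.

import os
from concurrent.futures import ThreadPoolExecutor

import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def judge(evaluator: dict, source: str, transcription: str) -> str:
    """Issue a single LLM judge call for one evaluator on one row."""
    response = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": evaluator.get("judge_model", "openai/gpt-5.4-mini"),
            "messages": [
                # The evaluator's system_prompt is the system message...
                {"role": "system", "content": evaluator["system_prompt"]},
                # ...and the user message carries the source/transcription pair
                # (this exact formatting is an assumption).
                {
                    "role": "user",
                    "content": f"Source: {source}\nTranscription: {transcription}",
                },
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

def judge_row(evaluators: list[dict], source: str, transcription: str) -> dict:
    """One call per evaluator per row, all run in parallel."""
    with ThreadPoolExecutor(max_workers=len(evaluators)) as pool:
        futures = {
            ev["name"]: pool.submit(judge, ev, source, transcription)
            for ev in evaluators
        }
        return {name: fut.result() for name, fut in futures.items()}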

Output

Once all providers have completed, the CLI displays a leaderboard of key metrics, along with bar charts for easier visualization.
STT leaderboard
You can also view the generated transcript and metrics for each row of your dataset, including the LLM judge score and reasoning.
STT provider outputs

Learn more about metrics

Detailed explanation of all metrics and why using an LLM Judge is necessary

Resources

Integrations

See the full list of supported providers and their configuration options