Calibrate lets you evaluate multiple text-to-speech (TTS) providers side by side using your own text samples. This guide walks you through creating an evaluation, managing reusable datasets, and sharing your results.

Start a new evaluation

From the sidebar, click Text-to-Speech to view all your evaluations and datasets, then click the New evaluation button.
Text-to-Speech Evaluations List

Add your dataset

On the Dataset tab, choose how to provide your text samples:
New TTS Evaluation - Dataset Tab

Enter manually

Create a new dataset inline. Give it a name, then add text samples in one of two ways:
  1. Add samples inline — Type the text to synthesize in each row. Click + Add another row to add more entries.
  2. Bulk upload via CSV — Upload a CSV file with a text column:
text
"Hello, how can I help you today?"
Your appointment is confirmed for tomorrow at 3 PM.
Thank you for calling. Have a great day!
Click Download sample to get a template with the correct format.
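If your samples already live in code or another system, the upload file can be generated rather than typed by hand. The following is a minimal sketch, assuming only what is described above: a CSV with a single `text` header and one sample per row. The function name `build_dataset_csv` and the filename `tts_samples.csv` are illustrative and not part of Calibrate.

```python
import csv
import io

# Sample texts taken from the example above; replace with your own.
samples = [
    "Hello, how can I help you today?",
    "Your appointment is confirmed for tomorrow at 3 PM.",
    "Thank you for calling. Have a great day!",
]


def build_dataset_csv(texts):
    """Return CSV content with a single `text` column, one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text"])  # header row expected by the upload format
    for text in texts:
        cleaned = text.strip()
        if cleaned:  # skip blank entries
            writer.writerow([cleaned])
    return buffer.getvalue()


csv_content = build_dataset_csv(samples)

if __name__ == "__main__":
    # Hypothetical filename; any name works for the upload.
    with open("tts_samples.csv", "w", newline="", encoding="utf-8") as f:
        f.write(csv_content)
```

Using the `csv` module instead of joining strings manually means samples that contain commas or quotation marks are quoted automatically, so each sample stays in one column.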
Your dataset is automatically saved so you can reuse it in future evaluations.
If you’ve already created a dataset, switch to Use existing dataset to pick from your saved datasets.
Use Existing Dataset
You can also create and manage datasets independently from the Datasets tab. See Datasets to learn how to create reusable datasets and run evaluations directly from them.

Configure settings

Switch to the Settings tab to select the language and the providers you want to compare:
TTS Settings - Language and Providers
Select a Language from the dropdown and check the providers you want to evaluate. Each provider shows the model and Voice ID it uses.

Run evaluation

Click the Evaluate button at the top to start the evaluation. You’ll be redirected to the results page, where you can monitor progress in real time.

View results

Outputs

The Outputs tab shows per-provider results. Select a provider from the list on the left to see its overall metrics and per-sample results.
TTS Outputs with Audio Playback
Each sample row includes:
  • Text — The input text you provided
  • Audio — An inline audio player to listen to the generated speech
  • LLM Judge — Pass/Fail based on audio quality evaluation

Leaderboard

The Leaderboard tab shows a side-by-side comparison across all providers with aggregated metrics and bar charts.
TTS Leaderboard
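To make the idea of aggregated metrics concrete, here is a rough sketch of how per-sample results could be rolled up into a leaderboard. It is an illustration only: the record layout, the provider names, the numbers, and the ranking rule (pass rate first, then lower mean TTFB, that is, time to first byte) are all assumptions, and Calibrate's actual aggregation may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-sample results; providers and values are made up.
results = [
    {"provider": "provider_a", "passed": True, "ttfb_s": 0.30},
    {"provider": "provider_a", "passed": False, "ttfb_s": 0.50},
    {"provider": "provider_b", "passed": True, "ttfb_s": 0.60},
    {"provider": "provider_b", "passed": True, "ttfb_s": 0.80},
]


def build_leaderboard(rows):
    """Aggregate per-sample rows into one summary entry per provider."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["provider"]].append(row)

    board = []
    for provider, items in grouped.items():
        board.append({
            "provider": provider,
            # Share of samples the judge marked as Pass
            "pass_rate": sum(r["passed"] for r in items) / len(items),
            # Average time to first byte, in seconds
            "mean_ttfb_s": mean(r["ttfb_s"] for r in items),
        })

    # Assumed ranking: highest pass rate first, ties broken by lower TTFB
    return sorted(board, key=lambda e: (-e["pass_rate"], e["mean_ttfb_s"]))


leaderboard = build_leaderboard(results)
for entry in leaderboard:
    print(entry["provider"], entry["pass_rate"], entry["mean_ttfb_s"])
```

A provider can rank higher on pass rate while being slower on TTFB, as `provider_b` does in this made-up data, which is why it helps to look at both metrics rather than a single score.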

Sharing results publicly

Once your evaluation completes, you can make the results publicly accessible by clicking the Share button on the results page.
Share button on evaluation results
This toggles the evaluation to Public and generates a shareable link. Anyone with the link can view the leaderboard and outputs without needing a Calibrate account.
Public evaluation with copy link
Click the Public button again to make the evaluation private.

Next Steps

Core Concepts

Learn about TTS metrics — LLM Judge Score and TTFB

Datasets

Save and reuse text samples across multiple evaluations

Simulations

Run simulated conversations with your agent