Run full voice agent simulations with Speech to Text, LLM, and Text to Speech components, or interact with an agent in real-time.

Supported Providers

| Component | Providers |
| --- | --- |
| Speech to Text | deepgram, google, openai, elevenlabs, sarvam, cartesia |
| LLM | openrouter, openai |
| Text to Speech | elevenlabs, cartesia, google, openai, deepgram, sarvam |

Learn more about metrics: a detailed explanation of all metrics and the LLM Judge.

calibrate agent simulation

Run automated voice agent simulations with multiple personas and scenarios. Each simulation pairs every persona with every scenario, generating audio files and transcripts for inspection.
calibrate agent simulation -c <config_file> -o <output_dir> --port <port>
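The persona-scenario pairing is a cartesian product. The helper below is a hypothetical sketch (not part of the CLI) that reproduces the run names used in the documented output directory layout:

```python
from itertools import product

def simulation_run_names(n_personas: int, n_scenarios: int) -> list[str]:
    """Enumerate one run per (persona, scenario) pair, mirroring the
    simulation_persona_<p>_scenario_<s> directory naming shown below."""
    return [
        f"simulation_persona_{p}_scenario_{s}"
        for p, s in product(range(1, n_personas + 1), range(1, n_scenarios + 1))
    ]

print(simulation_run_names(1, 2))
# ['simulation_persona_1_scenario_1', 'simulation_persona_1_scenario_2']
```

A config with 3 personas and 4 scenarios therefore produces 12 simulation runs.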

Arguments

| Flag | Long | Type | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| -c | --config | string | Yes | - | Path to simulation configuration JSON file |
| -o | --output-dir | string | No | ./out | Path to output directory |
| | --port | int | No | 8765 | Base WebSocket port |

Examples

Basic simulation:
calibrate agent simulation -c ./config.json -o ./out
Custom port:
calibrate agent simulation -c ./config.json -o ./out --port 9000
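For scripted runs, the same command can be invoked from Python. The wrapper below is a hypothetical helper that only builds and executes the documented argv; it adds nothing beyond the flags shown above.

```python
import subprocess

def simulation_cmd(config: str, out_dir: str = "./out", port: int = 8765) -> list[str]:
    """Build the argv for `calibrate agent simulation` exactly as documented."""
    return ["calibrate", "agent", "simulation",
            "-c", config, "-o", out_dir, "--port", str(port)]

def run_simulation(config: str, out_dir: str = "./out", port: int = 8765) -> None:
    """Run the simulation, raising CalledProcessError on a non-zero exit."""
    subprocess.run(simulation_cmd(config, out_dir, port), check=True)
```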

Configuration File Structure

{
  "system_prompt": "You are a helpful nurse filling out a form...",
  "tools": [
    {
      "type": "client",
      "name": "plan_next_question",
      "description": "Plan the next question",
      "parameters": [
        {
          "id": "next_unanswered_question_index",
          "type": "integer",
          "description": "Next question index",
          "required": true
        }
      ]
    }
  ],
  "personas": [
    {
      "characteristics": "A shy mother named Geeta, 39 years old, gives short answers",
      "gender": "female",
      "language": "english",
      "interruption_sensitivity": "medium"
    }
  ],
  "scenarios": [
    {"description": "User completes the form without any issues"},
    {"description": "User hesitates and wants to skip some questions"}
  ],
  "evaluation_criteria": [
    {
      "name": "question_completeness",
      "description": "Whether all the questions in the form were covered"
    }
  ],
  "stt": {"provider": "google"},
  "tts": {"provider": "google"},
  "llm": {"provider": "openrouter", "model": "openai/gpt-4.1"},
  "settings": {
    "agent_speaks_first": true,
    "max_turns": 50
  }
}
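A config like the one above can be loaded and sanity-checked before launching a run. This is a minimal sketch; the REQUIRED_KEYS set is an assumption for a quick pre-flight check, not the tool's authoritative schema.

```python
import json

# Assumed mandatory keys, based on the documented structure above.
REQUIRED_KEYS = {"system_prompt", "personas", "scenarios", "stt", "tts", "llm"}

def load_simulation_config(path: str) -> dict:
    """Load a simulation config and fail fast on missing top-level keys."""
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    return cfg

def total_runs(cfg: dict) -> int:
    # Every persona is paired with every scenario.
    return len(cfg["personas"]) * len(cfg["scenarios"])
```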

Persona Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| characteristics | string | Yes | Description of personality, background, and behavior |
| gender | string | Yes | male or female |
| language | string | Yes | english, hindi, kannada, bengali, malayalam, marathi, odia, punjabi, tamil, telugu, gujarati |
| interruption_sensitivity | string | No | none, low, medium, or high |

Output Structure

/path/to/output/
├── simulation_persona_1_scenario_1/
│   ├── audios/
│   │   ├── 0_user.wav
│   │   ├── 1_bot.wav
│   │   └── ...
│   ├── conversation.wav         # Combined audio
│   ├── transcript.json          # Full conversation
│   ├── stt_outputs.json         # Speech to Text transcriptions
│   ├── tool_calls.json          # Tool calls made
│   ├── evaluation_results.csv   # Evaluation + latency metrics
│   ├── stt_results.csv          # Speech to Text accuracy evaluation
│   ├── metrics.json             # Per-processor latency
│   ├── config.json              # Persona and scenario
│   ├── results.log
│   └── logs
├── simulation_persona_1_scenario_2/
│   └── ...
├── results.csv                  # Aggregated results
└── metrics.json                 # Summary statistics

Output Files

audios/ contains alternating user and bot audio files, one per turn. conversation.wav is the combined audio of the entire conversation. evaluation_results.csv contains evaluation criteria, latency metrics, and Speech to Text scores:

| name | value | reasoning |
| --- | --- | --- |
| question_completeness | 1 | All questions were covered… |
| assistant_behavior | 1 | One question per turn… |
| ttft | 0.62 | |
| processing_time | 0.62 | |
| stt_llm_judge_score | 0.95 | |
stt_results.csv contains the per-turn Speech to Text evaluation:

| reference | prediction | score | reasoning |
| --- | --- | --- | --- |
| Geeta Prasad. | Gita Prasad. | 0 | Name ‘Geeta’ transcribed as ‘Gita’ |
results.csv aggregates match scores across all simulations:

| name | question_completeness | assistant_behavior | stt_llm_judge_score |
| --- | --- | --- | --- |
| simulation_persona_1_scenario_1 | 1.0 | 1.0 | 0.95 |
| simulation_persona_1_scenario_2 | 1.0 | 0.0 | 0.92 |
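The per-simulation rows in results.csv can be averaged into summary statistics with a short stdlib-only sketch; summarize_results is a hypothetical helper, not part of the CLI.

```python
import csv
from statistics import mean

def summarize_results(results_csv: str) -> dict[str, float]:
    """Average each score column of results.csv across all simulations."""
    with open(results_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    # Every column except the run name is assumed to hold a numeric score.
    score_cols = [c for c in rows[0] if c != "name"]
    return {c: mean(float(r[c]) for r in rows) for c in score_cols}
```

Applied to the example above, this yields a mean assistant_behavior of 0.5 and a mean stt_llm_judge_score of 0.935.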

calibrate agent test

Run an interactive voice agent session where you can talk to the agent through your browser in real-time.
calibrate agent test -c <config_file> -o <output_dir>

Arguments

| Flag | Long | Type | Required | Default | Description |
| --- | --- | --- | --- | --- | --- |
| -c | --config | string | Yes | - | Path to agent configuration JSON file |
| -o | --output-dir | string | No | ./out | Path to output directory |

Example

calibrate agent test -c ./config.json -o ./out/run

Configuration File Structure

{
  "system_prompt": "You are a helpful assistant.",
  "language": "english",
  "stt": {"provider": "deepgram"},
  "tts": {
    "provider": "cartesia",
    "voice_id": "YOUR_VOICE_ID"
  },
  "llm": {
    "provider": "openrouter",
    "model": "openai/gpt-4o-2024-11-20"
  },
  "tools": []
}

Usage

  1. Run the command to start the agent server
  2. Open http://localhost:7860/client/ in your browser
  3. Click Connect and start talking to the agent
Avoid using your laptop’s speaker and microphone at the same time: audio from the speaker can be picked up by the microphone, interrupting the agent with its own voice. Use headphones with a built-in microphone instead.

Required Environment Variables

Set the API keys for your chosen providers:
# Speech to Text / Text to Speech providers
export DEEPGRAM_API_KEY=your_key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
export OPENAI_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
export SARVAM_API_KEY=your_key
export CARTESIA_API_KEY=your_key

# LLM Providers
export OPENAI_API_KEY=your_key
export OPENROUTER_API_KEY=your_key
You only need to set API keys for the providers you configure in your config file.
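That per-provider requirement can be pre-checked before starting a run. The sketch below is hypothetical; the PROVIDER_ENV mapping is an assumption derived from the variable list above, so verify it against your installed version.

```python
import os

# Assumed provider -> environment-variable mapping, taken from the list above.
PROVIDER_ENV = {
    "deepgram": "DEEPGRAM_API_KEY",
    "google": "GOOGLE_APPLICATION_CREDENTIALS",
    "openai": "OPENAI_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "sarvam": "SARVAM_API_KEY",
    "cartesia": "CARTESIA_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def missing_env(config: dict) -> list[str]:
    """Return the unset variables for the providers a config actually uses."""
    providers = {config[k]["provider"] for k in ("stt", "tts", "llm") if k in config}
    return [PROVIDER_ENV[p] for p in sorted(providers)
            if p in PROVIDER_ENV and not os.environ.get(PROVIDER_ENV[p])]
```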