
Configuration

You need to create a config file that defines the following:
  • Instructions for your agent (system prompt)
  • Tools available to your agent
  • Personas that simulate different user types
  • Scenarios that define conversation patterns
  • Evaluation criteria to measure agent performance
  • The STT, TTS, and LLM providers to use (for voice simulations only)
This section explains the different keys in the config file.
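To show how these pieces fit together, here is a minimal sketch that assembles a skeleton config in Python and serializes it to JSON. The top-level key names are the ones described in this section. The placeholder values, and building the file from Python rather than writing it by hand, are illustrative assumptions.

```python
import json

# Skeleton config using the top-level keys described in this section.
# All values are placeholders for illustration only.
config = {
    "system_prompt": "You are a helpful customer support assistant for an online store.",
    "tools": [],
    "personas": [],
    "scenarios": [],
    "evaluation_criteria": [],
    "settings": {},  # optional
}

config_json = json.dumps(config, indent=2)
print(config_json)
```

For voice simulations, the stt, tts, and llm keys described later in this section would sit alongside these.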

system_prompt

The system prompt that defines your agent’s behavior. This is the same prompt you use in production.
Example:
You are a helpful customer support assistant for an online store.

You help customers with:
1. Checking order status
2. Processing returns
3. Answering product questions
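If the config file is JSON, as the examples in this section suggest, a multi-line prompt has to be stored as a single string with escaped line breaks. The following sketch uses only Python's standard json module to show how the prompt above would be encoded. The variable names are illustrative.

```python
import json

system_prompt = """You are a helpful customer support assistant for an online store.

You help customers with:
1. Checking order status
2. Processing returns
3. Answering product questions"""

# json.dumps escapes each line break so the prompt fits in one JSON string.
encoded = json.dumps({"system_prompt": system_prompt}, indent=2)
print(encoded)

# Decoding restores the original multi-line text.
decoded = json.loads(encoded)["system_prompt"]
```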

tools

A list of tools available to your agent. See the guide on Configuring Tools for how to set it up along with examples for different tool types.

personas

An array of user personas that simulate different types of users interacting with your agent. For a primer on personas, refer to Personas. Each persona has:
Key | Type | Description
label | string | A short name for the persona
characteristics | string | Detailed description of who the persona represents and how they behave
gender | string | Gender for voice simulations: male or female
language | string | Language the persona speaks: english, hindi, or kannada (more coming soon)
interruption_sensitivity | string | (Voice only) How likely the persona is to interrupt the agent mid-sentence: none (0%), low (25%), medium (50%), high (80%)
Example:
{
  "personas": [
    {
      "label": "friendly customer",
      "characteristics": "You are a friendly customer who wants to check your order status. Your order ID is ORD-12345. You are polite and patient.",
      "gender": "neutral",
      "language": "english"
    },
    {
      "label": "impatient customer",
      "characteristics": "You are an impatient customer who has been waiting a long time for your order. Your order ID is ORD-67890. You are frustrated but not rude.",
      "gender": "neutral",
      "language": "english"
    }
  ]
}
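Before running a simulation, it can help to sanity-check persona entries against the values listed in the table above. The sketch below is a hypothetical pre-flight helper, not part of calibrate. Treating label, characteristics, and language as required is an assumption, and gender is deliberately left unchecked.

```python
ALLOWED_LANGUAGES = {"english", "hindi", "kannada"}

# Interruption likelihood per sensitivity level, as listed in the table above.
INTERRUPTION_PROBABILITY = {"none": 0.0, "low": 0.25, "medium": 0.5, "high": 0.8}


def check_persona(persona: dict) -> list[str]:
    """Return a list of problems found in one persona entry (empty if none)."""
    problems = []
    # Assumption: these three keys must be non-empty strings.
    for key in ("label", "characteristics", "language"):
        if not isinstance(persona.get(key), str) or not persona[key].strip():
            problems.append(f"missing or empty '{key}'")
    language = persona.get("language")
    if isinstance(language, str) and language.strip() and language not in ALLOWED_LANGUAGES:
        problems.append(f"unsupported language '{language}'")
    # interruption_sensitivity is voice-only, so it is checked only when present.
    sensitivity = persona.get("interruption_sensitivity")
    if sensitivity is not None and sensitivity not in INTERRUPTION_PROBABILITY:
        problems.append(f"unknown interruption_sensitivity '{sensitivity}'")
    return problems


persona = {
    "label": "friendly customer",
    "characteristics": "You are polite and patient.",
    "language": "english",
}
print(check_persona(persona))
```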

scenarios

An array of scenarios that define different conversation patterns to test. For a primer on scenarios, refer to Scenarios. Each scenario has:
Key | Type | Description
name | string | A short name for the scenario
description | string | Instructions for what the simulated user should do
Example:
{
  "scenarios": [
    {
      "name": "simple order inquiry",
      "description": "Ask about your order status directly and provide your order ID right away."
    },
    {
      "name": "vague inquiry",
      "description": "Start by asking about your order without providing the order ID. Only provide it after being asked."
    }
  ]
}

evaluation_criteria

An array of criteria used to evaluate the agent’s performance. Each criterion has:
Key | Type | Description
name | string | A short name for the criterion (used in results)
description | string | What the criterion measures (used by the LLM judge)
Example:
{
  "evaluation_criteria": [
    {
      "name": "tool_usage",
      "description": "The agent should call the log_inquiry tool with the correct inquiry_type and order_id when provided by the customer."
    },
    {
      "name": "response_quality",
      "description": "The agent should be helpful, polite, and guide the customer through the process."
    }
  ]
}
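Because each criterion's name is used in the results, keeping names unique is probably worthwhile. Whether calibrate itself enforces uniqueness is not stated here, so the helper below is a hypothetical pre-flight check rather than a description of the tool's behavior.

```python
from collections import Counter


def duplicate_names(criteria: list[dict]) -> list[str]:
    """Return criterion names that appear more than once, in first-seen order."""
    counts = Counter(c["name"] for c in criteria)
    return [name for name, n in counts.items() if n > 1]


criteria = [
    {"name": "tool_usage", "description": "..."},
    {"name": "response_quality", "description": "..."},
]
print(duplicate_names(criteria))
```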

settings

Optional settings to control the simulation:
Key | Type | Description
agent_speaks_first | boolean | Whether the agent initiates the conversation (default: true)
max_turns | number | Maximum number of agent messages after which the simulated conversation ends automatically (default: 10)
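Since the settings block is optional, the effective values are the documented defaults with any overrides applied on top. The following sketch illustrates that merge. The function name is hypothetical, and how calibrate resolves defaults internally is not shown in this section.

```python
# Defaults as documented in the table above.
DEFAULT_SETTINGS = {"agent_speaks_first": True, "max_turns": 10}


def resolve_settings(config: dict) -> dict:
    """Merge the optional 'settings' block over the documented defaults."""
    return {**DEFAULT_SETTINGS, **config.get("settings", {})}


print(resolve_settings({}))
print(resolve_settings({"settings": {"max_turns": 5}}))
```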

stt, llm, and tts (voice simulations only)

For voice simulations, specify the STT, LLM, and TTS providers:
{
  "stt": {
    "provider": "google"
  },
  "tts": {
    "provider": "google"
  },
  "llm": {
    "provider": "openrouter",
    "model": "openai/gpt-4.1"
  }
}
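A quick way to catch an incomplete voice config is to confirm that all three sections are present and name a provider. The helper below is a hypothetical check written for illustration. It assumes each section is an object with a "provider" key, as in the example above.

```python
VOICE_SECTIONS = ("stt", "tts", "llm")


def missing_voice_sections(config: dict) -> list[str]:
    """List voice sections that are absent or lack a 'provider' value."""
    missing = []
    for section in VOICE_SECTIONS:
        block = config.get(section)
        if not isinstance(block, dict) or not block.get("provider"):
            missing.append(section)
    return missing


voice_config = {
    "stt": {"provider": "google"},
    "tts": {"provider": "google"},
    "llm": {"provider": "openrouter", "model": "openai/gpt-4.1"},
}
print(missing_voice_sections(voice_config))
```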

Full example

Refer to this sample for a text simulation and this sample for a voice simulation.

Get started

Run the following command to launch the interactive UI:
calibrate simulations
The interactive UI guides you through the full simulation process:
  1. Simulation type — text (LLM-only) or voice (full STT → LLM → TTS pipeline)
  2. Config file — path to the config file you created in the previous section
  3. Provider — OpenRouter or OpenAI (text simulations only)
  4. Model — enter the model you want to use for the simulation (text simulations only)
  5. Parallel count — run multiple simulations simultaneously (default: 1)
  6. Output directory — where results will be saved (defaults to ./out)
  7. API keys — enter the API keys for the selected providers
Simulated conversations are run for all persona × scenario combinations. For example, with 2 personas and 2 scenarios, you get 4 simulations.
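The persona × scenario pairing is a Cartesian product, which can be sketched with Python's itertools.product. The labels and names come from the examples earlier in this section. The order in which calibrate actually runs the pairs is not specified, so the ordering here is only illustrative.

```python
from itertools import product

personas = [{"label": "friendly customer"}, {"label": "impatient customer"}]
scenarios = [{"name": "simple order inquiry"}, {"name": "vague inquiry"}]

# One simulation per (persona, scenario) pair.
runs = [(p["label"], s["name"]) for p, s in product(personas, scenarios)]

for label, name in runs:
    print(f"{label} x {name}")
print(len(runs))  # 4
```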

Output

Once all simulated conversations complete, the interactive UI displays the overall metrics aggregated across all simulations, along with bar charts for visualization.
[Figure: Simulation overview]
You can drill into each simulation to view the full transcript:
[Figure: Simulation transcript]
You can also review the reasoning for each evaluation criterion:
[Figure: Simulation evaluation]
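To make "aggregated across all simulations" concrete, here is a rough sketch that computes a per-criterion pass rate. The result shape used here (a "criteria" mapping of criterion name to a pass/fail boolean) is entirely hypothetical. The actual output format written by calibrate is not documented in this section and may differ.

```python
from collections import defaultdict


def pass_rates(results: list[dict]) -> dict[str, float]:
    """Fraction of simulations that passed each criterion."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for result in results:
        for name, ok in result["criteria"].items():
            total[name] += 1
            passed[name] += 1 if ok else 0
    return {name: passed[name] / total[name] for name in total}


# Hypothetical results for 2 personas x 2 scenarios = 4 simulations.
results = [
    {"criteria": {"tool_usage": True, "response_quality": True}},
    {"criteria": {"tool_usage": True, "response_quality": True}},
    {"criteria": {"tool_usage": False, "response_quality": True}},
    {"criteria": {"tool_usage": True, "response_quality": True}},
]
print(pass_rates(results))
```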

Resources

Personas

Learn how to create realistic user personas

Scenarios

Learn how to write effective scenarios