If you already have a deployed agent, you can connect it to Calibrate and run evaluations, benchmarks, and simulations against it, without rebuilding your agent inside Calibrate.
Create an agent connection
From the sidebar, click Agents → New agent. Select Connect your existing agent, give it a name, and click Create.
Configure your connection
The Connection tab lets you set up everything Calibrate needs to communicate with your agent:
- Agent URL (required) — The HTTPS endpoint where your agent is deployed (e.g. `https://your-agent.example.com/chat`). Calibrate will send a POST request to this URL with conversation messages.
- Headers (optional) — Add any headers your agent needs for authentication or custom metadata (e.g. `Authorization: Bearer YOUR_API_KEY`). Click + Add header to add multiple headers. A sketch of a typical setup follows this list.
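As an illustration, the connection settings reduce to a URL plus a header map. The JSON below is only a sketch (it is not a real config file, the field names are invented for illustration, and the values are placeholders):

```json
{
  "agentUrl": "https://your-agent.example.com/chat",
  "headers": {
    "Authorization": "Bearer YOUR_API_KEY"
  }
}
```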
Expected request format
Calibrate will POST to your agent URL with a JSON body containing the conversation.
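A minimal sketch of that body, assuming the usual chat-message role convention (the `role` values and message contents here are illustrative; only the `messages` field is documented):

```json
{
  "messages": [
    { "role": "user", "content": "Hi, I need help with my order." },
    { "role": "assistant", "content": "Sure, what's your order number?" },
    { "role": "user", "content": "It's #1234." }
  ]
}
```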
The `messages` array contains the full conversation history in chronological order.
Expected response format
By default, your agent must respond with a `response` field containing its text reply.
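For example, a minimal sketch (the reply text is illustrative):

```json
{
  "response": "Order #1234 shipped yesterday and should arrive by Friday."
}
```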
Optionally, your agent can also return `tool_calls`.
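A sketch of such a response, using a hypothetical `lookup_order` tool (whether `response` and `tool_calls` appear together in one reply is an assumption here; the field names follow the definitions below):

```json
{
  "response": "Let me look up that order for you.",
  "tool_calls": [
    {
      "tool": "lookup_order",
      "arguments": { "order_id": "1234" }
    }
  ]
}
```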
- `response` — The agent's text reply
- `tool_calls` — An array of tool invocations, each with a `tool` name and `arguments` object
Verify your connection
Before running any tests, Calibrate needs to verify that it can connect to your agent and that your agent returns output in the correct format. Click the Verify button in the Connection check panel. A dialog will open where you can customize the sample request that will be sent to your agent:
- Edit the Messages to set the conversation history for the test request.
- The Request body preview on the right updates live so you can see the exact JSON that will be sent (an example follows this list).
- Click Send & Verify to send the request to your agent.
- ✅ Verified — Your agent is reachable and returns the correct format.
- ❌ Failed — The panel will show the error along with the actual output received from your agent so you can debug and try again.
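For instance, you might edit the messages to mirror a conversation your agent actually handles. A hypothetical request body preview (all contents are placeholders):

```json
{
  "messages": [
    { "role": "user", "content": "Can you reset my password?" },
    { "role": "assistant", "content": "Of course. What's the email on your account?" },
    { "role": "user", "content": "jane@example.com" }
  ]
}
```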
Connection verification is required before running LLM tests, benchmarks, or simulations.
Enable benchmarking across models
Calibrate supports benchmarking different LLMs on your dataset to find the best model for your agent. For this, you need to instrument your agent API to support a `model` field in the request body that Calibrate will send along with the conversation history.
To get started, turn on Support benchmarking different models from the Connection tab.

- Select your Model provider from the dropdown (OpenRouter, OpenAI, Anthropic, Google, etc.). This represents the provider you have configured for your agent on your system.
- Calibrate will include a `model` field in the request body (see the example after this list). Depending on the provider you selected, the value of the `model` field will differ. For example, if you have selected OpenRouter, the `model` field will be `openrouter/gpt-4.1`. If you have selected OpenAI, the `model` field will just be `gpt-4.1`.
- When running a benchmark via Compare models, Calibrate will verify the connection for each selected model before starting the evaluation. See Find the best LLM for details.
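For example, with OpenRouter selected as the provider, a benchmarking request body might look like this (the conversation content is illustrative; the `model` value follows the format described above):

```json
{
  "model": "openrouter/gpt-4.1",
  "messages": [
    { "role": "user", "content": "What's your refund policy?" }
  ]
}
```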
Next Steps
- Run LLM Tests — Create and run test cases against your agent
- Find the best LLM — Compare the performance of different LLMs on your agent
- Run Simulations — Simulate conversations with realistic personas