This guide shows you how to set up automated evaluations for your LLM based on your use case.

Create an agent

From the sidebar, click Agents → New agent. Configure the system prompt, select your STT, TTS, and LLM providers, and add tools to the agent.
Learn more about agent configuration in Core Concepts: Agents.
[Image: Agent Configuration]
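If it helps to see the moving parts together, here is a hypothetical sketch of what such a configuration bundles; the keys and values below are illustrative, not the platform's actual schema:

```python
# Hypothetical agent configuration; keys and values are illustrative
# only, not the platform's actual schema.
agent_config = {
    "name": "support-agent",
    "system_prompt": "You are a helpful support agent for Acme Inc.",
    "stt_provider": "example-stt",   # speech-to-text
    "tts_provider": "example-tts",   # text-to-speech
    "llm": {"provider": "example-llm", "model": "example-model"},
    "tools": ["lookup_order", "issue_refund"],  # hypothetical tool names
}
```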

Create your first test case

Open the LLM Evaluation tab and click Add test to create a new test.
[Image: Create a new test]
You can create two types of test cases:
Next reply tests verify that your agent responds appropriately to the last user message in a conversation history defined by you, checking whether the agent's response meets your criteria (for example, tone, content, or accuracy).
Tool invocation tests verify that your agent calls the correct tools with the right parameters, given a conversation history defined by you.
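Conceptually, both types pair a conversation history with an expectation. A minimal sketch of how they might be represented, with purely illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str     # "user" or "assistant"
    content: str

# Hypothetical shapes for the two test types; field names are
# illustrative, not the platform's actual schema.
@dataclass
class NextReplyTest:
    history: list[Message]   # conversation leading up to the reply
    success_criteria: str    # e.g. "politely declines and offers alternatives"

@dataclass
class ToolInvocationTest:
    history: list[Message]
    expected_tool: str       # name of the tool the agent must call
    expected_params: dict    # parameters the call must include
```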

Create a next reply test

Next reply tests verify that your agent's response adheres to your criteria, given a conversation history defined by you.
[Image: Create a next reply test]
As shown in the image, create the conversation history for the edge case you want to test and add the success criteria for the agent's next response.
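Criteria-based checks like this are commonly implemented with an LLM-as-judge pattern: a second model reads the conversation and the agent's reply and decides whether the criteria are met. A rough sketch of that idea, assuming a generic `chat(messages)` helper for your LLM provider (hypothetical, not the platform's actual implementation):

```python
def judge_next_reply(history, agent_reply, criteria, chat):
    # `history` is a list of {"role": ..., "content": ...} dicts;
    # `chat` stands in for your LLM provider's completion call.
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    prompt = (
        "Given this conversation:\n"
        f"{transcript}\n\n"
        f"The agent replied:\n{agent_reply}\n\n"
        f"Does the reply satisfy these criteria: {criteria}\n"
        "Answer PASS or FAIL."
    )
    verdict = chat([{"role": "user", "content": prompt}])
    return verdict.strip().upper().startswith("PASS")
```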

Create a tool invocation test

Tool invocation tests verify that your agent calls the correct tools with the right parameters given a conversation history defined by you.
[Image: Create a tool invocation test]
As shown in the image, create the conversation history for the edge case you want to test and select the tools that must be called, along with the correct parameters.
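Unlike next reply tests, a tool invocation check can be a plain structural comparison: does any tool call in the agent's response match the expected name and parameters? A sketch under assumed data shapes (the dict format below is illustrative):

```python
def check_tool_invocation(tool_calls, expected_tool, expected_params):
    # `tool_calls` is assumed to be a list of
    # {"name": str, "params": dict} entries -- an illustrative shape.
    # Extra parameters are tolerated here; a platform may be stricter.
    for call in tool_calls:
        if call["name"] != expected_tool:
            continue
        if all(call["params"].get(k) == v for k, v in expected_params.items()):
            return True
    return False
```

For example, a booking agent test might expect the hypothetical tool `book_appointment` to be called with `{"date": "2024-06-01"}`.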

Run one test on one agent

Once the test is created, click the play button to run it.
[Image: Run one test on one agent]
In the dialog box that appears, select the agent from the dropdown and click Run test.
[Image: Run one test on one agent]
Selecting the Attach this test to the agent config checkbox adds the test to the selected agent's list of tests.
A test runner will open with the test's status updating once it completes. Click a test case to view the agent's response and whether it passed.
[Image: Run one test on one agent]

Run all tests for one agent

Navigate to the Tests tab of the agent you want to test. Add new tests with the Add test button, or run the existing tests with the Run all tests button.
[Image: Run all tests for one agent]
A test runner will open with the status of each test case updating as it completes. Click a test case to view the agent's response and whether it passed.
[Image: Results of all tests]
You can view all the past test runs for that agent and their results.
[Image: Past test runs]

Find the best LLM for your agent

The tests above run against the LLM configured for the agent, but that may not be the optimal model for your use case. You can compare the performance of different LLMs on your tests by clicking the Compare models button.
[Image: Compare models]
You can select up to 5 models to compare, then click Run comparison to start the evaluation.
[Image: Run benchmark]
You will see the status of each test for each provider updating as it completes.
[Image: Benchmark status]
Once the tests for all providers are complete, a leaderboard is displayed with the results.
[Image: Benchmark results]
The pass rate for each model indicates the percentage of tests it passed.
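The arithmetic behind the leaderboard is simple to reproduce, for instance when exporting results elsewhere. A sketch, assuming per-model results as lists of booleans (an illustrative shape, not the platform's export format):

```python
def leaderboard(results_by_model):
    # Maps each model to 100 * passed / total, then ranks descending.
    rates = {
        model: 100 * sum(results) / len(results)
        for model, results in results_by_model.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Example: {"model-a": [True, True, False], "model-b": [True, False, False]}
# yields [("model-a", 66.66...), ("model-b", 33.33...)]
```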

Next Steps