This guide shows you how to set up automated evaluations for your LLM on Calibrate, based on your use case.
Create an agent
From the sidebar, click Agents → New agent. You have two options for setting up your agent:
Build your agent in Calibrate
Configure the LLM/STT/TTS models, set instructions, and define the tools your agent can use — all within Calibrate. See our Core Concepts: Agents guide for the full setup.
Connect your existing agent
If you already have a deployed agent, you can connect it to Calibrate via its HTTP endpoint. Calibrate will call your agent directly to run simulations. See our Core Concepts: Agent Connections guide for the full setup.
Agent connections support text simulations only. For voice simulations (with STT/TTS latency metrics), use an agent built within Calibrate with STT and TTS providers configured.
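If you are building such an endpoint from scratch, here is a minimal sketch, assuming Calibrate POSTs an OpenAI-style message list as JSON and reads a text reply back. The route, request schema, and response field are illustrative assumptions, not the actual contract; see the Agent Connections guide for that.

```python
# A minimal sketch of an HTTP agent endpoint (assumed request/response
# schema -- the real contract is in the Agent Connections guide).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    # Assumed shape: OpenAI-style chat messages.
    messages: list[dict]

@app.post("/chat")  # hypothetical route; use the URL you register in Calibrate
def chat(req: ChatRequest) -> dict:
    last_user_message = req.messages[-1]["content"]
    # Replace this stub with your own LLM call or business logic.
    return {"reply": f"You said: {last_user_message}"}

# Run locally with: uvicorn agent:app --port 8000
```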
Create your first test case
Open the LLM tests tab and click on Add test to create a new test.
Next Reply Test
These tests verify that your agent responds appropriately to the last user message in a conversation history you define, by checking whether the agent's response meets your criteria (for example, tone, content, or accuracy).
Tool Invocation Test
These tests verify that your agent calls the correct tools with the right parameters, given a conversation history you define.
Create a next reply test
Next reply tests verify that your agent's response adheres to your criteria, given a conversation history you define.
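For illustration, the two inputs to a next reply test might look like the sketch below. The scenario and wording are invented; the message format follows the OpenAI chat convention described in the CSV section later in this guide.

```python
# Illustrative next reply test inputs. The history ends with a user
# message, since the agent's next reply is what gets evaluated.
conversation_history = [
    {"role": "user", "content": "Hi, I was double-charged for my subscription."},
    {"role": "assistant", "content": "Sorry about that! Could you share the invoice number?"},
    {"role": "user", "content": "It's invoice INV-1042."},
]

criteria = (
    "Acknowledges the double charge, confirms it will look into invoice "
    "INV-1042, and keeps an apologetic, professional tone."
)
```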
Create a tool invocation test
Tool invocation tests verify that your agent calls the correct tools with the right parameters, given a conversation history you define.
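For illustration, a tool invocation test pairs a history with the tool calls you expect. In the sketch below, lookup_invoice is a hypothetical tool name; tool names must match your agent's configuration exactly.

```python
# Illustrative tool invocation test inputs.
conversation_history = [
    {"role": "user", "content": "Can you check the status of invoice INV-1042?"},
]

# Expect one call to a (hypothetical) lookup_invoice tool, with the
# invoice id extracted from the conversation.
expected_tool_calls = [
    {"tool": "lookup_invoice", "arguments": {"invoice_id": "INV-1042"}},
]
```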
Run one test on one agent
Once the test is created, click the play button to run it.

Selecting the Attach this test to the agent config checkbox adds the test to the selected agent's list of tests.

Run all tests for one agent
Navigate to the Tests tab of the agent you want to test. You can add new tests with the Add test button or run the existing ones with the Run all tests button.


Find the best LLM for your agent
The tests above run against the LLM configured for the agent, but that may not be the optimal model for your use case. You can compare how different LLMs perform on your tests by clicking the Compare models button.
Model selection
You can select up to 5 models to compare, then select Run comparison to start the evaluation.
For agent connections, only models valid for the provider selected on the agent’s connection page are shown:
- If you chose OpenRouter as your provider, any model supported on OpenRouter can be selected.
- If you chose OpenAI, Anthropic, Google, or another specific provider, only models from that provider are shown.
Verifying agent connection
For agent connections, Calibrate verifies your agent on a sample input with each selected model before the benchmark can run. Every model you add to the comparison shows one of these states next to it:
- not checked — Just added, never verified with this model
- verified — Verified
- failed — Verification failed; the error message is shown along with the actual output received from your agent so you can debug


Leaderboard
You will see the status of each test for each model update as it completes.

Sharing your results publicly
You can make any completed test run or benchmark publicly accessible.
Sharing a test run
Open a test run and click the Share button to make it public.


Sharing a benchmark
To share a benchmark, follow the same steps as for sharing a test run.

Bulk upload tests
If you have many test cases, you can upload them all at once via CSV. Click Bulk upload on the LLM Evaluation page.

- Select the test type: Next Reply or Tool Call
- Select the Language (English, Hindi, or Kannada)
- Upload a CSV file or drag and drop it
Next Reply CSV format
Your CSV should have three columns:
| Column | Description |
|---|---|
| name | A unique test name — must differ from every other test in the CSV and from any previously created test. |
| conversation_history | A JSON array of chat messages in OpenAI format. Each message is an object with role ("user" or "assistant") and content. The conversation should end with a user message, since the agent's next reply is what gets evaluated. |
| criteria | Plain-text description of what the agent's response should contain or how it should behave to pass. An LLM judge evaluates the agent's actual reply against these criteria. |
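Quoting a JSON array inside a CSV cell by hand is error-prone, so it can help to generate the file with a script. Here is a minimal sketch using only the Python standard library; the file name and test content are illustrative.

```python
# Writes a Next Reply test CSV; csv.writer handles quoting the embedded JSON.
import csv
import json

history = [{"role": "user", "content": "What are your support hours?"}]

with open("next_reply_tests.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "conversation_history", "criteria"])
    writer.writerow([
        "support-hours-basic",  # must be unique across the CSV and existing tests
        json.dumps(history),
        "States the support hours clearly and offers a follow-up channel.",
    ])
```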
Tool Call CSV format
Your CSV should have three columns:
| Column | Description |
|---|---|
| name | A unique test name — must differ from every other test in the CSV and from any previously created test. |
| conversation_history | A JSON array of chat messages in OpenAI format. Each message is an object with role ("user" or "assistant") and content. Should end with a user message, since the test evaluates which tools the agent calls after this conversation. |
| tool_calls | A JSON array of expected tool call objects. Use an empty array ([]) to assert that no tools should be called. |
Each tool call object supports these fields:
| Field | Type | Description |
|---|---|---|
| tool | string (required) | The tool name — must match exactly as configured in your agent. |
| arguments | object (optional) | Expected arguments the agent should pass. If omitted or set to {}, arguments aren't checked (equivalent to accept_any_arguments: true). |
| accept_any_arguments | boolean (optional, default false) | If true, the test passes regardless of what arguments the agent sends — useful when you only care that the tool was called. |
| is_called | boolean (optional, default true) | Set to false to assert this tool should not be called. |
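Put together, a tool_calls cell covering these fields might look like the sketch below; the tool names are hypothetical.

```python
# Illustrative tool_calls value for one CSV row.
import json

tool_calls = [
    # Exact-argument check: passes only if lookup_invoice gets these arguments.
    {"tool": "lookup_invoice", "arguments": {"invoice_id": "INV-1042"}},
    # Call-only check: passes if send_email is called with any arguments.
    {"tool": "send_email", "accept_any_arguments": True},
    # Negative check: fails if refund_payment is called at all.
    {"tool": "refund_payment", "is_called": False},
]

print(json.dumps(tool_calls))  # paste the printed JSON into the tool_calls column
# To assert that no tools are called at all, use an empty array: []
```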
- (Optional) Tick Assign tests to agents and pick one or more agents — the uploaded tests are automatically added to each selected agent’s test list, so you don’t have to attach them manually later.

Next Steps
Text to Speech
Evaluate TTS providers on your dataset