AI Agent Use Cases

AI Agent for Testing

20 real-world examples of AI agents used for testing. Every case includes the tech stack, source link, and author. No fluff — just what people actually built and shipped.

voicetest

Testing · community · source

Open-source test harness for voice agents with support for Retell, VAPI, Bland, and LiveKit. Run autonomous simulations and evaluate with LLM judges.

Python
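The autonomous-simulation-plus-LLM-judge pattern these harnesses use can be sketched in plain Python. Everything below (the `Turn`/`Simulation` types and the keyword-based stand-in judge) is invented for illustration, not voicetest's actual API — a real harness would send the transcript to a model with a grading rubric:

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str  # "agent" or "simulated_user"
    text: str

@dataclass
class Simulation:
    goal: str
    turns: list = field(default_factory=list)

def judge(sim, required_phrases):
    """Stand-in for an LLM judge: score the agent's side of a simulated
    call by how many required phrases it covered."""
    agent_text = " ".join(t.text.lower() for t in sim.turns if t.role == "agent")
    missing = [p for p in required_phrases if p.lower() not in agent_text]
    score = 1 - len(missing) / len(required_phrases)
    return {"goal": sim.goal, "score": score, "missing": missing}

sim = Simulation(goal="confirm appointment", turns=[
    Turn("simulated_user", "Hi, I need to check my booking."),
    Turn("agent", "Sure, your appointment is confirmed for Tuesday at 3pm."),
])
result = judge(sim, ["appointment", "confirmed", "reschedule"])
```

The judge flags "reschedule" as missing and scores the run 2/3 — swapping the keyword check for a model call keeps the same harness shape.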

Composio

Composio enables quick integration of 90+ tools for developers and agents, offering managed authentication, easy testing, and up-to-date APIs to simplify agent development.

Python

LangSmith by LangChain

LangSmith provides tools for debugging, testing, evaluating, and monitoring LLM applications, integrating seamlessly with LangChain for comprehensive observability.

LangChain

Open-RAG-Eval

Testing · community · source

An open-source RAG evaluation framework that does not require golden answers; it can be used to evaluate the performance of RAG tools connected to an AI agent.

Go
RAG
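One way an evaluator can work without golden answers is to score groundedness: what fraction of the answer is actually supported by the retrieved context. The token-overlap version below is a hypothetical simplification for illustration, not Open-RAG-Eval's actual metric:

```python
def groundedness(answer, retrieved_chunks):
    """Fraction of answer tokens that also appear in the retrieved
    context. Needs no reference answer -- only the retrieval itself."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(retrieved_chunks).lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

chunks = ["the eiffel tower is 330 metres tall", "it was completed in 1889"]
grounded = groundedness("the tower is 330 metres tall", chunks)    # fully supported
ungrounded = groundedness("the tower is painted bright green", chunks)  # half unsupported
```

Production frameworks use an LLM or entailment model rather than raw token overlap, but the reference-free shape is the same.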

Voice Lab

Testing · community · source

A comprehensive testing and evaluation framework for voice agents across language models, prompts, and agent personas.

Python

Arize-Phoenix

Testing · community · source

Arize-Phoenix is an open source library for agent testing, evaluation and observability.

Python

AgentBench

Testing · community · source

AgentBench v0.2 is a benchmark designed to evaluate Large Language Models as agents across a diverse set of environments, enhancing framework usability.

Python
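A benchmark of this kind boils down to running the same agent callable across many environments and averaging episode scores. The `CountdownEnv` and the trivial agent below are invented for illustration — AgentBench's real environments are far richer:

```python
class CountdownEnv:
    """Toy environment: the agent should say 'stop' once the counter hits 0."""
    def __init__(self, start=3):
        self.n = start

    def reset(self):
        return {"counter": self.n}

    def step(self, action):
        # Returns (observation, score, done).
        if action == "stop":
            return {}, 1.0 if self.n == 0 else 0.0, True
        self.n -= 1
        return {"counter": self.n}, 0.0, self.n < -5  # give up eventually

def run_benchmark(agent, environments, episodes=2):
    """Average each environment's episode scores for one agent."""
    results = {}
    for name, make_env in environments.items():
        scores = []
        for _ in range(episodes):
            env = make_env()
            obs, done, score = env.reset(), False, 0.0
            while not done:
                obs, score, done = env.step(agent(obs))
            scores.append(score)
        results[name] = sum(scores) / len(scores)
    return results

agent = lambda obs: "stop" if obs["counter"] == 0 else "wait"
report = run_benchmark(agent, {"countdown": CountdownEnv})
```

Dropping an LLM-backed agent into `agent` and real task environments into `environments` gives the same loop at full scale.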

Manifest

Testing · community · source

Open-source, real-time cost observability platform for AI agents. Track tokens, costs, messages, and model usage with a local-first dashboard.

Python
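Token-level cost tracking of this kind reduces to pricing each model call as it happens and aggregating. The model names and per-token prices below are made up, not Manifest's schema or any vendor's real pricing:

```python
# Hypothetical $/1K-token prices as (input, output). Not real vendor pricing.
PRICES = {"small-model": (0.0005, 0.0015), "big-model": (0.003, 0.015)}

class CostTracker:
    def __init__(self):
        self.events = []

    def record(self, model, tokens_in, tokens_out):
        """Price one model call and log it for the dashboard."""
        p_in, p_out = PRICES[model]
        cost = tokens_in / 1000 * p_in + tokens_out / 1000 * p_out
        self.events.append({"model": model, "tokens_in": tokens_in,
                            "tokens_out": tokens_out, "cost": cost})
        return cost

    def total(self):
        return sum(e["cost"] for e in self.events)

tracker = CostTracker()
tracker.record("small-model", tokens_in=2000, tokens_out=500)
tracker.record("big-model", tokens_in=1000, tokens_out=1000)
```

A real platform would persist events and attribute them to agents and sessions; the per-call pricing step stays this simple.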

AgentOps

Testing · community · source

AgentOps aims to improve AI agent development with tools for observability, evaluations, and replay analytics, offering a streamlined process for testing.

Python

EvoAgentX

Testing · community · source

EvoAgentX is building a self-evolving ecosystem of AI agents: an automated framework for evaluating and evolving agentic workflows.

Python

LangFuse

Testing · community · source

Langfuse, an open-source LLM engineering platform, offers debugging, prompt management, and metrics for improving LLM apps, and won the #1 Golden Kitty award.

Python

Bananalyzer by Reworkd

Testing · community · source

Bananalyzer is a framework for evaluating AI agents on web tasks, utilizing Playwright to create diverse datasets of website snapshots for reliable evaluation.

Playwright

Vapi

Vapi is a developer-friendly platform that enables the rapid creation, testing, and deployment of voicebots, streamlining voice AI integration.

Python

Polymarket Autopilot

Finance · hesamsheikh · source

Automated paper trading on prediction markets with backtesting, strategy analysis, and daily performance reports.

OpenClaw
GitHub