VoiceTest - Voice Agent Test Harness
Open-source test harness for voice agents with support for Retell, VAPI, Bland, and LiveKit. Run autonomous simulations and evaluate with LLM judges.
20 real-world examples of AI agents used for testing. Every case includes the tech stack, source link, and author. No fluff, just what people actually built and shipped.
Composio enables quick integration of 90+ tools for developers and agents, offering managed authentication, easy testing, and up-to-date APIs to simpl
LangSmith provides tools for debugging, testing, evaluating, and monitoring LLM applications, integrating seamlessly with LangChain for comprehensive
An open source RAG evaluation framework that does not require golden answers, and can be used to evaluate performance of RAG tools connected to an AI
Arize-Phoenix is an open source library for agent testing, evaluation and observability.
AgentBench v0.2 is a benchmark designed to evaluate Large Language Models as agents across a diverse set of environments, enhancing framework usabilit
Simulate user interactions to evaluate chatbot performance, ensuring robustness and reliability in real-world scenarios.
Introduces AgentEval for evaluating and assessing the performance of LLM-based applications.
Bananalyzer is a framework for evaluating AI agents on web tasks, utilizing Playwright for creating diverse datasets of website snapshots for reliable
Vapi is a developer-friendly platform that enables the rapid creation, testing, and deployment of voicebots, revolutionizing voice AI integration with
An AI agent that automates hacking and penetration testing, enhancing security assessments efficiently.
Cursor 2.0 runs up to 8 background AI agents in parallel for autonomous multi-file refactoring, testing, and code generation.
Automated paper trading on prediction markets with backtesting, strategy analysis, and daily performance reports.