VoiceTest - Voice Agent Test Harness
Open-source test harness for voice agents with support for Retell, VAPI, Bland, and LiveKit. Run autonomous simulations and evaluate with LLM judges.
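The simulate-then-judge workflow described above can be sketched in a few lines. Everything here is illustrative — the class and function names are hypothetical stand-ins, not VoiceTest's actual API, and the "judge" is a keyword check standing in for a real LLM call.

```python
# Hypothetical sketch of the simulate-then-judge loop used by voice-agent
# test harnesses. Names are illustrative, not VoiceTest's real API.
from dataclasses import dataclass

@dataclass
class Turn:
    role: str   # "user" or "agent"
    text: str

def simulate_call(agent, user_script):
    """Drive the agent with scripted user turns, collecting a transcript."""
    transcript = []
    for user_text in user_script:
        transcript.append(Turn("user", user_text))
        transcript.append(Turn("agent", agent(user_text)))
    return transcript

def llm_judge(transcript, rubric):
    """Stand-in judge: a real harness would prompt an LLM with the
    transcript and rubric; here a keyword check keeps the sketch runnable."""
    agent_text = " ".join(t.text for t in transcript if t.role == "agent")
    return all(keyword.lower() in agent_text.lower() for keyword in rubric)

# Toy echo-style agent and a two-turn user script
agent = lambda text: f"Sure, I can help with {text}."
transcript = simulate_call(agent, ["booking a table", "for two people"])
passed = llm_judge(transcript, rubric=["booking", "two"])
```

In a real run, `agent` would wrap a live call to a provider such as Retell or VAPI, and `llm_judge` would send the transcript to an evaluator model.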
Practical testing workflows automated with AI agents.
An open-source RAG evaluation framework that does not require golden answers and can be used to evaluate the performance of RAG tools connected to an AI agent.
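Reference-free RAG evaluation typically scores how well an answer is grounded in the retrieved context rather than comparing it to a golden answer. Below is a minimal sketch of that idea; the token-overlap score is a crude stand-in for the LLM-based faithfulness judge a real framework would use, and the function name is hypothetical.

```python
# Hedged sketch of reference-free RAG evaluation: instead of matching a
# golden answer, score how much of the answer is grounded in the retrieved
# context. Token overlap here stands in for an LLM faithfulness judge.
def grounding_score(answer: str, context: str) -> float:
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    # Fraction of answer tokens that also appear in the context
    return len(answer_tokens & context_tokens) / len(answer_tokens)

context = "the eiffel tower is 330 metres tall and located in paris"
score = grounding_score("the eiffel tower is 330 metres tall", context)
```

A fully grounded answer scores 1.0; hallucinated content pulls the score down, all without needing a reference answer.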
Arize-Phoenix is an open-source library for agent testing, evaluation, and observability.
AgentBench v0.2 is a benchmark for evaluating Large Language Models as agents across a diverse set of environments, with enhanced framework usability.
Simulate user interactions to evaluate chatbot performance, ensuring robustness and reliability in real-world scenarios.
Introduces AgentEval, a framework for assessing the performance of LLM-based applications.
Bananalyzer is a framework for evaluating AI agents on web tasks, using Playwright to build diverse datasets of website snapshots for reliable evaluation.
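Snapshot-based web-task evaluation records page content once, then replays it so every run is deterministic. The sketch below shows that pattern with an in-memory snapshot store and a toy extraction "agent"; the names and the example URL are hypothetical, not Bananalyzer's actual API.

```python
# Hypothetical sketch of snapshot-based web-task evaluation: page HTML is
# captured once, then replayed so evaluation never touches the live site.
# Names and the example URL are illustrative, not Bananalyzer's API.
snapshots = {
    "https://example.com/pricing":
        "<html><body><span id='price'>$29</span></body></html>",
}

def extract_price(html: str) -> str:
    """Toy 'agent' that pulls the first dollar amount out of a snapshot."""
    start = html.index("$")
    end = start + 1
    while end < len(html) and html[end].isdigit():
        end += 1
    return html[start:end]

def evaluate(task_url: str, expected: str) -> bool:
    html = snapshots[task_url]  # replay the stored snapshot, no network I/O
    return extract_price(html) == expected

result = evaluate("https://example.com/pricing", "$29")
```

A real harness would capture snapshots with a browser automation tool such as Playwright and run the agent under test against them instead of a hand-written extractor.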