AgentBench

Author: community

What was done

AgentBench v0.2 is a benchmark designed to evaluate Large Language Models as agents across a diverse set of environments. The v0.2 release improves framework usability and extends the set of evaluated models.
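To illustrate the general idea of evaluating an LLM as an agent, here is a minimal, self-contained sketch of an agent-versus-environment evaluation loop. All names (`EchoEnv`, `echo_agent`, `evaluate`) are hypothetical and illustrative; this is not AgentBench's actual API.

```python
class EchoEnv:
    """Toy environment: the agent succeeds by echoing the observation back."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        return "hello"

    def step(self, action):
        # Reward the agent for echoing the observation; end after 3 steps.
        self.steps += 1
        done = self.steps >= 3
        reward = 1.0 if action == "hello" else 0.0
        return "hello", reward, done


def echo_agent(observation):
    # Stand-in for an LLM: maps an observation to an action.
    return observation


def evaluate(agent, env, episodes=2):
    # Run the agent for several episodes and return the mean episode reward.
    total = 0.0
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action = agent(obs)
            obs, reward, done = env.step(action)
            total += reward
    return total / episodes


score = evaluate(echo_agent, EchoEnv())
print(score)  # mean reward per episode
```

A real benchmark in this spirit swaps the toy agent for a model-backed policy and the toy environment for task suites (e.g. operating systems, databases, games), then aggregates per-environment scores.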

Stack

Python
