Research

Tool-Use
Benchmarks
Tau2-Infinity: Autonomously Mining Hard Tasks for Tool-Use Agents
Research Team @ Vibrant Labs

CUA
Benchmarks
Mining Hard Tasks for Web Agents: An Adversarial E-Commerce Benchmark
Research Team @ Vibrant Labs

Coding Agents
Benchmarks
Cloning Bench: Evaluating AI Agents on Visual Website Cloning
Research Team @ Vibrant Labs

CUA
Benchmarks
PA Bench: Evaluating Web Agents on Real World Personal Assistant Workflows
Research Team @ Vibrant Labs

Hello World
Shahul Elavakkattil Shereef
Vibrant Labs benchmarks and builds environments for long-horizon agents




