Research

CUA

Benchmarks

Ecom Bench: Verifiable Shopping Tasks on the Live Web

Research Team @ Vibrant Labs

Jun 23, 2026

Tool-Use

Benchmarks

Tau2-Infinity: Autonomously Mining Hard Tasks for Tool-Use Agents

Research Team @ Vibrant Labs

May 12, 2026

CUA

Benchmarks

Mining Hard Tasks for Web Agents: An Adversarial E-Commerce Benchmark

Research Team @ Vibrant Labs

May 7, 2026

Coding Agents

Benchmarks

Cloning Bench: Evaluating AI Agents on Visual Website Cloning

Research Team @ Vibrant Labs

Mar 19, 2026

CUA

Benchmarks

PA Bench: Evaluating Web Agents on Real World Personal Assistant Workflows

Research Team @ Vibrant Labs

Feb 16, 2026

Hello World

Shahul Elavakkattil Shereef

Nov 19, 2025

Vibrant Labs benchmarks and builds environments for long-horizon agents.

Evaluating the Evaluators

Aug 18, 2025

Hard-Earned Lessons from 2 Years of Improving AI Applications

May 7, 2025

Aligning LLM as judge with human evaluators

Dec 11, 2024

All about synthetic data generation

Nov 19, 2024

All about evaluating Large language models

Jul 9, 2024

Vibrant Labs is proudly backed by

© 2026 Exploding Gradients Inc. All rights reserved.
