Blog

AI
Featured
Evals
PA Bench: Evaluating Web Agents on Real World Personal Assistant Workflows
Research Team @ Vibrant Labs
Feb 16, 2026

Featured
Updates
Hello World
Shahul Elavakkattil Shereef
Nov 19, 2025
Vibrant Labs benchmarks and builds environments for long-horizon agents

AI
Evaluating the Evaluators
Shahul Elavakkattil Shereef
Aug 18, 2025
Benchmarking Alignment Strategies for LLM-as-Judges

AI
Evals
OSS
Hard-Earned Lessons from 2 Years of Improving AI Applications
Shahul Elavakkattil Shereef
May 7, 2025
A step-by-step guide to set up evaluations and improve AI systems

LLM
Evals
Aligning LLM as judge with human evaluators
Shahul Elavakkattil Shereef
Dec 11, 2024
Aligning and improving LLM-based metrics using human feedback

LLM
Data
All about synthetic data generation
Shahul Elavakkattil Shereef
Nov 19, 2024
An in-depth survey blog on synthetic data generation with LLMs

LLM
Evaluation
All about evaluating Large language models
Shahul Elavakkattil Shereef
Jul 9, 2024
A deep survey blog on evaluating LLM applications