OS-Genesis: Reverse Task Synthesis
1/
Most UI agent scaling is currently throttled by the cost of human time. OS-Genesis took a much more scalable path with Reverse Task Synthesis. Instead of recording a human completing a predefined task, the agent interacts with the app first, and the authors work backwards from the states it reaches to hypothesize the intent.
2/
The intuition here is that it’s easier to verify a goal if you already have the state. By starting with the app state and generating the task description, they hit a 91% success rate on trajectory generation.
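A rough sketch of that loop, not the OS-Genesis code. The toy settings screen, the action names, and the rule-based `synthesize_task` are all made up for illustration; a real pipeline would have a model write the task description from the before and after states.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    before: dict  # UI state before the action
    action: str   # name of the action that was executed
    after: dict   # UI state after the action


def explore(state: dict, actions: dict) -> list[Transition]:
    """Apply each available action once and record what changed."""
    transitions = []
    for name, apply_action in actions.items():
        after = apply_action(dict(state))  # pass a copy so `state` is not mutated
        transitions.append(Transition(before=dict(state), action=name, after=after))
    return transitions


def synthesize_task(t: Transition) -> str:
    """Work backwards from an observed state change to a task description.

    Hypothetical stand-in for the model call that would write the task.
    """
    changed = {k: v for k, v in t.after.items() if t.before.get(k) != v}
    if not changed:
        return ""  # nothing changed, so there is no task to describe
    goals = ", ".join(f"{k} to {v!r}" for k, v in sorted(changed.items()))
    return f"Set {goals}"


# Toy "settings screen" with two hypothetical actions.
state = {"wifi": "off", "brightness": 40}
actions = {
    "tap_wifi_toggle": lambda s: {**s, "wifi": "on"},
    "tap_blank_area": lambda s: s,  # no effect on the state
}

# Keep only transitions that produced a describable task.
dataset = []
for t in explore(state, actions):
    task = synthesize_task(t)
    if task:
        dataset.append((task, t))
```

The action that changes nothing is dropped, so every task that survives is paired with a transition that is already known to reach its goal.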
3/
The jump on AndroidWorld is the part that stood out to us the most. Going from 15.3% to 31.8% just by augmenting with synthetic data suggests we’re nowhere near the ceiling for what small models can do if the data diversity is high enough.
4/
For our work on CUAs, the reverse synthesis logic is a potential fix for the ground truth problem. If you can automate trajectory generation, you can train on environments that humans haven’t manually mapped out yet.
5/
At Vibrant Labs, we’re very interested in how synthetic trajectories will define the next generation of CUAs. If you can automate the training data, you can automate the agent.