This is a collection of challenges for general-purpose web-browsing AI agents.
They're designed to be:
- easy for humans to complete
- hard for AI agents to complete
- fast and simple to run
- just client-side state and a single-page JavaScript app
- easy to evaluate
- each task provides a unique password on successful completion
Read the annoucement blog on the Convergence website: https://convergence.ai/introducing-webgames/
cd webgames
pnpm install
pnpm run dev
Tasks are available as a dataset on Hugging Face.
Alternatively, you can download them from the webgames website:
- Go to webgames.convergence.ai?showDownloads=true
- Click the download buttons in the top-right corner (csv or jsonl available)
- Verify your agent solutions using
solution in messages[-1]
or equivalent, or use the Inspect AI eval scaffolding in the eval folder.