4th Apr 2025 | personal highlight

Working on PaperBench with OpenAI

A brief reflection on the development of the PaperBench benchmark in collaboration with OpenAI.


PaperBench is a benchmark that evaluates whether AI agents can replicate state-of-the-art AI research from scratch. Agents are tasked with reproducing papers, which requires understanding each paper's contributions, building a codebase, and correctly executing the experiments.

I had the opportunity to contribute to the construction of PaperBench and to work alongside OpenAI during its development.

Thank you to everyone involved for the kind collaboration.

Tim