Task Library

Per-task performance of the best model/agent on each of the 180 FrontierOR papers. Click a highlighted task to see its full component card and per-model performance, and to download the best program's code and solution.