Overview
Frontier LLMs perform strongly across physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. DiscoverPhysics asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately deviates from our own.
We construct eleven public and eleven private worlds governed by, among others, screened and fractional-power gravity, multi-species couplings, hidden dark-matter-like particles, and time-varying interactions. Each world is generated on demand by an N-body simulator. The agent proposes several rounds of experiments, observes raw trajectory data, and submits both a natural-language explanation and a Python implementation of the inferred law.
Across eleven frontier models, the strongest agents pass only about half of the worlds and consistently fail on those where latent structure must be uncovered. Predictive accuracy and conceptual understanding decouple: the model with the lowest trajectory MSE is not the one with the highest explanation score.
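To make the task concrete, here is a minimal sketch of the kind of Python law an agent might submit for a screened-gravity world. It assumes a Yukawa-type screening of the pair potential, V(r) = -G m_i m_j exp(-r/λ)/r; the actual hidden laws, the required function signature, and every name and default below (screened_gravity_accelerations, G, screening_length, softening) are illustrative assumptions rather than the benchmark's interface.

```python
import numpy as np


def screened_gravity_accelerations(positions, masses, G=1.0,
                                   screening_length=1.0, softening=1e-3):
    """Accelerations under a Yukawa-screened attraction (illustrative only).

    positions: (N, D) array of particle positions.
    masses:    (N,) array of particle masses.
    Assumed pair potential: V(r) = -G m_i m_j exp(-r / lambda) / r.
    All parameter names and defaults are hypothetical.
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)

    # diff[i, j] is the vector pointing from particle i to particle j.
    diff = positions[None, :, :] - positions[:, None, :]
    # Softened separation avoids division by zero on the diagonal.
    r = np.sqrt((diff ** 2).sum(axis=-1) + softening ** 2)

    # -dV/dr gives an attractive force per unit (m_i m_j) of magnitude
    # G exp(-r/lambda) * (1/r^2 + 1/(lambda r)).
    magnitude = G * np.exp(-r / screening_length) * (
        1.0 / r ** 2 + 1.0 / (screening_length * r)
    )
    np.fill_diagonal(magnitude, 0.0)  # no self-interaction

    # a_i = sum_j m_j * magnitude_ij * unit vector from i to j.
    unit = diff / r[:, :, None]
    return (masses[None, :, None] * magnitude[:, :, None] * unit).sum(axis=1)
```

As the screening length grows, the exponential tends to 1 and the extra 1/(λ r) term vanishes, so the function approaches ordinary inverse-square gravity; a short screening length suppresses the force at large separations.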
Leaderboard
A world counts as passed if its per-trajectory normalized MSE is below 0.1 and its explanation score is ≥ 0.9. Pass@k is the expected percentage of worlds passed when k seeds are sampled (without replacement) from a 5-seed pool, averaged over 1,000 Monte Carlo draws. Norm. MSE is the geometric mean of per-trajectory normalized MSE; lower is better.
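The metrics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark's scoring code: it assumes each (world, seed) run is already summarized by one normalized MSE and one explanation score, and that a world counts as passed in a given draw if at least one of the k sampled seeds passes. The function names are hypothetical.

```python
import numpy as np

MSE_THRESHOLD = 0.1          # pass requires normalized MSE strictly below this
EXPLANATION_THRESHOLD = 0.9  # pass requires explanation score >= this


def geometric_mean(values):
    """Geometric mean of strictly positive values, e.g. per-trajectory MSEs."""
    values = np.asarray(values, dtype=float)
    return float(np.exp(np.mean(np.log(values))))


def run_passes(normalized_mse, explanation_score):
    """Apply the two-part pass rule to a single (world, seed) run."""
    return (normalized_mse < MSE_THRESHOLD
            and explanation_score >= EXPLANATION_THRESHOLD)


def pass_at_k(pass_matrix, k, n_draws=1000, rng=None):
    """Monte Carlo estimate of Pass@k as a percentage of worlds.

    pass_matrix: boolean array of shape (n_worlds, n_seeds).
    Assumption: a world is passed in a draw if ANY of the k seeds
    sampled without replacement passes.
    """
    pass_matrix = np.asarray(pass_matrix, dtype=bool)
    n_worlds, n_seeds = pass_matrix.shape
    if not 1 <= k <= n_seeds:
        raise ValueError("k must be between 1 and the number of seeds")
    rng = np.random.default_rng(rng)

    fractions = np.empty(n_draws)
    for draw in range(n_draws):
        seeds = rng.choice(n_seeds, size=k, replace=False)
        fractions[draw] = pass_matrix[:, seeds].any(axis=1).mean()
    return 100.0 * float(fractions.mean())
```

With a 5-seed pool, pass_at_k(pass_matrix, 5) always selects every seed, so it reduces to the percentage of worlds passed by at least one seed; smaller k gives a stricter number.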
Per-world breakdown
Mean explanation score (0–1) per model and world, averaged across 5 seeds. Hover over any cell for the standard error and the geometric-mean trajectory error. Models are ordered by Pass@5, highest at the top.
Results at a glance
The eleven public worlds
Each world is defined by a hidden force law. Agents see only the simulator output; world names and equations below are revealed in the paper, never to the agent.
Submit a model
We're opening DiscoverPhysics to community submissions. Initially, because of validation concerns, please contact matt.sampson@princeton.edu for guidelines on being added to the public leaderboard. For access to the full public and private repository, request access here: https://huggingface.co/mattWiemann/DiscoverPhysics
If you'd like to discuss a non-standard evaluation setting (different round budgets, new worlds, custom prompts), open an issue first.