---
tags:
- walrus
- foundation-model
- physics
- continuum-dynamics
- transformer
- PDE
datasets:
- polymathic-ai/shear_flow
- polymathic-ai/gray_scott_reaction_diffusion
- polymathic-ai/active_matter
- polymathic-ai/turbulent_radiative_layer_2D
- polymathic-ai/supernova_explosion_64
- polymathic-ai/turbulence_gravity_cooling
- polymathic-ai/rayleigh_benard
- polymathic-ai/planetswe
- polymathic-ai/acoustic_scattering_inclusions
- polymathic-ai/MHD_64
- polymathic-ai/rayleigh_taylor_instability
- polymathic-ai/acoustic_scattering_discontinuous
- polymathic-ai/acoustic_scattering_maze
- polymathic-ai/helmholtz_staircase
- polymathic-ai/viscoelastic_instability
- BGLab/FlowBench
license: mit
---

| # Walrus: A Cross-Domain Foundation Model for Continuum Dynamics |
|
|
[License: MIT](https://opensource.org/licenses/MIT)
[Code](https://github.com/PolymathicAI/walrus)
[Paper](https://arxiv.org/abs/2511.15684)
|
|
| Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems. |
|
|
| Walrus is trained jointly across **19 diverse physical domains** spanning: |
| - astrophysics |
| - geoscience |
| - rheology |
| - plasma physics |
| - acoustics |
| - classical fluids |
|
|
| These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems. |
|
|
| --- |
|
|
| # Model Description |
|
|
Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the evolution of physical fields in space and time. A simulation snapshot at time t is written as u(t).
|
|
| We define the difference between two consecutive snapshots as: |
| Δu(t+1) = u(t+1) − u(t) |
|
|
Given a short history of τ snapshots:
U(t) = [u(t − τ + 1), ..., u(t)]

the model M predicts the next state as:
u(t+1) ≈ u(t) + M(U(t))
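
This update rule gives a simple autoregressive rollout. A minimal sketch in PyTorch (illustrative tensor shapes and a hypothetical `model` callable returning Δu; this is not the actual Walrus API):

```python
import torch

def rollout(model, history, n_steps):
    """Roll out u(t+1) = u(t) + M(U(t)) autoregressively.

    history: (batch, tau, *spatial, fields) tensor holding U(t);
    `model` is assumed to return the increment Δu(t+1).
    """
    predictions = []
    for _ in range(n_steps):
        delta = model(history)                    # Δu(t+1) ≈ M(U(t))
        nxt = history[:, -1] + delta              # u(t+1) = u(t) + Δu(t+1)
        predictions.append(nxt)
        # slide the history window forward one step
        history = torch.cat([history[:, 1:], nxt.unsqueeze(1)], dim=1)
    return torch.stack(predictions, dim=1)        # (batch, n_steps, ...)
```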
| ### Key architectural components |
|
|
- **Adaptive-compute patch embedding** (sketched after this list)
| - Token count automatically balanced across resolutions |
| - Enables mixing 2D and 3D datasets efficiently |
|
|
- **Patch Jittering** (sketched below)
| - A harmonic-analysis–motivated augmentation technique |
| - Reduces aliasing and spectral artifacts |
| - Improves long-horizon stability across 17/19 pretraining datasets |
|
|
- **Tensor-law–aware data augmentation** (sketched below)
| - 2D data embedded into 3D through plane rotations |
| - Vector/tensor fields rotated with correct physical transformations |
|
|
- **Asymmetric normalization** (sketched below)
- Inputs normalized by RMS over space-time
- Predicted Δu de-normalized using the RMS of Δu
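
A sketch of what resolution-aware patching might look like (the proportional rule and the `token_budget_per_axis` name are assumptions for illustration, not the actual Walrus heuristic):

```python
def pick_patch_size(resolution, token_budget_per_axis=16):
    """Scale the per-axis patch size with resolution so the token count
    per axis stays roughly constant, keeping compute comparable when
    datasets of different sizes and dimensionalities share a batch."""
    return max(1, round(resolution / token_budget_per_axis))

# e.g. a 256^2 image -> patch 16, a 64^3 volume -> patch 4:
# both give 16 tokens per axis
```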
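
Patch jittering can be sketched as follows (a minimal illustration assuming periodic wrap-around via `torch.roll` and a hypothetical `model`; Walrus's handling of non-periodic boundaries is not shown):

```python
import torch

def jittered_step(model, u, patch_size=16):
    """Shift the input by a random per-axis offset before patchification
    and undo the shift on the output, so patch boundaries fall at a
    random phase each step.

    u: (batch, time, d1, d2, d3, fields); `model` is assumed to return
    Δu of shape (batch, d1, d2, d3, fields). Offsets within one patch
    width suffice, since the patch grid repeats with that period;
    torch.roll wraps around, i.e. assumes periodic boundaries.
    """
    shifts = tuple(int(torch.randint(0, patch_size, ())) for _ in range(3))
    u_jit = torch.roll(u, shifts=shifts, dims=(2, 3, 4))
    delta = model(u_jit)                  # prediction on the shifted grid
    # roll back so the prediction aligns with the original grid
    return torch.roll(delta, shifts=tuple(-s for s in shifts), dims=(1, 2, 3))
```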
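
The tensor-law requirement can be illustrated for the special case of a 90° rotation in the (x, y) plane (a minimal sketch; general plane rotations require interpolation, and the axis and sign conventions here are assumptions). Rotating the grid alone would treat a velocity field as a stack of scalars, so the vector components must be rotated too:

```python
import torch

def rotate_90_xy(u):
    """Rotate a snapshot 90° in the (x, y) plane, transforming the grid
    AND the vector components.

    u: (x, y, z, 3), with the last axis holding (vx, vy, vz).
    """
    u = torch.rot90(u, k=1, dims=(0, 1))          # rotate the spatial grid
    vx, vy, vz = u[..., 0], u[..., 1], u[..., 2]
    # apply the same rotation to the components: (vx, vy) -> (-vy, vx)
    return torch.stack((-vy, vx, vz), dim=-1)
```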
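
And a minimal sketch of the asymmetric normalization scheme (not the exact Walrus implementation; the reduction axes and the source of the Δu RMS statistics are assumptions):

```python
import torch

def normalize_inputs(u, eps=1e-6):
    """Normalize an input history by its RMS over space and time.

    u: (batch, time, *spatial, fields); one scale per sample and field.
    Returns the normalized history along with the scale.
    """
    dims = tuple(range(1, u.ndim - 1))    # reduce over time + spatial axes
    rms = u.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps
    return u / rms, rms

def denormalize_delta(delta_pred, delta_rms):
    """De-normalize the predicted increment with the RMS of Δu rather
    than the input RMS (hence 'asymmetric'); how those Δu statistics
    are obtained is an assumption left open in this sketch."""
    return delta_pred * delta_rms
```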
|
|
| --- |
|
|
| # Pretraining Details |
|
|
Walrus is pretrained on 19 physical datasets with:
|
|
- **Loss**: Per-field normalized L1 loss (sketched after this list)
| - **Optimizer**: AdamW |
- **Batching**: System-uniform hierarchical sampling (see the sampling sketch below)
| - **Time-striding**: Random stride (1–5) per training example |
| - **Patch jitter range**: Uniform per-axis random offset |
| - **Dimensional unification**: 2D fields embedded as thin 3D volumes |
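
A per-field normalized L1 objective of this kind might look as follows (illustrative only; the exact normalization statistics used by Walrus may differ):

```python
import torch

def per_field_normalized_l1(pred, target, eps=1e-6):
    """L1 loss normalized per physical field (channel).

    pred, target: (batch, *spatial, fields). Scaling each field's error
    by that field's typical magnitude keeps fields with very different
    units from dominating the objective.
    """
    dims = tuple(range(target.ndim - 1))          # all axes except fields
    scale = target.abs().mean(dim=dims, keepdim=True) + eps
    return ((pred - target).abs() / scale).mean()
```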
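
And a sketch of the hierarchical sampling with random time-striding, under an assumed in-memory layout (a list of systems, each a list of time-indexable trajectories; the real loader streams Well-formatted data):

```python
import random

def sample_training_window(datasets, tau=4, max_stride=5):
    """System-uniform hierarchical sampling with random time-striding.

    datasets: a list of systems, each a list of trajectories indexable
    by time (an assumed layout; the history length tau is illustrative).
    """
    system = random.choice(datasets)        # uniform over systems
    traj = random.choice(system)            # then uniform over trajectories
    stride = random.randint(1, max_stride)  # random stride in [1, 5]
    start = random.randrange(len(traj) - tau * stride)
    return traj[start : start + tau * stride : stride]
```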
|
|
The model was pretrained on 96 **NVIDIA H100 GPUs** using distributed HSDP (4 GPUs per shard group), with data sampling aligned to the sharding structure to minimize wasted compute.
|
|
| --- |
|
|
| # Intended Use |
|
|
| This pretrained checkpoint is suitable for: |
|
|
| ### ✔ Next-step prediction |
| ### ✔ Fast surrogate simulation |
| ### ✔ Autoregressive rollout of physical systems |
| ### ✔ Transfer learning to new physical settings |
|
|
| # Resources |
|
|
Paper: https://arxiv.org/pdf/2511.15684
GitHub: https://github.com/PolymathicAI/walrus
Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks

Note that the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so
it can be beneficial to format data to match that schema. If that is not possible, the tutorial shows how to use the model
without Well-formatted data.


# Demonstrated downstream tasks

We demonstrate the strong performance of Walrus by fine-tuning it on a range of challenging downstream tasks, as shown in the paper.
The fine-tuned Walrus checkpoints for these downstream tasks can be accessed at the following paths:

| ### PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main |
| ### PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main |
| ### PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main |
### FlowBench FPOSkelenton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
| ### The Well Postmerger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main |
| ### The Well Convective envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main |
| ### PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main |
| ### BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main |
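
For example, a checkpoint can be fetched locally with `huggingface_hub`, using a repo id from the list above:

```python
from huggingface_hub import snapshot_download

# fetch the CE-RM fine-tuned checkpoint from the list above
local_dir = snapshot_download(repo_id="polymathic-ai/walrus_ft_CE-RM")
print(local_dir)  # local path containing the checkpoint files
```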
Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint is a bit finicky.
| |
More fine-tuning checkpoints will continue to be added to HF over time.