Reproducibility and determinism
MLIP inference on a GPU is not bit-reproducible by default: parallel
reductions (atomic adds, scatter operations) accumulate in a
hardware-scheduling-dependent order, so two runs with identical inputs differ
at the floating-point ULP level. For pdb2reaction the practical size of this
drift is ~1e-7 Å in coordinates and below 1e-7 a.u. in energies — far below
any chemically meaningful threshold. Results are scientifically reproducible;
they are not bit-identical.
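The order dependence described above comes from floating-point addition not being associative. The following is a minimal CPU-only sketch of that effect, not pdb2reaction code: the `accumulate` helper and the two "schedules" are hypothetical stand-ins for the order in which a GPU might apply its atomic adds.

```python
import random

# Floating-point addition is not associative: grouping changes the last bits.
a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c
right_to_left = a + (b + c)
print(left_to_right == right_to_left)  # False


def accumulate(values, order):
    """Sum `values` in the given index order (hypothetical stand-in for a
    reduction whose order is decided by hardware scheduling)."""
    total = 0.0
    for index in order:
        total += values[index]
    return total


# Same inputs, two different accumulation orders.
rng = random.Random(0)
contributions = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
schedule_a = list(range(len(contributions)))
schedule_b = schedule_a[:]
rng.shuffle(schedule_b)

total_a = accumulate(contributions, schedule_a)
total_b = accumulate(contributions, schedule_b)

# Any discrepancy is at rounding-error scale, far below the values themselves.
print(abs(total_a - total_b))
```

Both totals are equally valid roundings of the same mathematical sum, which is why such results are scientifically reproducible without being bit-identical.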
If you need bit-identical output (e.g. golden-file regression tests, exact
re-runs for an audit), use the --deterministic flag.
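The two notions of reproducibility call for two different checks in a regression test. The sketch below is illustrative only: `fingerprint` and `within_tolerance` are hypothetical helpers, and the `1e-6` tolerance is an assumption chosen to sit above the ~1e-7 drift quoted above, not a value taken from pdb2reaction.

```python
import hashlib
import math
import struct


def fingerprint(values):
    """SHA-256 over the raw float64 bytes; changes if any single bit changes."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def within_tolerance(reference, candidate, abs_tol=1e-6):
    """Tolerance check suited to default (non-deterministic) runs.
    abs_tol is an assumed threshold, not a pdb2reaction default."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol)
        for x, y in zip(reference, candidate)
    )


# Made-up coordinates: the re-run differs by a drift of about 1e-7.
reference = [1.0, 2.5, -0.75]
rerun = [1.0, 2.5 + 1e-7, -0.75]

print(within_tolerance(reference, rerun))            # True
print(fingerprint(reference) == fingerprint(rerun))  # False
```

A golden-file test built on the byte-level fingerprint only makes sense for runs made with --deterministic; for default runs, the tolerance comparison is the appropriate check.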
--deterministic
--deterministic is accepted by every compute subcommand
(opt, tsopt, freq, irc, scan, scan2d, scan3d, path-opt,
path-search, all, sp). It turns on torch.use_deterministic_algorithms
plus an index_reduce_ shim so that the GPU run is bit-reproducible.
pdb2reaction opt -i input.pdb -q 0 --deterministic
pdb2reaction all -i r.pdb p.pdb -q -1 --tsopt True --deterministic
It is process-global: setting it on all propagates to every internal stage; you do not pass it per stage.
It is slower: deterministic scatter/reduce kernels have lower throughput than the default ones. Use it only when you need exact reproducibility.
It fails loudly: if the current PyTorch build cannot provide a deterministic kernel for an operation in your run, the command raises rather than silently producing non-reproducible output.
The environment variable PDB2REACTION_STRICT_DETERMINISTIC=1 is the equivalent entry point for CI or the direct Python API (create_calculator).
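For CI, one way to apply the environment variable is to set it on the child process that runs the command. This is a sketch under assumptions: `deterministic_env`, `build_opt_command`, and `run_opt_deterministic` are hypothetical helper names; only the variable name and the `opt -i ... -q ...` invocation come from the text above.

```python
import os
import subprocess


def deterministic_env(base=None):
    """Return a copy of the environment with strict determinism enabled."""
    env = dict(os.environ if base is None else base)
    env["PDB2REACTION_STRICT_DETERMINISTIC"] = "1"
    return env


def build_opt_command(pdb_path, charge):
    """Build the same `opt` invocation shown above, without --deterministic,
    because the environment variable is used as the entry point instead."""
    return ["pdb2reaction", "opt", "-i", str(pdb_path), "-q", str(charge)]


def run_opt_deterministic(pdb_path, charge):
    """Run the command; check=True turns a non-zero exit status (for example
    from a missing deterministic kernel) into CalledProcessError."""
    return subprocess.run(
        build_opt_command(pdb_path, charge),
        env=deterministic_env(),
        check=True,
    )


print(build_opt_command("input.pdb", 0))
```

Setting the variable on the child process only, rather than mutating `os.environ`, keeps the rest of the CI job unaffected.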
Verified behavior by backend
| Backend | --deterministic |
|---|---|
| uma | bit-identical energy and forces |
| orb | bit-identical energy and forces |
| mace | bit-identical energy and forces |
| aimnet2 | not supported; rejected (see below) |
Precision and reproducibility
Running in --precision fp64 reduces the default drift (roughly halving the
number of differing trajectory files in a full all pipeline) but does not
make a GPU run bit-identical — the reduction-order non-determinism is
independent of precision. Only --deterministic gives bit-exactness.
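A small CPU-only sketch of why higher precision shrinks the drift without removing it: the same order-dependent cancellation that appears at 1e8 in fp32 reappears at 1e16 in fp64. The `f32` and `add32` helpers are hypothetical and emulate fp32 rounding with the standard library; nothing here is pdb2reaction code.

```python
import struct


def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add32(a, b):
    """Emulated fp32 addition: add, then round the result to fp32."""
    return f32(f32(a) + f32(b))


# fp32: the order of the three additions changes the answer at 1e8.
big32 = 1.0e8
fp32_first = add32(add32(big32, 1.0), -big32)   # (big + 1) - big
fp32_second = add32(add32(big32, -big32), 1.0)  # (big - big) + 1
print(fp32_first, fp32_second)  # 0.0 1.0

# fp64 handles those same values exactly in either order...
print((big32 + 1.0) - big32, (big32 - big32) + 1.0)  # 1.0 1.0

# ...but the same order dependence returns at a larger magnitude.
big64 = 1.0e16
fp64_first = (big64 + 1.0) - big64
fp64_second = (big64 - big64) + 1.0
print(fp64_first, fp64_second)  # 0.0 1.0
```

Precision moves the scale at which rounding bites; it does not fix the accumulation order, which is what --deterministic controls.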
The model precision (--precision) and the internal, always-on fp64 Hessian are separate settings. Passing --precision fp64 additionally forces the Hessian to fp64, so the optimizer linear algebra cannot silently run in a lower precision than the model.
Choosing precision by GPU class
--precision selects the floating-point precision of MLIP inference
(fp32 | fp64, case-insensitive; default fp32, the screening-speed
baseline). It is backend-agnostic — the CLI routes the value into each backend’s
native key (UMA precision, ORB precision, MACE default_dtype; for
aimnet2, fp32 is a no-op and fp64 is rejected because its model inputs are
cast to float32 upstream). Which value to pick depends on the GPU class:
| GPU class | Recommended | Why |
|---|---|---|
| HPC datacenter (H100 / H200 / A100) | fp64 | Near-deterministic, low numerical-noise TS optimization and Hessians; the fp64 throughput cost is small on these cards. |
| Consumer (RTX 50xx / 40xx) | fp32 | fp64 is substantially slower here; fp32 is the speed / screening baseline. |
fp64 has a non-trivial effect on TS optimization and Hessians for the
OMol-trained UMA backend, so use it for final / production numbers — not only
for screening. For bit-identical reruns, combine it with --deterministic.
AIMNet2 limitations
AIMNet2 does not support these features:
--precision fp64: AIMNet2’s model inputs are cast to float32 upstream, so an “fp64” run would not actually be fp64.
--deterministic: AIMNet2 computes forces via a custom CUDA kernel that lies outside torch.use_deterministic_algorithms control, so its forces are not bit-reproducible (energy is). PyTorch’s deterministic mode neither detects nor controls the custom op, so the limitation is reported explicitly.
For bit-reproducible runs use uma, orb, or mace.
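The routing and rejection rules described on this page can be summarized in a short sketch. This is not pdb2reaction's implementation: `resolve_backend_options` is a hypothetical function, and the value passed under each native key (here the normalized `fp32` / `fp64` string) is an assumption, since each backend may spell its precision values differently. Only the key names and the AIMNet2 rules come from the text above.

```python
# Native precision key per backend, as named in the text above.
NATIVE_PRECISION_KEY = {
    "uma": "precision",
    "orb": "precision",
    "mace": "default_dtype",
}


def resolve_backend_options(backend, precision="fp32", deterministic=False):
    """Hypothetical sketch: map --precision onto a backend's native key and
    reject the combinations that AIMNet2 does not support."""
    precision = precision.lower()  # --precision is case-insensitive
    if precision not in ("fp32", "fp64"):
        raise ValueError(f"unknown precision: {precision!r}")

    if backend == "aimnet2":
        if precision == "fp64":
            raise ValueError("aimnet2: fp64 rejected (inputs are cast to float32)")
        if deterministic:
            raise ValueError("aimnet2: --deterministic is not supported")
        return {}  # fp32 is a no-op for aimnet2

    if backend not in NATIVE_PRECISION_KEY:
        raise ValueError(f"unknown backend: {backend!r}")

    # Assumption: the value is passed through unchanged; the real spelling
    # expected by each backend is not specified here.
    return {NATIVE_PRECISION_KEY[backend]: precision}


print(resolve_backend_options("mace", "FP64"))  # {'default_dtype': 'fp64'}
print(resolve_backend_options("aimnet2"))       # {}
```

Rejecting unsupported combinations up front mirrors the "fails loudly" behavior of --deterministic: a run that cannot honor the request stops rather than silently producing something else.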