Troubleshooting¶
This page collects common failure modes and practical fixes. It is written to be copy-and-paste friendly: search this page for the error message you see. If you want a symptom-first entrypoint, start with Common Error Recipes and then return here for details.
Preflight checklist¶
Before a long run, verify:
You can run pdb2reaction -h and see the CLI help.
UMA can be downloaded (a Hugging Face login/token is available on the machine).
For enzyme workflows: your input PDB(s) contain hydrogens and element symbols.
When you provide multiple PDBs: they have the same atoms in the same order (only coordinates differ).
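The same-atoms/same-order requirement can be checked up front. Below is a minimal sketch based on the standard fixed-column PDB layout (frameA.pdb and frameB.pdb are throwaway files created here for illustration); it compares the atom-identity columns of every ATOM/HETATM record while ignoring the coordinates:

```shell
# Create two tiny example frames that differ only in coordinates.
cat > frameA.pdb <<'EOF'
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
EOF
cat > frameB.pdb <<'EOF'
ATOM      1  N   ALA A   1      12.000   6.500  -6.000  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.900   6.300  -5.000  1.00  0.00           C
EOF

# Columns 1-30 of ATOM/HETATM records hold the atom identity (serial,
# atom name, residue name, chain, residue number); coordinates start
# at column 31, so they are excluded from the comparison.
atom_ids() {
  awk '/^(ATOM|HETATM)/ { print substr($0, 1, 30) }' "$1"
}

if diff <(atom_ids frameA.pdb) <(atom_ids frameB.pdb) > /dev/null; then
  echo "atom order consistent"
else
  echo "atom order mismatch"
fi
```

Running the same check across every pair of frames before a long multi-structure job catches count/order mismatches early.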
Input / extraction problems¶
“Element symbols are missing … please run add-elem-info”¶
Typical message:
Element symbols are missing in '...'.
Please run `pdb2reaction add-elem-info -i...` to populate element columns before running extract.
Fix:
Run:
pdb2reaction add-elem-info -i input.pdb -o input_with_elem.pdb
Then re-run extract or all using the updated PDB.
Why it happens:
Many PDBs do not populate the element column consistently.
extract requires element symbols for reliable atom typing.
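To see whether a file needs add-elem-info at all, you can count ATOM/HETATM records with an empty element field (columns 77-78 in the fixed-width PDB format). A small sketch; sample.pdb is a throwaway file created for illustration, with the second record deliberately missing its element symbol:

```shell
cat > sample.pdb <<'EOF'
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00
EOF

# Element symbols occupy columns 77-78; count records where they are blank.
awk '/^(ATOM|HETATM)/ {
       e = substr($0, 77, 2); gsub(/ /, "", e)
       if (e == "") n++
     } END { print n + 0, "atom(s) missing element symbols" }' sample.pdb
```

A nonzero count means you should run add-elem-info before extract.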
“[multi] Atom count mismatch …” or “[multi] Atom order mismatch …”¶
Typical messages:
[multi] Atom count mismatch between input #1 and input #2:...
[multi] Atom order mismatch between input #1 and input #2.
Fix:
Regenerate all structures with the same preparation workflow (same protonation tool, same settings).
If you add hydrogens, do it in a way that produces consistent ordering across all frames.
Tip:
For ensembles generated by MD, prefer extracting frames from the same trajectory/topology rather than mixing PDBs produced by different tools.
“My active site model (binding pocket) is empty / missing important residues”¶
Symptoms:
The extracted active site model is unexpectedly small.
Key catalytic residues are missing.
Fixes to try:
Increase --radius (e.g., 2.6 → 3.5 Å).
Use --selected-resn to force-include residues (e.g., --selected-resn 'A:123,B:456'). See "--selected-resn takes residue IDs, not names" in CLI Conventions for the residue-ID requirement.
If backbone trimming is too aggressive, set --no-exclude-backbone.
Unreliable energies / barriers¶
Symptoms:
Calculated energies or reaction barriers seem unreasonable.
Results change significantly when the model size is increased.
Fix:
If the extracted active site model is too small, calculated energies and barriers may be unreliable. Increase the extraction radius (e.g., -r 4.0 or higher) to include more of the protein environment:
pdb2reaction extract -i complex.pdb -c 'SUB' -o model.pdb -r 4.0
Non-standard residues not truncated correctly¶
If the extracted active site model contains modified amino acid residues (e.g., phosphoserine, methylated lysine, D-amino acids) with non-standard three-letter codes, backbone truncation and link-hydrogen placement will not be applied to them by default. Use --modified-residue to register them:
pdb2reaction extract -i complex.pdb -c PRE --modified-residue "SEP,TPO,MLY" -o pocket.pdb
If --modified-residue is insufficient (e.g., the residue has an unusual backbone topology), construct the active site model manually and pass it directly to downstream commands (opt, tsopt, path-opt, etc.).
Charge / spin problems¶
“Charge is required …” (non-GJF inputs)¶
Many stages need a net charge when the input is not .gjf. If you omit -q/--charge, the workflow may attempt to derive charge from --ligand-charge/-l (PDB-only) or from a .gjf template.
Fix:
Provide charge and multiplicity explicitly:
pdb2reaction path-search -i R.pdb P.pdb -q 0 -m 1
Or (when using extraction) provide a residue-name mapping:
pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
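The -l/--ligand-charge mapping assigns one formal charge per residue name ('SAM:1,GPP:-3' above). As an illustration of the mapping syntax only, here is a sketch that splits the pairs and sums the charge fields; treating the net ligand charge as a plain sum is an assumption made for this example, not a statement about pdb2reaction's internals:

```shell
mapping='SAM:1,GPP:-3'   # residue-name:charge pairs, comma separated

# Split on commas, take the field after the colon, and sum.
total=$(echo "$mapping" | tr ',' '\n' | awk -F: '{ s += $2 } END { print s + 0 }')
echo "sum of ligand charges: $total"
```

For this mapping the sum is -2, which is the kind of sanity check worth doing by hand before trusting a derived net charge.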
Installation / environment problems¶
UMA download/authentication errors¶
Symptoms:
Errors about missing Hugging Face authentication or being unable to download model weights.
Fix:
Log in once per environment/machine:
hf auth login
On HPC, ensure your home directory (or HF cache directory) is writable from compute nodes.
CUDA / PyTorch mismatch¶
Symptoms:
torch.cuda.is_available() is False even though you have a GPU.
CUDA runtime errors appear at import time.
Fixes:
Install a PyTorch build matching your cluster CUDA runtime.
Confirm GPU visibility:
nvidia-smi
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
DMF mode fails (cyipopt missing)¶
If you use DMF (--mep-mode dmf) and see errors importing IPOPT/cyipopt:
Fix:
Install cyipopt from conda-forge (recommended) before installing pdb2reaction:
conda install -c conda-forge cyipopt
Plot export fails (Chrome missing)¶
If figure export fails and you see Plotly/Chrome-related errors:
Fix:
Install a headless Chrome once:
plotly_get_chrome -y
Calculation / convergence problems¶
Optimization reaches max_cycles with max(force) slightly above the threshold¶
Symptoms:
The optimizer runs until max_cycles is hit, and the final summary shows that max(force) or rms(force) is just above the target (e.g., 4×10⁻⁴ au vs the baker threshold of 3×10⁻⁴ au).
The energy, on the other hand, has clearly flattened and oscillates at the 10⁻⁵–10⁻⁴ au level.
Why it happens:
MLIP gradient/force evaluations carry a small stochastic noise floor (typically ~4×10⁻⁴ au for UMA-class models). This noise floor can exceed the force-based convergence criterion (baker = 3×10⁻⁴ au), so the force threshold can never be satisfied even though the geometry has already converged.
Fixes to try:
The energy plateau fallback (new in v0.3.5) should handle this automatically: opt.energy_plateau: true declares convergence when the energy range over the last opt.energy_plateau_window (default 50) steps falls below opt.energy_plateau_thresh (default 1×10⁻⁴ au ≈ 0.06 kcal/mol). No user action is required in most cases.
If you need to override the automatic fallback, loosen the force threshold manually: --thresh gau (the default for opt) or --thresh gau_loose.
You can also tune opt.energy_plateau_thresh / opt.energy_plateau_window from YAML, or disable the fallback with opt.energy_plateau: false.
Note: the plateau fallback is skipped for chain-of-states optimizers (path-opt, path-search string/GSM/DMF stages) because they store per-image energies rather than a single scalar energy trace.
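In the YAML config, the plateau knobs live under the opt section. A minimal fragment with the defaults quoted in this section (see the YAML Reference for the authoritative layout):

```yaml
opt:
  energy_plateau: true           # set false to disable the fallback
  energy_plateau_window: 50      # steps over which the energy range is measured
  energy_plateau_thresh: 1.0e-4  # au (~0.06 kcal/mol)
```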
TS optimization fails to converge¶
Symptoms:
TS optimization runs for many cycles without converging.
Multiple imaginary frequencies remain after optimization.
Fixes to try (CLI flags and YAML knobs are complementary — use both as needed):
Switch optimizer modes: --opt-mode grad (Dimer) or --opt-mode hess (RS-I-RFO).
Enable flattening of extra imaginary modes: --flatten (available on standalone tsopt, opt, and pdb2reaction all; default disabled).
Increase max cycles: --max-cycles 20000 (for standalone tsopt; --tsopt-max-cycles 20000 for all).
Use tighter convergence: --thresh baker or --thresh gau_tight.
Reduce step sizes / trust radii via YAML: for LBFGS/Dimer, lbfgs.max_step / hessian_dimer.max_step; for RFO/RS-I-RFO, rfo.trust_radius / rfo.trust_min / rfo.trust_max (and the rsirfo section). See YAML Reference for section layout.
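As a sketch of the step-size / trust-radius knobs listed above (the numeric values are illustrative placeholders, not recommended settings; check the YAML Reference for the authoritative layout and defaults):

```yaml
lbfgs:
  max_step: 0.10       # cap LBFGS step length
hessian_dimer:
  max_step: 0.10       # cap Dimer translation steps
rfo:
  trust_radius: 0.10   # initial trust radius for RFO / RS-I-RFO
  trust_min: 0.01
  trust_max: 0.30
```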
IRC does not terminate properly¶
Symptoms:
IRC stops before reaching a clear minimum.
Energy oscillates or gradient remains high.
Fixes to try:
Reduce step size: --step-size 0.05 (default is 0.10 bohr, unweighted Cartesian).
Increase max cycles: --max-cycles 200.
Check that the TS candidate has only one imaginary frequency before running IRC. See "Imaginary-mode thresholds: 5 cm⁻¹ vs 100 cm⁻¹" in the glossary for the 5 cm⁻¹ detection threshold vs the 100 cm⁻¹ quality gate.
MEP search (GSM/DMF) fails or gives unexpected results¶
Symptoms:
Path search terminates with no valid MEP.
Bond changes are not detected correctly.
Fixes to try:
Increase --max-nodes above the default of 20 (e.g., 30 or 40) for complex reactions.
Enable endpoint pre-optimization: --preopt.
Try the alternative MEP method: --mep-mode dmf (if GSM fails) or vice versa.
Adjust bond detection parameters in YAML (bond.bond_factor, bond.delta_fraction).
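A sketch of the bond-detection block (the parameter names come from the list above; the values are illustrative placeholders and the comments state assumed roles, not documented semantics):

```yaml
bond:
  bond_factor: 1.3      # assumed: scaling applied to covalent radii when detecting bonds
  delta_fraction: 0.25  # assumed: fractional bond-length change counted as a bond event
```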
Performance / stability tips¶
Out of memory (VRAM): reduce active site model size (--radius), reduce nodes (--max-nodes), or use lighter optimizer settings (--opt-mode grad).
Analytical Hessian is slow or causes OOM: keep the default FiniteDifference mode. Only use --hessian-calc-mode Analytical if you have ample VRAM (16 GB+ recommended for 500+ atom systems).
Workers > 1: improves UMA throughput on HPC, but disables analytical Hessian evaluation.
Large systems (1000+ atoms): consider extracting a smaller active site model (--radius 2.5) or running on multi-GPU setups.
Choosing a backend¶
Informal per-step LBFGS inference cost on small-to-medium cluster models on an NVIDIA RTX 5080 (16 GB VRAM). The numbers below are order-of-magnitude guidance for backend selection, not a rigorous benchmark.
| Backend | Speed (median s/step) | VRAM usage | Notes |
|---|---|---|---|
| UMA-s-1.1 (default) | 0.03 s | ~2 GB | Fast, good for exploration |
| UMA-m-1.1 | 0.22 s | ~8 GB | Medium model, heavy VRAM |
| MACE-OMOL-0 | 0.37 s | ~4 GB | Requires a separate env |
| Orb-v3-omol | 0.02 s | ~2 GB | Fastest; see caveat below |
Recommendations:
Start with UMA-s-1.1 for rapid screening, then cross-check key results with MACE or UMA-m-1.1.
For SAM-dependent S~N~2 / methyltransfer chemistries the MACE and UMA-s-1.1 backends are complementary; try both when one produces a suspect TS.
Orb-v3-omol often identifies the correct reaction coordinate but tends to converge transition states with extra small imaginary modes. Orb is therefore a good first-pass mechanism-recovery backend, but a clean single-imaginary-mode TS is not guaranteed — for quantitative kinetics or frequency analysis, re-score Orb geometries with UMA / MACE / DFT or switch backend.
GPU memory (VRAM) requirements¶
Approximate VRAM usage by system size:
| Atoms | LBFGS opt | Hessian (analytical) | Hessian (finite diff) |
|---|---|---|---|
| 50 | ~2 GB | ~3 GB | ~2 GB |
| 100 | ~3 GB | ~6 GB | ~3 GB |
| 200 | ~4 GB | ~12 GB | ~4 GB |
| 500 | ~6 GB | OOM on 16 GB | ~6 GB |
If you encounter torch.cuda.OutOfMemoryError:
Use --hessian-calc-mode FiniteDifference (slower but lower VRAM).
Reduce the cluster model size with a smaller --radius.
Use a smaller model (calc.model: uma-s-1p1 in your YAML config instead of uma-m-1p1).
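In YAML, the model switch from the last item is a one-key fragment (section name calc and key model as quoted above):

```yaml
calc:
  model: uma-s-1p1   # smaller UMA backbone, instead of uma-m-1p1
```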
How to report an issue¶
When asking for help, include:
The exact command line you ran
summary.log (or console output)
The smallest input files that reproduce the problem (if possible)
Your environment: OS, Python, CUDA, PyTorch versions
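The environment details can be gathered in one file to paste into a report. A sketch (the torch and nvidia-smi lines have their errors suppressed so the script still produces a report on machines where they are absent):

```shell
# Collect OS, Python, PyTorch/CUDA, and GPU info into env_report.txt.
{
  uname -a                 # OS / kernel
  python --version         # Python
  python -c "import torch; print('torch', torch.__version__, 'CUDA', torch.version.cuda)" 2>/dev/null
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null
} > env_report.txt
cat env_report.txt
```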