Troubleshooting

This page collects common failure modes and practical fixes. It is organized by error message, and the fixes are copy-and-paste friendly: search the page for the message you see. For a symptom-first entry point, start with Common Error Recipes and then return here for details.


Preflight checklist

Before a long run, verify:

  • You can run pdb2reaction -h and see the CLI help.

  • UMA can be downloaded (Hugging Face login/token is available on the machine).

  • For enzyme workflows: your input PDB(s) contain hydrogens and element symbols.

  • When you provide multiple PDBs: they have the same atoms in the same order (only coordinates differ).


Input / extraction problems

“Element symbols are missing … please run add-elem-info”

Typical message:

Element symbols are missing in '...'.
Please run `pdb2reaction add-elem-info -i...` to populate element columns before running extract.

Fix:

  • Run:

    pdb2reaction add-elem-info -i input.pdb -o input_with_elem.pdb
    
  • Then re-run extract / all using the updated PDB.

Why it happens:

  • Many PDBs do not populate the element column consistently. extract requires element symbols for reliable atom typing.
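
To find out which records would trigger this message before running extract, a quick look at the element field can help. The sketch below is not part of pdb2reaction; it assumes fixed-column PDB formatting, where the element symbol occupies columns 77-78, and the function name atoms_missing_element is hypothetical.

```python
def atoms_missing_element(pdb_text):
    """Return the serial numbers (as strings) of ATOM/HETATM records
    whose element field (PDB columns 77-78) is blank or absent."""
    missing = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip()  # columns 77-78, 1-indexed
            if not element:
                missing.append(line[6:11].strip())  # serial, columns 7-11
    return missing
```

Call it on the text of a PDB file, for example atoms_missing_element(open("input.pdb").read()). An empty list means every ATOM/HETATM record carries an element symbol.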


“[multi] Atom count mismatch …” or “[multi] Atom order mismatch …”

Typical messages:

[multi] Atom count mismatch between input #1 and input #2:...
[multi] Atom order mismatch between input #1 and input #2.

Fix:

  • Regenerate all structures with the same preparation workflow (same protonation tool, same settings).

  • If you add hydrogens, do it in a way that produces consistent ordering across all frames.

Tip:

  • For ensembles generated by MD, prefer extracting frames from the same trajectory/topology rather than mixing PDBs produced by different tools.
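
To locate where two inputs diverge, it can help to compare atom identities record by record. The following is a minimal sketch under the assumption of fixed-column PDB files; it is not the check pdb2reaction itself performs, and the helper names atom_keys and first_mismatch are hypothetical.

```python
def atom_keys(pdb_text):
    """Collect (atom name, residue name, chain ID, residue number + insertion code)
    for every ATOM/HETATM record, in file order."""
    keys = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            keys.append((
                line[12:16].strip(),  # atom name, columns 13-16
                line[17:20].strip(),  # residue name, columns 18-20
                line[21:22],          # chain ID, column 22
                line[22:27].strip(),  # residue number + insertion code, columns 23-27
            ))
    return keys


def first_mismatch(ref_text, other_text):
    """Return a short description of the first difference, or None if the
    two structures list the same atoms in the same order."""
    ref, other = atom_keys(ref_text), atom_keys(other_text)
    if len(ref) != len(other):
        return f"atom count differs: {len(ref)} vs {len(other)}"
    for index, (a, b) in enumerate(zip(ref, other), start=1):
        if a != b:
            return f"atom #{index} differs: {a} vs {b}"
    return None
```

Running first_mismatch on the reference structure and each additional frame points to the first record that needs attention; coordinates are deliberately ignored.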


“My active site model (binding pocket) is empty / missing important residues”

Symptoms:

  • The extracted active site model is unexpectedly small.

  • Key catalytic residues are missing.

Fixes to try:

  • Increase --radius (e.g., 2.6 → 3.5 Å).

  • Use --selected-resn to force-include residues (e.g., --selected-resn 'A:123,B:456'). See “--selected-resn takes residue IDs, not names” in CLI Conventions for the residue-ID requirement.

  • If backbone trimming is too aggressive, set --no-exclude-backbone.


Unreliable energies / barriers

Symptoms:

  • Calculated energies or reaction barriers seem unreasonable.

  • Results change significantly when the model size is increased.

Fix:

  • If the extracted active site model is too small, calculated energies and barriers may be unreliable. Increase the extraction radius (e.g., -r 4.0 or higher) to include more of the protein environment:

    pdb2reaction extract -i complex.pdb -c 'SUB' -o model.pdb -r 4.0
    

Non-standard residues not truncated correctly

If the extracted active site model contains modified amino acid residues (e.g., phosphoserine, methylated lysine, D-amino acids) with non-standard three-letter codes, backbone truncation and link-hydrogen placement will not be applied to them by default. Use --modified-residue to register them:

pdb2reaction extract -i complex.pdb -c PRE --modified-residue "SEP,TPO,MLY" -o pocket.pdb

If --modified-residue is insufficient (e.g., the residue has an unusual backbone topology), construct the active site model manually and pass it directly to downstream commands (opt, tsopt, path-opt, etc.).


Charge / spin problems

“Charge is required …” (non-GJF inputs)

Many stages need a net charge when the input is not .gjf. If you omit -q/--charge, the workflow may attempt to derive charge from --ligand-charge/-l (PDB-only) or from a .gjf template.

Fix:

  • Provide charge and multiplicity explicitly:

    pdb2reaction path-search -i R.pdb P.pdb -q 0 -m 1
    
  • Or (when using extraction) provide a per-residue-name charge mapping with -l/--ligand-charge:

    pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
    

Installation / environment problems

UMA download/authentication errors

Symptoms:

  • Errors about missing Hugging Face authentication or being unable to download model weights.

Fix:

  • Log in once per environment/machine:

    hf auth login
    
  • On HPC, ensure your home directory (or HF cache directory) is writable from compute nodes.
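
A quick way to test the cache location from a compute node is to try creating a temporary file there. The sketch below is an illustration, not part of pdb2reaction; it assumes the usual Hugging Face Hub lookup order (HF_HUB_CACHE, then HF_HOME/hub, then ~/.cache/huggingface/hub, honoring XDG_CACHE_HOME), and both function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def hf_cache_dir():
    """Resolve the Hugging Face Hub cache directory from the environment
    (assumed lookup order: HF_HUB_CACHE, HF_HOME/hub, default cache)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]) / "hub"
    cache_root = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return Path(cache_root) / "huggingface" / "hub"


def is_writable(directory):
    """Return True if a temporary file can be created in the directory,
    or in its nearest existing ancestor when it does not exist yet."""
    path = Path(directory)
    while not path.exists() and path != path.parent:
        path = path.parent
    try:
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


if __name__ == "__main__":
    cache = hf_cache_dir()
    print(f"{cache}: {'writable' if is_writable(cache) else 'NOT writable'}")
```

Run it inside a job script rather than on the login node, since the two may mount the home directory differently.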


CUDA / PyTorch mismatch

Symptoms:

  • torch.cuda.is_available() is false even though you have a GPU.

  • CUDA runtime errors at import time.

Fixes:

  • Install a PyTorch build matching your cluster CUDA runtime.

  • Confirm GPU visibility:

    nvidia-smi
    python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
    

DMF mode fails (cyipopt missing)

If you use DMF (--mep-mode dmf) and see errors importing IPOPT/cyipopt:

Fix:

  • Install cyipopt from conda-forge (recommended) before installing pdb2reaction:

    conda install -c conda-forge cyipopt
    

Plot export fails (Chrome missing)

If figure export fails and you see Plotly/Chrome-related errors:

Fix:

  • Install a headless Chrome once:

    plotly_get_chrome -y
    

Calculation / convergence problems

Optimization reaches max_cycles with max(force) slightly above the threshold

Symptoms:

  • The optimizer runs until max_cycles is hit, and the final summary shows that max(force) or rms(force) is just above the target (e.g., 4×10⁻⁴ au against the baker threshold of 3×10⁻⁴ au).

  • The energy, on the other hand, has clearly flattened and oscillates at the 10⁻⁵–10⁻⁴ au level.

Why it happens:

  • MLIP gradient/force evaluations carry a small stochastic noise floor (typically ~4×10⁻⁴ au for UMA-class models). This noise floor can exceed the force-based convergence criterion (baker = 3×10⁻⁴ au), so the force threshold can never be satisfied even though the geometry has already converged.

Fixes to try:

  • The energy plateau fallback (new in v0.3.5) should handle this automatically: with opt.energy_plateau: true, convergence is declared when the energy range over the last opt.energy_plateau_window steps (default 50) falls below opt.energy_plateau_thresh (default 1×10⁻⁴ au ≈ 0.06 kcal/mol). No user action is required in most cases.

  • If you need to override the automatic fallback, loosen the force threshold manually: --thresh gau (the default for opt) or --thresh gau_loose.

  • You can also tune opt.energy_plateau_thresh / opt.energy_plateau_window from YAML, or disable the fallback with opt.energy_plateau: false.

  • Note: the plateau fallback is skipped for chain-of-states optimizers (path-opt, path-search string/GSM/DMF stages) because they store per-image energies rather than a single scalar energy trace.
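
The plateau rule described above can be illustrated in a few lines of Python. This is a minimal sketch of the stated criterion (energy range over the last window steps below a threshold), not pdb2reaction's implementation; the function name and the hartree-to-kcal/mol constant are supplied here for illustration.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # conversion factor, 1 au of energy in kcal/mol


def energy_plateau_reached(energies, window=50, thresh=1e-4):
    """Return True when the spread (max - min) of the last `window`
    energies, in au, is below `thresh`. Needs at least `window` steps."""
    if len(energies) < window:
        return False
    recent = energies[-window:]
    return (max(recent) - min(recent)) < thresh


if __name__ == "__main__":
    # An energy trace that oscillates by +/- 1e-5 au around a flat value.
    trace = [-100.0 + 1e-5 * (-1) ** step for step in range(60)]
    print(energy_plateau_reached(trace))
    print(f"{1e-4 * HARTREE_TO_KCAL_PER_MOL:.2f} kcal/mol")
```

Applying the same function to the energies in your own log is a quick way to judge whether a run that hit max_cycles had effectively converged.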


TS optimization fails to converge

Symptoms:

  • TS optimization runs for many cycles without converging.

  • Multiple imaginary frequencies remain after optimization.

Fixes to try (CLI flags and YAML knobs are complementary — use both as needed):

  • Switch optimizer modes: --opt-mode grad (Dimer) or --opt-mode hess (RS-I-RFO).

  • Enable flattening of extra imaginary modes: --flatten (available on standalone tsopt, opt, and pdb2reaction all; default disabled).

  • Increase max cycles: --max-cycles 20000 (for standalone tsopt; --tsopt-max-cycles 20000 for all).

  • Use tighter convergence: --thresh baker or --thresh gau_tight.

  • Reduce step sizes / trust radii via YAML — for LBFGS/Dimer: lbfgs.max_step / hessian_dimer.max_step; for RFO/RS-I-RFO: rfo.trust_radius / rfo.trust_min / rfo.trust_max (and the rsirfo section). See YAML Reference for section layout.


IRC does not terminate properly

Symptoms:

  • IRC stops before reaching a clear minimum.

  • Energy oscillates or gradient remains high.

Fixes to try:

  • Reduce step size: --step-size 0.05 (default is 0.10 bohr, unweighted Cartesian).

  • Increase max cycles: --max-cycles 200.

  • Check that the TS candidate has exactly one imaginary frequency before running IRC. See “Imaginary-mode thresholds: 5 cm⁻¹ vs 100 cm⁻¹” in the glossary for the difference between the 5 cm⁻¹ detection threshold and the 100 cm⁻¹ quality gate.
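
The single-imaginary-mode check can be sketched as follows. This is an assumed reading of the two thresholds, not pdb2reaction's own logic: imaginary modes are represented as negative wavenumbers, a mode counts as imaginary when it lies below -5 cm⁻¹, and a lone imaginary mode is treated as clear only if its magnitude reaches 100 cm⁻¹. The function name is hypothetical.

```python
def classify_ts_modes(frequencies_cm1, detect=5.0, quality=100.0):
    """Classify a TS candidate from its vibrational frequencies (cm^-1).
    Imaginary modes are given as negative numbers (assumed convention)."""
    imaginary = [f for f in frequencies_cm1 if f < -detect]
    if len(imaginary) != 1:
        return f"{len(imaginary)} imaginary modes"
    if abs(imaginary[0]) < quality:
        return "1 imaginary mode (weak)"
    return "1 imaginary mode (clear)"
```

For example, classify_ts_modes([-450.2, 35.0, 120.0]) returns "1 imaginary mode (clear)", while a second mode at -30 cm⁻¹ would give "2 imaginary modes" and suggests re-running TS optimization before IRC.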


MEP search (GSM/DMF) fails or gives unexpected results

Symptoms:

  • Path search terminates with no valid MEP.

  • Bond changes are not detected correctly.

Fixes to try:

  • Increase --max-nodes above the default of 20 (e.g., 30 or 40) for complex reactions.

  • Enable endpoint pre-optimization: --preopt.

  • Try the alternative MEP method: --mep-mode dmf (if GSM fails) or vice versa.

  • Adjust bond detection parameters in YAML (bond.bond_factor, bond.delta_fraction).


Performance / stability tips

  • Out of memory (VRAM): reduce active site model size (--radius), reduce nodes (--max-nodes), or use lighter optimizer settings (--opt-mode grad).

  • Analytical Hessian is slow or causes OOM: keep the default FiniteDifference mode. Only use --hessian-calc-mode Analytical if you have ample VRAM (16 GB+ recommended for 500+ atom systems).

  • Workers > 1: improves UMA throughput on HPC, but disables the analytical Hessian evaluation.

  • Large systems (1000+ atoms): consider extracting a smaller active site model (--radius 2.5) or running on multi-GPU setups.


Choosing a backend

The table below gives informal per-step LBFGS inference costs for small-to-medium cluster models on an NVIDIA RTX 5080 (16 GB VRAM). The numbers are order-of-magnitude guidance for backend selection, not a rigorous benchmark.

| Backend | Speed (median s/step) | VRAM usage | Notes |
|---|---|---|---|
| UMA-s-1.1 (default) | 0.03 s | ~2 GB | Fast, good for exploration |
| UMA-m-1.1 | 0.22 s | ~8 GB | Medium model, heavy VRAM |
| MACE-OMOL-0 | 0.37 s | ~4 GB | Requires a separate env (e3nn conflict) |
| Orb-v3-omol | 0.02 s | ~2 GB | Fastest; see caveat below |

Recommendations:

  • Start with UMA-s-1.1 for rapid screening, then cross-check key results with MACE or UMA-m-1.1.

  • For SAM-dependent SN2 / methyl-transfer chemistries, the MACE and UMA-s-1.1 backends are complementary; try both when one produces a suspect TS.

  • Orb-v3-omol often identifies the correct reaction coordinate but tends to converge transition states with extra small imaginary modes. Orb is therefore a good first-pass mechanism-recovery backend, but a clean single-imaginary-mode TS is not guaranteed — for quantitative kinetics or frequency analysis, re-score Orb geometries with UMA / MACE / DFT or switch backend.

GPU memory (VRAM) requirements

Approximate VRAM usage by system size:

| Atoms | LBFGS opt | Hessian (analytical) | Hessian (finite diff) |
|---|---|---|---|
| 50 | ~2 GB | ~3 GB | ~2 GB |
| 100 | ~3 GB | ~6 GB | ~3 GB |
| 200 | ~4 GB | ~12 GB | ~4 GB |
| 500 | ~6 GB | OOM on 16 GB | ~6 GB |

If you encounter torch.cuda.OutOfMemoryError:

  • Use --hessian-calc-mode FiniteDifference (slower but lower VRAM)

  • Reduce cluster model size with a smaller --radius

  • Use a smaller model (calc.model: uma-s-1p1 in your YAML config instead of uma-m-1p1)

How to report an issue

When asking for help, include:

  • The exact command line you ran

  • summary.log (or console output)

  • The smallest input files that reproduce the problem (if possible)

  • Your environment: OS, Python, CUDA, PyTorch versions
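
To gather the environment details in one step, a short script along these lines may be convenient. It is a sketch, not a pdb2reaction utility; the function name environment_report is hypothetical, and PyTorch is reported only if it can be imported.

```python
import platform
import sys


def environment_report():
    """Collect OS, Python, and (if importable) PyTorch/CUDA details."""
    report = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import torch
    except Exception as exc:  # also catches CUDA library errors raised at import time
        report["torch"] = f"import failed: {exc.__class__.__name__}"
    else:
        report["torch"] = torch.__version__
        report["cuda_build"] = str(torch.version.cuda)
        report["cuda_available"] = str(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Paste the printed lines into the issue together with the command line and summary.log.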