Troubleshooting¶

This page collects common failure modes and practical fixes. Commands are written to be copied and pasted; to find a fix, search the page for the error message you see. For a symptom-first entry point, start with Common Error Recipes, then return here for details.

Preflight checklist¶

Before a long run, verify:

You can run pdb2reaction -h and see the CLI help.
UMA can be downloaded (Hugging Face login/token is available on the machine).
For enzyme workflows: your input PDB(s) contain hydrogens and element symbols.
When you provide multiple PDBs: they have the same atoms in the same order (only coordinates differ).
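
The first checklist item can be scripted before a batch submission. The following is a minimal sketch, assuming only that the executable is named pdb2reaction and that `-h` exits with status 0; the helper name `cli_available` is hypothetical and not part of the package.

```python
import shutil
import subprocess


def cli_available(executable="pdb2reaction", timeout=60):
    """Return True if `executable -h` can be found and exits with status 0."""
    path = shutil.which(executable)
    if path is None:
        return False  # not on PATH in this environment
    try:
        result = subprocess.run(
            [path, "-h"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `cli_available()` inside the job script lets a run fail fast if the wrong environment is active.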

Input / extraction problems¶

“Element symbols are missing … please run add-elem-info”¶

Typical message:

Element symbols are missing in '...'.
Please run `pdb2reaction add-elem-info -i...` to populate element columns before running extract.

Fix:

Run:

pdb2reaction add-elem-info -i input.pdb -o input_with_elem.pdb

Then re-run extract / all using the updated PDB.

Why it happens:

Many PDBs do not populate the element column consistently. extract requires element symbols for reliable atom typing.
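
To see which records are affected before running add-elem-info, a short check of the element field can help. This is a sketch based on the standard fixed-column PDB layout (element symbol in columns 77-78, atom name in columns 13-16); the function name is hypothetical and the check is independent of how extract itself validates input.

```python
def records_missing_element(pdb_text):
    """Return (line_number, atom_name) for ATOM/HETATM records with a blank element field."""
    missing = []
    for number, line in enumerate(pdb_text.splitlines(), start=1):
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        element = line[76:78].strip()  # PDB columns 77-78
        if not element:
            missing.append((number, line[12:16].strip()))  # PDB columns 13-16
    return missing
```

An empty list means every atom record carries an element symbol.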

“[multi] Atom count mismatch …” or “[multi] Atom order mismatch …”¶

Typical messages:

[multi] Atom count mismatch between input #1 and input #2:...
[multi] Atom order mismatch between input #1 and input #2.

Fix:

Regenerate all structures with the same preparation workflow (same protonation tool, same settings).
If you add hydrogens, do it in a way that produces consistent ordering across all frames.

Tip:

For ensembles generated by MD, prefer extracting frames from the same trajectory/topology rather than mixing PDBs produced by different tools.
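
Atom count and order can be compared before launching a multi-structure run. The sketch below assumes standard PDB columns and treats two atoms as identical when atom name, residue name, chain, and residue number all match; the function names are hypothetical, and the tool's own comparison may use different criteria.

```python
def atom_keys(pdb_text):
    """Identity of each ATOM/HETATM record: (atom name, residue name, chain, residue number)."""
    keys = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            keys.append(
                (line[12:16].strip(), line[17:20].strip(), line[21:22], line[22:26].strip())
            )
    return keys


def first_mismatch(reference_text, other_text):
    """Return None if both structures list the same atoms in the same order, else a description."""
    ref, other = atom_keys(reference_text), atom_keys(other_text)
    if len(ref) != len(other):
        return f"atom count differs: {len(ref)} vs {len(other)}"
    for index, (a, b) in enumerate(zip(ref, other), start=1):
        if a != b:
            return f"atom #{index} differs: {a} vs {b}"
    return None
```

Running `first_mismatch` on the first input against each of the others points to the first record where the preparation workflows diverged.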

“Active site model (binding pocket) is empty / missing important residues”¶

Symptoms:

The extracted active site model is unexpectedly small.
Key catalytic residues are missing.

Fixes to try:

Increase --radius (e.g., 2.6 → 3.5 Å).
Use --selected-resn to force-include residues (e.g., --selected-resn 'A:123,B:456'). This option takes residue IDs, not residue names; see "--selected-resn takes residue IDs, not names" in CLI Conventions.
If backbone trimming is too aggressive, set --no-exclude-backbone.
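
To judge how far --radius needs to grow, it can help to list which residues a given cutoff would capture. The following is a rough sketch that uses a plain any-atom distance cutoff around the substrate; this is an assumption for illustration, and the selection rules that extract actually applies (for example backbone handling) may differ. The function name is hypothetical.

```python
import math


def residues_within(pdb_text, substrate_resname, radius):
    """Return sorted (chain, residue number, residue name) with any atom within `radius` Å of the substrate."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            residue = (line[21:22], int(line[22:26]), line[17:20].strip())
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((residue, xyz))
    substrate = [xyz for residue, xyz in atoms if residue[2] == substrate_resname]
    hits = set()
    for residue, xyz in atoms:
        if residue[2] == substrate_resname:
            continue  # skip the substrate itself
        if any(math.dist(xyz, s) <= radius for s in substrate):
            hits.add(residue)
    return sorted(hits)
```

Comparing the output at two radii (say 2.6 and 3.5) shows whether the missing catalytic residue would be picked up by a larger cutoff or needs --selected-resn.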

Unreliable energies / barriers¶

Symptoms:

Calculated energies or reaction barriers seem unreasonable.
Results change significantly when the model size is increased.

Fix:

If the extracted active site model is too small, calculated energies and barriers may be unreliable. Increase the extraction radius (e.g., -r 4.0 or higher) to include more of the protein environment:
```
pdb2reaction extract -i complex.pdb -c 'SUB' -o model.pdb -r 4.0
```

Non-standard residues not truncated correctly¶

If the extracted active site model contains modified amino acid residues (e.g., phosphoserine, methylated lysine, D-amino acids) with non-standard three-letter codes, backbone truncation and link-hydrogen placement will not be applied to them by default. Use --modified-residue to register them:

pdb2reaction extract -i complex.pdb -c PRE --modified-residue "SEP,TPO,MLY" -o pocket.pdb

If --modified-residue is insufficient (e.g., the residue has an unusual backbone topology), construct the active site model manually and pass it directly to downstream commands (opt, tsopt, path-opt, etc.).
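
To find candidates for --modified-residue, one option is to list every residue name in the input that is not one of the 20 standard amino acids. This is a sketch with a hypothetical helper name; the result will also contain waters, ligands, ions, and force-field-specific names, so the list still has to be filtered by hand.

```python
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def nonstandard_residue_names(pdb_text):
    """Return the sorted residue names that are not among the 20 standard amino acids."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            names.add(line[17:20].strip())  # PDB columns 18-20
    return sorted(names - STANDARD_AMINO_ACIDS)
```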

Charge / spin problems¶

“Charge is required …” (non-GJF inputs)¶

Many stages need a net charge when the input is not .gjf. If you omit -q/--charge, the workflow may attempt to derive charge from --ligand-charge/-l (PDB-only) or from a .gjf template.

Fix:

Provide charge and multiplicity explicitly:

pdb2reaction path-search -i R.pdb P.pdb -q 0 -m 1

Or (when using extraction) provide a residue-name mapping:

pdb2reaction -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
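
As a sanity check on the mapping string, the ligand contribution can be summed by hand. The sketch below parses the 'NAME:charge' form shown above; the helper is hypothetical, handles only this mapping form, and does not reproduce how the workflow combines ligand charges with the rest of the model.

```python
def parse_ligand_charges(spec):
    """Parse a 'NAME:charge,NAME:charge' string into a dict of integer charges."""
    charges = {}
    for item in spec.split(","):
        name, _, value = item.strip().partition(":")
        if not name or not value:
            raise ValueError(f"expected NAME:charge, got {item!r}")
        charges[name] = int(value)
    return charges


ligand_charges = parse_ligand_charges("SAM:1,GPP:-3")
ligand_total = sum(ligand_charges.values())  # +1 + (-3) = -2 from the ligands alone
```

If the net charge reported by a run differs from what you expect, comparing it against this ligand-only total is a quick way to spot a sign or typing error in the mapping.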

Installation / environment problems¶

UMA download/authentication errors¶

Symptoms:

Errors about missing Hugging Face authentication or being unable to download model weights.

Fix:

Log in once per environment/machine:
```
hf auth login
```
On HPC, ensure your home directory (or HF cache directory) is writable from compute nodes.
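
Cache writability can be tested from inside a compute-node job. This is a minimal sketch, assuming the cache lives under $HF_HOME when that variable is set and under ~/.cache/huggingface otherwise; the function names are hypothetical. Note that `is_writable` creates the directory if it does not exist yet.

```python
import os
import tempfile
from pathlib import Path


def hf_cache_dir():
    """Assumed cache location: $HF_HOME if set, else ~/.cache/huggingface."""
    return Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))


def is_writable(directory):
    """Return True if a temporary file can be created inside `directory`."""
    try:
        Path(directory).mkdir(parents=True, exist_ok=True)
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False
```

Running `is_writable(hf_cache_dir())` on a compute node, not just the login node, catches read-only home mounts before the model download is attempted.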

CUDA / PyTorch mismatch¶

Symptoms:

torch.cuda.is_available() returns False even though you have a GPU.
CUDA runtime errors at import time.

Fixes:

Install a PyTorch build matching your cluster CUDA runtime.

Confirm GPU visibility:

nvidia-smi
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"

DMF mode fails (cyipopt missing)¶

This applies if you use DMF (--mep-mode dmf) and see errors importing IPOPT/cyipopt.

Fix:

Install cyipopt from conda-forge (recommended) before installing pdb2reaction:
```
conda install -c conda-forge cyipopt
```

Plot export fails (Chrome missing)¶

This applies if figure export fails and you see Plotly/Chrome-related errors.

Fix:

Install a headless Chrome once:
```
plotly_get_chrome -y
```

Calculation / convergence problems¶

Optimization reaches `max_cycles` with `max(force)` slightly above the threshold¶

Symptoms:

The optimizer runs until max_cycles is hit, and the final summary shows that max(force) or rms(force) is just above the target (e.g. 4×10⁻⁴ au vs baker 3×10⁻⁴ au).
The energy, on the other hand, has clearly flattened and oscillates at the 10⁻⁵–10⁻⁴ au level.

Why it happens:

MLIP gradient/force evaluations carry a small stochastic noise floor (typically ~4×10⁻⁴ au for UMA-class models). This noise floor can exceed the force-based convergence criterion (baker = 3×10⁻⁴ au), so the force threshold can never be satisfied even though the geometry has already converged.

Fixes to try:

The energy plateau fallback (new in v0.3.5) should handle this automatically: opt.energy_plateau: true declares convergence when the energy range over the last opt.energy_plateau_window (default 50) steps falls below opt.energy_plateau_thresh (default 1×10⁻⁴ au ≈ 0.06 kcal/mol). No user action is required in most cases.
If you need to override the automatic fallback, loosen the force threshold manually: --thresh gau (the default for opt) or --thresh gau_loose.
You can also tune opt.energy_plateau_thresh / opt.energy_plateau_window from YAML, or disable the fallback with opt.energy_plateau: false.
Note: the plateau fallback is skipped for chain-of-states optimizers (path-opt, path-search string/GSM/DMF stages) because they store per-image energies rather than a single scalar energy trace.
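
The plateau criterion described above can be sketched in a few lines. This illustration assumes that "energy range" means the maximum minus the minimum over the last window steps and that no decision is made until a full window of history exists; it is not the package's implementation, and the function name is hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509


def energy_plateau_reached(energies, window=50, thresh=1e-4):
    """True when max - min of the last `window` energies (au) is below `thresh`."""
    if len(energies) < window:
        return False  # not enough history yet (assumption)
    recent = energies[-window:]
    return (max(recent) - min(recent)) < thresh


# The default threshold in kcal/mol: 1e-4 au * 627.509 is about 0.06 kcal/mol.
default_thresh_kcal = 1e-4 * HARTREE_TO_KCAL_PER_MOL
```

Applying the function to the energy column of your own optimization log shows whether a run that hit max_cycles would have qualified under a given window and threshold.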

TS optimization fails to converge¶

Symptoms:

TS optimization runs for many cycles without converging.
Multiple imaginary frequencies remain after optimization.

Fixes to try (CLI flags and YAML knobs are complementary — use both as needed):

Switch optimizer modes: --opt-mode grad (Dimer) or --opt-mode hess (RS-I-RFO).
Enable flattening of extra imaginary modes: --flatten (available on standalone tsopt, opt, and pdb2reaction all; default disabled).
Increase max cycles: --max-cycles 20000 (for standalone tsopt; --tsopt-max-cycles 20000 for all).
Use tighter convergence: --thresh baker or --thresh gau_tight.
Reduce step sizes / trust radii via YAML — for LBFGS/Dimer: lbfgs.max_step / hessian_dimer.max_step; for RFO/RS-I-RFO: rfo.trust_radius / rfo.trust_min / rfo.trust_max (and the rsirfo section). See YAML Reference for section layout.

IRC does not terminate properly¶

Symptoms:

IRC stops before reaching a clear minimum.
Energy oscillates or gradient remains high.

Fixes to try:

Reduce step size: --step-size 0.05 (default is 0.10 bohr, unweighted Cartesian).
Increase max cycles: --max-cycles 200.
Check if the TS candidate has only one imaginary frequency before running IRC. See Imaginary-mode thresholds: 5 cm⁻¹ vs 100 cm⁻¹ in the glossary for the 5 cm⁻¹ detection threshold vs 100 cm⁻¹ quality gate.
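
The imaginary-mode check before IRC can be expressed as a small classifier. The sketch below rests on two assumptions: imaginary modes are reported as negative wavenumbers, and the 5 cm⁻¹ and 100 cm⁻¹ values act as a detection threshold and a quality gate in the way shown. The function name and return strings are hypothetical.

```python
def classify_ts_candidate(frequencies_cm1, detect=5.0, quality=100.0):
    """Classify a frequency list in which imaginary modes are given as negative numbers."""
    # Modes more negative than -detect count as imaginary (assumed convention).
    imaginary = sorted(f for f in frequencies_cm1 if f < -detect)
    if not imaginary:
        return "no imaginary mode: not a TS"
    if len(imaginary) > 1:
        return f"{len(imaginary)} imaginary modes: clean up before IRC"
    if abs(imaginary[0]) < quality:
        return "one weak imaginary mode: inspect before IRC"
    return "one imaginary mode: suitable for IRC"
```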

MEP search (GSM/DMF) fails or gives unexpected results¶

Symptoms:

Path search terminates with no valid MEP.
Bond changes are not detected correctly.

Fixes to try:

Increase --max-nodes above the default of 20 (e.g., 30 or 40) for complex reactions.
Enable endpoint pre-optimization: --preopt.
Try the alternative MEP method: --mep-mode dmf (if GSM fails) or vice versa.
Adjust bond detection parameters in YAML (bond.bond_factor, bond.delta_fraction).

Performance / stability tips¶

Out of memory (VRAM): reduce active site model size (--radius), reduce nodes (--max-nodes), or use lighter optimizer settings (--opt-mode grad).
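Analytical Hessian is slow or causes OOM: keep the default FiniteDifference mode. Only use --hessian-calc-mode Analytical if you have ample VRAM: per the table under GPU memory (VRAM) requirements, a 200-atom system needs about 12 GB, and a 500-atom system runs out of memory on a 16 GB card.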
Workers > 1: improves UMA throughput on HPC, but disables the analytical Hessian evaluation.
Large systems (1000+ atoms): consider extracting a smaller active site model (--radius 2.5) or running on multi-GPU setups.

Choosing a backend¶

The table below gives informal per-step LBFGS inference costs for small-to-medium cluster models on an NVIDIA RTX 5080 (16 GB VRAM). The numbers are order-of-magnitude guidance for backend selection, not a rigorous benchmark.

| Backend | Speed (median s/step) | VRAM usage | Notes |
| --- | --- | --- | --- |
| UMA-s-1.1 (default) | 0.03 | ~2 GB | Fast, good for exploration |
| UMA-m-1.1 | 0.22 | ~8 GB | Medium model, heavy VRAM |
| MACE-OMOL-0 | 0.37 | ~4 GB | Requires a separate env (`e3nn` conflict) |
| Orb-v3-omol | 0.02 | ~2 GB | Fastest; see caveat below |

Recommendations:

Start with UMA-s-1.1 for rapid screening, then cross-check key results with MACE or UMA-m-1.1.
For SAM-dependent SN2 / methyl-transfer chemistries the MACE and UMA-s-1.1 backends are complementary; try both when one produces a suspect TS.
Orb-v3-omol often identifies the correct reaction coordinate but tends to converge transition states with extra small imaginary modes. Orb is therefore a good first-pass mechanism-recovery backend, but a clean single-imaginary-mode TS is not guaranteed — for quantitative kinetics or frequency analysis, re-score Orb geometries with UMA / MACE / DFT or switch backend.

GPU memory (VRAM) requirements¶

Approximate VRAM usage by system size:

| Atoms | LBFGS opt | Hessian (analytical) | Hessian (finite diff) |
| --- | --- | --- | --- |
| 50 | ~2 GB | ~3 GB | ~2 GB |
| 100 | ~3 GB | ~6 GB | ~3 GB |
| 200 | ~4 GB | ~12 GB | ~4 GB |
| 500 | ~6 GB | OOM on 16 GB | ~6 GB |

If you encounter torch.cuda.OutOfMemoryError:

Use --hessian-calc-mode FiniteDifference (slower but lower VRAM)
Reduce cluster model size with a smaller --radius
Use a smaller model (calc.model: uma-s-1p1 in your YAML config instead of uma-m-1p1)
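
The table above can be turned into a simple lookup for choosing a Hessian mode. This sketch hard-codes the approximate values from the table, rounds the system size up to the next tabulated row, and falls back to FiniteDifference whenever the table gives no analytical figure; the helper is hypothetical and the numbers remain rough guidance.

```python
# (atoms, analytical Hessian GB, finite-difference Hessian GB); None means OOM on a 16 GB card.
VRAM_TABLE = [(50, 3, 2), (100, 6, 3), (200, 12, 4), (500, None, 6)]


def suggest_hessian_mode(n_atoms, vram_gb):
    """Pick a --hessian-calc-mode value from the table, rounding the system size up to the next row."""
    for atoms, analytical_gb, _finite_gb in VRAM_TABLE:
        if n_atoms <= atoms:
            if analytical_gb is not None and analytical_gb <= vram_gb:
                return "Analytical"
            return "FiniteDifference"
    return "FiniteDifference"  # larger than any tabulated system: stay conservative
```

For example, a 150-atom model is treated as the 200-atom row, so it is only steered to Analytical when roughly 12 GB or more is available.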

How to report an issue¶

When asking for help, include:

The exact command line you ran
summary.log (or console output)
The smallest input files that reproduce the problem (if possible)
Your environment: OS, Python, CUDA, PyTorch versions
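
The environment details can be gathered with a short script and pasted into the report. This is a sketch using only the standard library plus an optional PyTorch import; the function name and the keys of the returned dict are hypothetical.

```python
import importlib
import platform
import sys


def environment_report():
    """Collect OS, Python, PyTorch, and CUDA versions as a dict of strings."""
    report = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report["torch"] = "not installed"
        report["cuda"] = "unknown"
    else:
        report["torch"] = torch.__version__
        report["cuda"] = str(torch.version.cuda)  # "None" for CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```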
