Troubleshooting

Search this page for the error message you encounter. For a symptom-first index, see Common Error Recipes.

Preflight checklist

Before a long run, verify:

- mlmm -h runs and shows the CLI help.
- UMA weights can be downloaded (Hugging Face login is set up; see Getting Started › Installation).
- Your input PDB(s) contain hydrogens and element symbols.
- When you pass multiple PDBs, they share the same atoms in the same order.
- tleap is on $PATH (required by mm-parm).
- The hessian_ff C++ native extension built successfully (rebuild with cd hessian_ff/native && make if the auto-build failed).
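
Part of this checklist can be automated. The sketch below is not part of mlmm-toolkit: the helper names are hypothetical, it only reads fixed PDB columns (element symbol in columns 77–78), and it does not check UMA downloads or the hessian_ff build.

```python
import shutil


def missing_executables(names=("mlmm", "tleap")):
    """Return the names from `names` that are not found on $PATH."""
    return [name for name in names if shutil.which(name) is None]


def atoms_without_element(pdb_text):
    """1-based line numbers of ATOM/HETATM records whose element field (cols 77-78) is blank."""
    bad = []
    for lineno, line in enumerate(pdb_text.splitlines(), start=1):
        # PDB columns are 1-based, so cols 77-78 are the Python slice [76:78].
        if line.startswith(("ATOM", "HETATM")) and not line[76:78].strip():
            bad.append(lineno)
    return bad


def has_hydrogens(pdb_text):
    """True if any ATOM/HETATM record carries the element symbol H (relies on cols 77-78)."""
    return any(
        line.startswith(("ATOM", "HETATM")) and line[76:78].strip() == "H"
        for line in pdb_text.splitlines()
    )


if __name__ == "__main__":
    import sys

    print("missing on $PATH:", missing_executables() or "none")
    for path in sys.argv[1:]:
        with open(path) as handle:
            text = handle.read()
        print(path, "| blank element lines:", atoms_without_element(text)[:10],
              "| has H:", has_hydrogens(text))
```

Because has_hydrogens reads the element column, run it only after the element check passes.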

Input / extraction

| Symptom | Cause | Fix |
|---|---|---|
| Missing element columns (cols 77–78) | Some design tools leave the column blank; `extract` needs element symbols for atom typing. `mlmm all` auto-runs `add-elem-info` as preflight. | `mlmm add-elem-info -i input.pdb -o input_with_elem.pdb`, then re-run. |
| `[multi] Atom count mismatch` / `[multi] Atom order mismatch` | Inputs prepared by different tools / settings. | Regenerate all structures with the same protonation tool + settings. For MD ensembles, extract frames from the same trajectory + topology. As a fallback, switch to the staged-scan workflow (one PDB + `--scan-lists`). |
| Pocket too small / catalytic residues missing | Default radius too small for the system. | Increase `--radius` (e.g. 2.6 → 3.5 Å); force-include residues with `--selected-resn 'A:123,B:456'`; or hand-craft an ML-region PDB in PyMOL and pass it via `--model-pdb`. |
| Unreliable energies / barriers shifting with model size | Extracted pocket too small. | Increase `-r` (e.g. `mlmm extract -i complex.pdb -c 'SUB' -o pocket.pdb -r 4.0`). |
| Non-standard residues not truncated correctly (SEP / TPO / MLY / D-amino acids) | Backbone truncation + link-H placement only apply to known three-letter codes. | `--modified-residue "SEP,TPO,MLY"` (also accepted on `mlmm all`). If insufficient (unusual backbone topology), build the pocket manually and pass `--parm` + `--model-pdb` to downstream subcommands directly. |
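
For the `[multi]` mismatch row, a quick way to see where two inputs diverge is to compare the (chain, residue number, residue name, atom name) sequence of each file. This is a hypothetical helper that assumes standard fixed-column PDB records; it is not the check mlmm itself runs, so its messages differ from mlmm's.

```python
def atom_keys(pdb_text):
    """(chain, resSeq, resName, atom name) for each ATOM/HETATM record, in file order."""
    keys = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed PDB columns: name 13-16, resName 18-20, chain 22, resSeq 23-26.
            keys.append((line[21:22], line[22:26].strip(), line[17:20].strip(), line[12:16].strip()))
    return keys


def compare_structures(pdb_texts):
    """Return None if every input matches the first one, else a message for the first mismatch."""
    ref = atom_keys(pdb_texts[0])
    for i, text in enumerate(pdb_texts[1:], start=1):
        keys = atom_keys(text)
        if len(keys) != len(ref):
            return f"input {i}: atom count {len(keys)} != {len(ref)}"
        for j, (a, b) in enumerate(zip(ref, keys)):
            if a != b:
                return f"input {i}: atom order differs at index {j}: {a} vs {b}"
    return None
```

Reading each file with open(path).read() and passing the texts in the same order as on the command line reports the first offending atom.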

Charge / spin

Per-stage calc subcommands require explicit -q/--charge (ML-region net charge) and -m/--multiplicity. In mlmm all, charge is resolved in order: -q override → extraction summary → --ligand-charge fallback (when extraction is skipped).

```bash
mlmm path-search -i R.pdb P.pdb --parm real.parm7 --model-pdb model.pdb -q 0 -m 1
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
```
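
The charge precedence for mlmm all can be written as a small function. This is an illustrative sketch with hypothetical names, not mlmm's code. In particular, the --ligand-charge fallback is assumed here to be the plain sum of the listed residue charges; mlmm may compute it differently when extraction is skipped.

```python
from typing import Dict, Optional


def parse_residue_map(spec: str) -> Dict[str, int]:
    """Parse a 'RES:value,RES:value' string such as 'SAM:1,GPP:-3'."""
    result = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, value = item.rsplit(":", 1)
        result[name.strip()] = int(value)
    return result


def resolve_charge(q_override: Optional[int],
                   extraction_charge: Optional[int],
                   ligand_spec: Optional[str]) -> int:
    if q_override is not None:         # 1. explicit -q always wins
        return q_override
    if extraction_charge is not None:  # 2. value reported in the extraction summary
        return extraction_charge
    if ligand_spec:                    # 3. fallback when extraction is skipped (ASSUMED: plain sum)
        return sum(parse_residue_map(ligand_spec).values())
    raise ValueError("no charge source available: pass -q/--charge")
```

The same parser also reads --ligand-mult strings such as 'HEM:1,NO:2'.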

AmberTools / `mm-parm`

| Symptom | Fix |
|---|---|
| `AmberTools preflight failed. Missing required command(s): tleap, antechamber, parmchk2` | `conda install -c conda-forge ambertools -y` (or `module load ambertools` on HPC, or build from source: https://ambermd.org/AmberTools.php). Verify with `which tleap`. Without AmberTools you can still run `opt` / `tsopt` / `path-search` if you supply `--parm` manually. |
| `antechamber` fails for a ligand | Check ligand element symbols, connectivity, and TER records. Specify `-l 'LIG:-1'` and (for non-singlet ligands) `--ligand-mult 'HEM:1,NO:2'`. Inspect `<resname>.antechamber.log` via `--keep-temp`. Try manually: `antechamber -i ligand.pdb -fi pdb -o ligand.mol2 -fo mol2 -c bcc -nc -3 -at gaff2`. For higher-accuracy partial charges, derive RESP charges from HF/6-31G* and pass custom `frcmod` / `lib` files. |
| `Atom count in parm7 does not match input PDB` / `parm7 topology does not match the input structure` / `Coordinate shape mismatch for... got (N,3), expected (M,3)` | Re-run `mm-parm` from the current PDB; use its output `<prefix>.pdb` for downstream subcommands (tleap may add / remove hydrogens). Never reorder PDB atoms after `mm-parm`. |
| `oniom-export` reports `Element sequence mismatch at atom index ...` | Use the same PDB for `-i` that was used to generate the parm7. As an escape hatch, `--no-element-check` disables the check (verify results manually). |
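
When a topology mismatch is reported, comparing the atom count stored in the parm7 file with the PDB is a fast first check. The sketch relies on the Amber prmtop layout, where NATOM is the first integer under `%FLAG POINTERS` in `10I8` format. The helper names are hypothetical. A matching count does not guarantee a matching atom order.

```python
def parm7_natom(parm7_text: str) -> int:
    """NATOM = first 8-character integer in the %FLAG POINTERS section."""
    lines = parm7_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("%FLAG POINTERS"):
            # Skip the %FORMAT (and any %COMMENT) lines that follow the flag.
            for data in lines[i + 1:]:
                if not data.startswith("%"):
                    return int(data[0:8])
    raise ValueError("no %FLAG POINTERS section found")


def pdb_natom(pdb_text: str) -> int:
    """Number of ATOM/HETATM records in a PDB."""
    return sum(1 for line in pdb_text.splitlines() if line.startswith(("ATOM", "HETATM")))


def check_topology(parm7_text: str, pdb_text: str) -> None:
    n_top, n_pdb = parm7_natom(parm7_text), pdb_natom(pdb_text)
    if n_top != n_pdb:
        raise ValueError(
            f"parm7 has {n_top} atoms but the PDB has {n_pdb}; "
            "re-run mm-parm and use its <prefix>.pdb downstream"
        )
```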

`hessian_ff` build / import

```text
ImportError: cannot import name 'ForceFieldTorch' from 'hessian_ff'
RuntimeError: hessian_ff build attempts failed: ...
```

The C++ native extension is JIT-built on first use in a local temporary directory. A build directory on a network mount (NFS/Lustre) can hang on PyTorch's build lock, which is why a local path is the default; override it with TORCH_EXTENSIONS_DIR if needed. If the build fails, check that g++ >= 9 is available (g++ --version; on conda, conda install -c conda-forge gxx_linux-64), that PyTorch headers are available (python -c "import torch; print(torch.utils.cmake_prefix_path)"), and that ninja is installed. On HPC, load a recent compiler (e.g. module load gcc/11). Then clean and rebuild:

```bash
conda install -c conda-forge ninja -y
cd $(python -c "import hessian_ff; print(hessian_ff.__path__[0])")/native && make clean && make
```

Also ensure hessian_ff is importable at all (it is if you installed mlmm-toolkit with pip install -e .).
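
You can check these prerequisites from Python before rebuilding. This is a minimal sketch with hypothetical helper names. It checks three things: that `g++ -dumpversion` reports a major version of at least 9, that ninja is on $PATH, and that PyTorch imports. On macOS, g++ is usually Clang, whose reported version is not comparable and will be flagged.

```python
import shutil
import subprocess


def gcc_major(version_text: str) -> int:
    """Major version from `g++ -dumpversion` output such as '11.4.0' or '9'."""
    head = version_text.strip().split(".")[0]
    if not head.isdigit():
        raise ValueError(f"cannot parse g++ version: {version_text!r}")
    return int(head)


def build_prereq_problems(min_gcc: int = 9) -> list:
    """Return human-readable problems; an empty list means the basic prerequisites look fine."""
    problems = []
    gxx = shutil.which("g++")
    if gxx is None:
        problems.append("g++ not found on $PATH")
    else:
        out = subprocess.run([gxx, "-dumpversion"], capture_output=True, text=True).stdout
        try:
            major = gcc_major(out)
        except ValueError:
            problems.append(f"could not parse g++ version output {out!r}")
        else:
            if major < min_gcc:
                problems.append(f"g++ {out.strip()} is older than {min_gcc}")
    if shutil.which("ninja") is None:
        problems.append("ninja not found on $PATH")
    try:
        import torch  # noqa: F401  (an importable PyTorch is needed for the JIT build)
    except Exception as exc:  # ImportError, or a broken CUDA/driver install
        problems.append(f"PyTorch does not import: {exc}")
    return problems


if __name__ == "__main__":
    print("\n".join(build_prereq_problems()) or "prerequisites look fine")
```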

B-factor layer assignment

Encoding: ML = 0.0, Movable-MM = 10.0, Frozen-MM = 20.0 (tolerance ±1.0). Common symptoms:

- Wrong layer assignments / ML region too small or too large: verify that --model-pdb selects the intended atoms; adjust --radius-freeze (default 8.0 Å) to move the Movable / Frozen boundary; control the Hessian-target MM atoms separately via hess_cutoff / hess_mm_atoms. Inspect the layered PDB visually (color by B-factor).
- B-factors not recognized (calculator treats all atoms as one layer): re-run define-layer; do not hand-edit B-factors to arbitrary values.
- --detect-layer produces unexpected splits or fails without --model-pdb: supply a PDB input (or XYZ + --ref-pdb); re-run define-layer explicitly; for distance-based control, set hess_cutoff / movable_cutoff and use --no-detect-layer (supplying --movable-cutoff already disables --detect-layer).
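
To check a layered PDB without a viewer, you can decode the B-factor encoding above directly. This sketch assumes fixed-column PDB records (B-factor in columns 61–66) and treats the ±1.0 tolerance as inclusive, which is an assumption. Atoms counted under None match no layer code, which is the symptom in the second bullet.

```python
from collections import Counter

LAYER_CODES = {"ML": 0.0, "Movable-MM": 10.0, "Frozen-MM": 20.0}
TOLERANCE = 1.0  # documented ±1.0; inclusive boundary is an assumption


def layer_of(bfactor: float):
    """Layer name for a B-factor, or None if it matches no code within the tolerance."""
    for name, code in LAYER_CODES.items():
        if abs(bfactor - code) <= TOLERANCE:
            return name
    return None


def layer_counts(pdb_text: str) -> Counter:
    """Count atoms per layer; the B-factor sits in PDB columns 61-66 (slice [60:66])."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            counts[layer_of(float(line[60:66]))] += 1
    return counts
```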

Installation / environment

| Symptom | Fix |
|---|---|
| MLIP model download fails / HF auth missing (`huggingface_hub.errors.GatedRepoError`, `401`, `403`) | `hf auth login` once per env / machine; accept the model license on the HF page. On HPC, ensure the HF cache dir is writable from compute nodes. |
| `torch.cuda.is_available()` returns `False` | Install PyTorch matching your cluster CUDA runtime; verify `nvidia-smi` and `python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"`. |
| `[orb]` install fails building torch_scatter (`No module named 'torch'`) | torch_scatter ships no PyPI binary wheel (only an sdist), so pip builds it from source, and that build fails under PEP 517 build isolation because torch is not installed in the isolated build environment. Install from PyG's prebuilt-wheel index matching your torch+CUDA tag: `pip install -e ".[orb]" -f https://data.pyg.org/whl/torch-2.8.0+cu129.html`. Fallback (CUDA toolchain present): `pip install torch_scatter --no-build-isolation`. |
| `DMF mode requires ase, cyipopt, and pydmf to be installed.` | `conda install -c conda-forge ase cyipopt -y && pip install 'pydmf>=1.2'` (`pydmf>=1.2` ships the `dmf.torch` backend used by the default `--dmf-backend gpu`). |
| Plot export fails (Plotly / Chrome) | `plotly_get_chrome -y`. |
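
The PyG index URL in the torch_scatter row encodes the torch version and CUDA tag. This sketch derives the URL from the installed torch build, assuming the torch-<version>+<tag>.html naming seen in the example URL. PyG does not publish an index for every torch patch release, so you may need to adjust the printed URL to the nearest published one. Running the script also performs the torch.cuda.is_available() check from the row above.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build a PyG wheel-index URL from torch.__version__ and torch.version.cuda."""
    base = torch_version.split("+", 1)[0]  # '2.8.0+cu129' -> '2.8.0'
    # torch.version.cuda is None on CPU-only builds; '12.9' -> 'cu129'
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("torch", torch.__version__, "| CUDA", torch.version.cuda,
              "| available:", torch.cuda.is_available())
        print(f'pip install -e ".[orb]" -f {pyg_wheel_index(torch.__version__, torch.version.cuda)}')
```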

DMF mode fails (cyipopt / pydmf / ase missing)

See the `DMF mode requires ase, cyipopt, and pydmf to be installed.` row in the Installation / environment table above.

CUDA / PyTorch mismatch

See the `torch.cuda.is_available()` returns `False` row in the Installation / environment table above.

Plot export fails (Chrome missing)

See the Plot export fails (Plotly / Chrome) row in the Installation / environment table above.

Calculation / convergence

CUDA OOM (`torch.cuda.OutOfMemoryError`)

ML/MM systems are larger than typical pure-MLIP systems, so VRAM pressure is higher. Try, in order:

1. Verify Frozen-MM: define-layer should put distal atoms at B=20.0. If the Frozen region is too small, the Movable-MM region (and its Hessian) inflates; decrease --radius-freeze to expand the Frozen region.
2. Shrink the ML region: use a smaller --radius in extract, or hand-craft a smaller --model-pdb.
3. Use --hessian-calc-mode FiniteDifference: slower, but lower peak VRAM.
4. Pre-define layers with define-layer and set use_bfactor_layers: true in YAML.
5. Use a bigger GPU: 24 GB+ for 500+ ML atoms; 48 GB+ for 1000+.
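
An undersized Frozen layer is costly because a dense Hessian over N atoms has (3N)² entries, so its storage grows quadratically with the number of atoms it covers. The arithmetic below counts only one float64 matrix. Actual peak VRAM is higher, since MLIP autograd buffers, the MM part and other intermediates add to it, so treat the numbers as a lower bound.

```python
def dense_hessian_gib(n_atoms: int, bytes_per_entry: int = 8) -> float:
    """Memory (GiB) for one dense 3N x 3N Hessian; 8 bytes per entry = float64."""
    dof = 3 * n_atoms  # Cartesian degrees of freedom
    return dof * dof * bytes_per_entry / 1024**3


if __name__ == "__main__":
    for n in (1000, 5000, 10000):  # e.g. number of atoms in the Hessian target
        print(f"{n:>6} atoms -> {dense_hessian_gib(n):6.2f} GiB")
```

For 1000, 5000 and 10000 atoms this prints about 0.07, 1.68 and 6.71 GiB. Doubling the covered atoms quadruples the cost.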

TS optimization does not converge / multiple imaginary modes remain

Try switching between --opt-mode grad (Dimer) and --opt-mode hess (RS-I-RFO); use --flatten to flatten extra imaginary modes; raise --max-cycles (e.g. 20000); tighten --thresh (baker or gau_tight); or expand the Hessian-target atoms via hess_cutoff.

Optimizer “stalls” with flat energy + forces just above threshold (MLIP force noise floor)

MLIPs have finite numerical precision. In large ML/MM systems the force noise floor can exceed the gau / baker gradient thresholds, so forces stop decreasing even though the geometry is already stationary. The default energy_plateau: true handles this automatically: it declares convergence once the energy range over the last 50 steps falls below 1.0e-4 au (≈ 0.06 kcal/mol). To tighten or disable the check:

```yaml
opt:
  energy_plateau_thresh: 1.0e-05   # stricter (au)
  energy_plateau_window: 100       # require a longer flat stretch
  # energy_plateau: false          # disable entirely (e.g. for benchmarking)
```

The plateau check is skipped automatically for chain-of-states optimizers (GS / DMF), so path-opt / path-search are unaffected.
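
The plateau criterion itself is simple. The sketch below illustrates the idea and is not mlmm's implementation. Convergence is declared once the spread (max minus min) of the last `window` energies, in hartree, is below `thresh`; whether the comparison is strict is an assumption. The defaults mirror the documented 50 steps and 1.0e-4 au.

```python
from typing import Sequence

HARTREE_TO_KCAL_MOL = 627.509


def energy_plateau_reached(energies: Sequence[float], window: int = 50,
                           thresh: float = 1.0e-4) -> bool:
    """True once the last `window` energies (hartree) span less than `thresh`."""
    if len(energies) < window:  # not enough history yet
        return False
    recent = list(energies)[-window:]
    return max(recent) - min(recent) < thresh
```

A slow but steady descent (for example 1e-5 au per step) is not flagged, because its 50-step range stays above the threshold.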

IRC does not terminate properly

Reduce the step size (e.g. --step-size 0.05; default 0.10), raise --max-cycles (e.g. 200), and verify that the TS candidate has exactly one imaginary frequency before launching IRC.
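
You can script the "exactly one imaginary frequency" check from a list of vibrational frequencies. This assumes the common convention of reporting imaginary modes as negative wavenumbers (cm^-1). The noise_cm cutoff for ignoring tiny spurious modes is a hypothetical parameter, not an mlmm option, and its 20 cm^-1 default is only illustrative.

```python
from typing import List, Sequence


def imaginary_modes(freqs_cm: Sequence[float], noise_cm: float = 20.0) -> List[float]:
    """Frequencies below -noise_cm, i.e. imaginary modes larger than the assumed noise level."""
    return [f for f in freqs_cm if f < -abs(noise_cm)]


def is_first_order_saddle(freqs_cm: Sequence[float], noise_cm: float = 20.0) -> bool:
    """A TS candidate suitable for IRC has exactly one significant imaginary mode."""
    return len(imaginary_modes(freqs_cm, noise_cm)) == 1
```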

MEP search (GSM / DMF) fails or misses bonds

Raise --max-nodes (e.g. 15–20) for complex reactions; enable --preopt; try the alternate method (--mep-mode dmf ↔ gsm); tune bond-detection in YAML (bond.bond_factor, bond.delta_fraction).
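
Bond detection of this kind is commonly based on scaled covalent radii: two atoms count as bonded when their distance is below bond_factor × (r_a + r_b). Whether mlmm's bond.bond_factor works exactly this way is an assumption, and bond.delta_fraction is not modelled. The radii table and the 1.2 default below are illustrative values, not mlmm's.

```python
import math

# Approximate single-bond covalent radii in Å (illustrative subset).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05, "P": 1.07}


def is_bonded(elem_a, xyz_a, elem_b, xyz_b, bond_factor=1.2):
    """Scaled covalent-radius test; `bond_factor` default is a hypothetical value."""
    cutoff = bond_factor * (COVALENT_RADII[elem_a] + COVALENT_RADII[elem_b])
    return math.dist(xyz_a, xyz_b) < cutoff
```

Under this scheme, raising the factor makes a stretched or forming bond register earlier. A C–O contact at 1.8 Å is not a bond at 1.2 but is one at 1.3.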

Performance / stability tips

- OOM: shrink the ML region, shrink the Hessian-target MM region, lower --max-nodes, or use --opt-mode grad.
- Analytical ML Hessian is fastest when VRAM allows (24 GB+ recommended for 300+ ML atoms); otherwise use FiniteDifference.
- MM Hessian: the default mm_fd: true (finite-difference) trades speed for memory; mm_fd: false is faster on small systems but heavier on memory. Cap the MM atom count with hess_cutoff.
- Large systems (2000+ atoms): make sure the Frozen layer is generous (define-layer with appropriate cutoffs) to keep the movable DOF count down.
- Multi-GPU: run ML on one device (ml_cuda_idx: 0) and MM on another (mm_device: cuda, mm_cuda_idx: 1).
- ML/MM parallelism: ML (GPU) and MM (CPU) run in parallel by default; tune CPU threads with mm_threads.

Backend-specific

| Symptom | Fix |
|---|---|
| `ImportError: orb-models is required for the ORB backend` (or similar for AIMNet2 / MACE) | `pip install "mlmm-toolkit[orb]"` or `pip install "mlmm-toolkit[aimnet]"`; for MACE, `pip install --no-deps mace-torch` (MACE in a dedicated env). |
| CUDA OOM on ORB / MACE / AIMNet2 | These backends use finite-difference (FD) Hessians, which need more VRAM. Lower `hess_cutoff`, or set `ml_device: cpu` (slow but avoids the VRAM limit). |
| `XTBEmbedError: xTB command not found` | `conda install -c conda-forge xtb -y` and ensure `xtb` is on `$PATH`. For a custom binary, set `xtb_cmd` in YAML. |
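
This sketch probes import names and $PATH to show which optional components from the table are available. It assumes `orb_models` and `mace` are the import names of orb-models and mace-torch. AIMNet2 is left out because its import name is not given here, and the xtb_cmd argument mirrors the YAML key of the same name.

```python
import importlib.util
import shutil


def backend_status(xtb_cmd: str = "xtb") -> dict:
    """True/False per component; find_spec checks installability without importing the package."""
    return {
        "orb": importlib.util.find_spec("orb_models") is not None,  # ASSUMED import name
        "mace": importlib.util.find_spec("mace") is not None,       # ASSUMED import name
        "xtb": shutil.which(xtb_cmd) is not None,
    }


if __name__ == "__main__":
    for name, ok in backend_status().items():
        print(f"{name:5s} {'available' if ok else 'missing'}")
```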

How to report an issue

Include: the exact command line, summary.log (or console output), the smallest reproducing inputs, your env (OS / Python / CUDA / PyTorch), and whether AmberTools + hessian_ff are properly installed.
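
A short script can collect the environment details. The helper below is hypothetical and is not shipped with mlmm-toolkit. Importing hessian_ff may trigger its first-use JIT build, so the first run can be slow.

```python
import platform
import shutil


def environment_report() -> dict:
    """OS / Python / CUDA / PyTorch details plus AmberTools and hessian_ff status."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "tleap_on_path": shutil.which("tleap") is not None,
    }
    try:
        import torch
    except Exception:  # ImportError, or a broken CUDA/driver install
        report["torch"] = None
    else:
        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    try:
        import hessian_ff  # noqa: F401
    except Exception as exc:  # includes build failures raised at import time
        report["hessian_ff_importable"] = f"no ({type(exc).__name__})"
    else:
        report["hessian_ff_importable"] = "yes"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```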