Troubleshooting¶
Search this page for the error message you encounter. For a symptom-first index see Common Error Recipes.
Preflight checklist¶
Before a long run, verify:
mlmm -h runs and shows the CLI help.
UMA weights can be downloaded (Hugging Face login is set up; see Getting Started › Installation).
Your input PDB(s) contain hydrogens and element symbols.
When you pass multiple PDBs, they share the same atoms in the same order.
tleap is on $PATH (required by mm-parm).
The hessian_ff C++ native extension built successfully (rebuild with cd hessian_ff/native && make if the auto-build failed).
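The two PDB-related items in the checklist can be checked before a run with a short script. The following is a minimal sketch, not part of mlmm: the function names are hypothetical, and it assumes standard fixed-column PDB records (atom name in columns 13–16, residue name in 18–20, chain in 22, residue number in 23–26, element in 77–78).

```python
from typing import List, Tuple

# (atom name, residue name, chain ID, residue number)
AtomKey = Tuple[str, str, str, str]


def read_atoms(pdb_text: str) -> List[Tuple[AtomKey, str]]:
    """Return (identity, element) for every ATOM/HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            padded = line.ljust(80)  # short lines get blank trailing columns
            key = (padded[12:16].strip(), padded[17:20].strip(),
                   padded[21], padded[22:26].strip())
            atoms.append((key, padded[76:78].strip()))
    return atoms


def missing_elements(pdb_text: str) -> List[AtomKey]:
    """Atoms whose element field (cols 77-78) is blank."""
    return [key for key, element in read_atoms(pdb_text) if not element]


def has_hydrogens(pdb_text: str) -> bool:
    """True if at least one atom carries the element symbol H."""
    return any(element.upper() == "H" for _, element in read_atoms(pdb_text))


def same_atom_order(pdb_a: str, pdb_b: str) -> bool:
    """True if both structures list the same atoms in the same order."""
    return [k for k, _ in read_atoms(pdb_a)] == [k for k, _ in read_atoms(pdb_b)]
```

To use it, read each input with `open(path).read()` and compare every structure against the first one. A file that fails `missing_elements` needs its element columns repaired before extraction.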
Input / extraction¶
| Symptom | Cause | Fix |
|---|---|---|
| Missing element columns (cols 77–78) | Some design tools leave the column blank; | |
| | Inputs prepared by different tools / settings. | Regenerate all structures with the same protonation tool + settings. For MD ensembles, extract frames from the same trajectory + topology. As a fallback, switch to the staged-scan workflow (one PDB + |
| Pocket too small / catalytic residues missing | Default radius too small for the system. | Increase |
| Unreliable energies / barriers shifting with model size | Extracted pocket too small. | Increase |
| Non-standard residues not truncated correctly (SEP / TPO / MLY / D-amino acids) | Backbone truncation + link-H placement only apply to known three-letter codes. | |
Charge / spin¶
Per-stage calc subcommands require explicit -q/--charge (ML-region net charge) and -m/--multiplicity. In mlmm all, charge is resolved in order: -q override → extraction summary → --ligand-charge fallback (when extraction is skipped).
mlmm path-search -i R.pdb P.pdb --parm real.parm7 --model-pdb model.pdb -q 0 -m 1
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
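The resolution order described above can be sketched in a few lines. This is an illustration only, not mlmm's implementation: the function names are hypothetical, the spec format is taken from the -l example, and treating the fallback as the plain sum of the listed ligand charges is an assumption.

```python
from typing import Dict, Optional


def parse_ligand_charges(spec: str) -> Dict[str, int]:
    """Parse a spec such as 'SAM:1,GPP:-3' into {'SAM': 1, 'GPP': -3}."""
    charges = {}
    for item in spec.split(","):
        name, _, value = item.strip().partition(":")
        charges[name] = int(value)
    return charges


def resolve_charge(cli_charge: Optional[int],
                   extraction_charge: Optional[int],
                   ligand_spec: Optional[str]) -> int:
    """Pick the ML-region charge: -q, then extraction summary, then ligands."""
    if cli_charge is not None:          # explicit -q always wins
        return cli_charge
    if extraction_charge is not None:   # value from the extraction summary
        return extraction_charge
    if ligand_spec:                     # fallback when extraction is skipped
        # Assumption: the fallback is the sum of the listed ligand charges.
        return sum(parse_ligand_charges(ligand_spec).values())
    raise ValueError("No charge source available; pass -q explicitly.")
```

With the spec from the example, `resolve_charge(None, None, 'SAM:1,GPP:-3')` gives the summed ligand charge, while any explicit -q value overrides both other sources.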
AmberTools / mm-parm¶
| Symptom | Fix |
|---|---|
| | |
| | Check ligand element symbols + connectivity + TER records. Specify |
| | Re-run |
| | Use the same PDB for |
hessian_ff build / import¶
ImportError: cannot import name 'ForceFieldTorch' from 'hessian_ff'
RuntimeError: hessian_ff build attempts failed: ...
The C++ native extension is JIT-built on first use in a local temp directory (override with TORCH_EXTENSIONS_DIR). A local path is the default because a network-mounted build directory (NFS/Lustre) can hang on torch’s build lock. If the build fails, check three things: g++ >= 9 is available (g++ --version; on conda, conda install -c conda-forge gxx_linux-64), PyTorch headers are available (python -c "import torch; print(torch.utils.cmake_prefix_path)"), and ninja is installed. On HPC, load a suitable compiler module first (e.g. module load gcc/11). Then clean and rebuild:
conda install -c conda-forge ninja -y
cd $(python -c "import hessian_ff; print(hessian_ff.__path__[0])")/native && make clean && make
Also ensure hessian_ff is importable at all (it is if you installed mlmm-toolkit with pip install -e .).
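The compiler and ninja checks above can be scripted. The sketch below uses only the Python standard library; the function names are hypothetical, and the version parsing assumes the usual first-line format of g++ --version (distribution details in parentheses, followed by the version number).

```python
import re
import shutil
import subprocess
from typing import Dict, Optional


def parse_gxx_major(version_output: str) -> Optional[int]:
    """Extract the major version from the first line of `g++ --version`."""
    lines = version_output.splitlines()
    if not lines:
        return None
    # Drop parenthesised vendor strings such as "(Ubuntu 9.4.0-1ubuntu1)".
    without_parens = re.sub(r"\([^)]*\)", "", lines[0])
    match = re.search(r"(\d+)\.\d+", without_parens)
    return int(match.group(1)) if match else None


def check_toolchain(min_gxx: int = 9) -> Dict[str, bool]:
    """Report whether g++ and ninja are on PATH and g++ is recent enough."""
    report = {tool: shutil.which(tool) is not None for tool in ("g++", "ninja")}
    report["g++ version ok"] = False
    if report["g++"]:
        result = subprocess.run(["g++", "--version"],
                                capture_output=True, text=True)
        major = parse_gxx_major(result.stdout)
        report["g++ version ok"] = major is not None and major >= min_gxx
    return report


if __name__ == "__main__":
    for check, ok in check_toolchain().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Run it in the same environment (conda env, loaded modules) that will launch mlmm, since PATH often differs between a login shell and a batch job.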
B-factor layer assignment¶
Encoding: ML = 0.0, Movable-MM = 10.0, Frozen-MM = 20.0 (tolerance ±1.0). Common symptoms:
Wrong layer assignments / ML region too small or too large: verify that --model-pdb selects the intended atoms; adjust --radius-freeze (default 8.0 Å) for the Movable / Frozen boundary; control Hessian-target MM separately via hess_cutoff / hess_mm_atoms. Inspect the layered PDB visually (color by B-factor).
B-factors not recognized (calculator treats all atoms as one layer): re-run define-layer; do not hand-edit B-factors to arbitrary values.
--detect-layer produces unexpected splits or fails without --model-pdb: supply a PDB input (or XYZ + --ref-pdb); re-run define-layer explicitly; for distance-based control, set hess_cutoff / movable_cutoff and use --no-detect-layer (supplying --movable-cutoff already disables --detect-layer).
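Besides visual inspection, the layer encoding can be checked by counting atoms per layer. This is a minimal sketch built from the encoding stated above (0.0 / 10.0 / 20.0, tolerance ±1.0); the function names are hypothetical, the B-factor is read from PDB columns 61–66, and treating the tolerance as inclusive is an assumption.

```python
from collections import Counter

LAYER_BFACTORS = {"ML": 0.0, "Movable-MM": 10.0, "Frozen-MM": 20.0}
TOLERANCE = 1.0  # assumed inclusive


def classify_bfactor(bfactor: float) -> str:
    """Map a B-factor to its layer name, or 'unassigned' if none matches."""
    for layer, target in LAYER_BFACTORS.items():
        if abs(bfactor - target) <= TOLERANCE:
            return layer
    return "unassigned"


def count_layers(pdb_text: str) -> Counter:
    """Count ATOM/HETATM records per layer using the B-factor column."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Columns 61-66 hold the B-factor; a truncated line raises ValueError.
            counts[classify_bfactor(float(line[60:66]))] += 1
    return counts
```

A non-zero 'unassigned' count, or a single layer holding every atom, suggests the B-factors were edited by hand or overwritten by another tool, in which case re-running define-layer is the safer fix.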
Installation / environment¶
| Symptom | Fix |
|---|---|
| MLIP model download fails / HF auth missing ( | |
| | Install PyTorch matching your cluster CUDA runtime; verify |
| | torch_scatter ships no PyPI binary wheel (only an sdist), so a source build fails under PEP 517 build isolation. Install from PyG’s prebuilt-wheel index matching your torch+CUDA tag: |
| DMF mode requires ase, cyipopt, and pydmf to be installed. | |
| Plot export fails (Plotly / Chrome) | |
DMF mode fails (cyipopt / pydmf / ase missing)¶
See the “DMF mode requires ase, cyipopt, and pydmf to be installed.” row above.
CUDA / PyTorch mismatch¶
See the torch.cuda.is_available() row above.
Plot export fails (Chrome missing)¶
See the Plot export fails (Plotly / Chrome) row above.
Calculation / convergence¶
CUDA OOM (torch.cuda.OutOfMemoryError)¶
ML/MM systems are larger than pure MLIP, so VRAM pressure is higher. Try in order:
Verify Frozen-MM: define-layer should put distal atoms at B=20.0. If the Frozen region is too small, the Movable-MM region (and its Hessian) inflates. Decrease --radius-freeze to expand Frozen.
Shrink the ML region: use a smaller --radius in extract, or hand-craft a smaller --model-pdb.
Use --hessian-calc-mode FiniteDifference: slower, but lower peak VRAM.
Pre-define layers with define-layer and set use_bfactor_layers: true in YAML.
Use a bigger GPU: 24 GB+ for 500+ ML atoms; 48 GB+ for 1000+.
TS optimization does not converge / multiple imaginary modes remain¶
Try the following: switch between --opt-mode grad (Dimer) and --opt-mode hess (RS-I-RFO); add --flatten to flatten extra imaginary modes; raise the cycle limit with --max-cycles 20000; use a tighter threshold (--thresh baker or --thresh gau_tight); expand the Hessian-target atoms via hess_cutoff.
Optimizer “stalls” with flat energy + forces just above threshold (MLIP force noise floor)¶
MLIPs have finite numerical precision. For large ML/MM systems the noise floor can exceed the gau / baker gradient thresholds, so forces never drop further even though the geometry is stationary. This is handled automatically via energy_plateau: true (declares convergence when the 50-step energy range falls below 1.0e-4 au ≈ 0.06 kcal/mol). To tighten or disable:
opt:
  energy_plateau_thresh: 1.0e-05   # stricter (au)
  energy_plateau_window: 100       # require a longer flat stretch
  # energy_plateau: false          # disable entirely (e.g. for benchmarking)
The plateau check is skipped automatically for chain-of-states optimizers (GS / DMF), so path-opt / path-search are unaffected.
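To make the plateau criterion concrete, here is a minimal sketch of such a check. It is not the optimizer's actual code: the function name is hypothetical, and it assumes the "energy range" is max minus min over the most recent window of energies, in atomic units.

```python
from typing import Sequence

HARTREE_TO_KCAL_PER_MOL = 627.509


def energy_plateau_reached(energies: Sequence[float],
                           window: int = 50,
                           thresh: float = 1.0e-4) -> bool:
    """True when the last `window` energies span less than `thresh` (au)."""
    if len(energies) < window:
        return False  # not enough history yet
    recent = energies[-window:]
    return (max(recent) - min(recent)) < thresh


# The default threshold expressed in kcal/mol (about 0.06).
default_thresh_kcal = 1.0e-4 * HARTREE_TO_KCAL_PER_MOL
```

Under this reading, lowering the threshold or lengthening the window both make the check harder to satisfy, which matches the "stricter" and "longer flat stretch" comments in the YAML options above.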
IRC does not terminate properly¶
Reduce the step size (--step-size 0.05; default 0.10); raise the cycle limit (--max-cycles 200); and verify that the TS candidate has exactly one imaginary frequency before launching IRC.
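The "exactly one imaginary frequency" check is easy to automate once the frequencies are available as a list. The sketch below assumes the common convention that imaginary modes are reported as negative wavenumbers (cm⁻¹); the function names and the noise cutoff are hypothetical and not taken from mlmm.

```python
from typing import Sequence


def count_imaginary_modes(frequencies_cm1: Sequence[float],
                          cutoff_cm1: float = 0.0) -> int:
    """Count modes reported below -cutoff_cm1 (negative = imaginary)."""
    return sum(1 for freq in frequencies_cm1 if freq < -cutoff_cm1)


def is_valid_ts(frequencies_cm1: Sequence[float],
                cutoff_cm1: float = 0.0) -> bool:
    """A first-order saddle point has exactly one imaginary mode."""
    return count_imaginary_modes(frequencies_cm1, cutoff_cm1) == 1
```

A small positive cutoff can be used to ignore near-zero negative values that come from numerical noise rather than a real second imaginary mode; the appropriate value depends on the system and is left to the user.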
MEP search (GSM / DMF) fails or misses bonds¶
Raise --max-nodes (e.g. 15–20) for complex reactions; enable --preopt; try the alternate method (--mep-mode dmf ↔ gsm); tune bond-detection in YAML (bond.bond_factor, bond.delta_fraction).
Performance / stability tips¶
OOM: shrink the ML region, shrink the Hessian-target MM, lower --max-nodes, or use --opt-mode grad.
Analytical ML Hessian is fastest when VRAM allows (24 GB+ recommended for 300+ ML atoms); otherwise use FiniteDifference.
MM Hessian: the default mm_fd: true (finite-difference) trades speed for memory; mm_fd: false is faster on small systems but heavier on memory. Cap the MM atom count with hess_cutoff.
Large systems (2000+ atoms): make sure the Frozen layer is generous (define-layer with appropriate cutoffs) to keep the movable DOF count down.
Multi-GPU: ML on one device (ml_cuda_idx: 0), MM on another (mm_device: cuda, mm_cuda_idx: 1).
ML/MM parallelism: ML (GPU) and MM (CPU) run in parallel by default; tune CPU threads with mm_threads.
Backend-specific¶
| Symptom | Fix |
|---|---|
| | |
| CUDA OOM on ORB / MACE / AIMNet2 | These backends use FD Hessians (more VRAM). Lower |
| | |
How to report an issue¶
Include: the exact command line, summary.log (or console output), the smallest reproducing inputs, your env (OS / Python / CUDA / PyTorch), and whether AmberTools + hessian_ff are properly installed.