Troubleshooting

This page collects common failure modes and practical fixes. Search this page for the error message you encounter. If you prefer a symptom-first entry point, start with Common Error Recipes and then return here for details.


Preflight checklist

Before a long run, verify:

  • You can run mlmm -h and see the CLI help.

  • UMA can be downloaded (Hugging Face login/token is available on the machine).

  • For enzyme workflows: your input PDB(s) contain hydrogens and element symbols.

  • When you provide multiple PDBs: they have the same atoms in the same order (only coordinates differ).

  • AmberTools is installed and tleap is on your PATH (required for mm-parm).

  • The hessian_ff C++ native extension is built (cd hessian_ff/native && make).


Input / extraction problems

“Element symbols are missing… please run add-elem-info”

Typical message:

Element symbols are missing in '...'.
Please run `mlmm add-elem-info -i...` to populate element columns before running extract.

Fix:

  • Run:

mlmm add-elem-info -i input.pdb -o input_with_elem.pdb
  • Then re-run extract / all using the updated PDB.

Why it happens:

  • Many PDBs do not populate the element column consistently. extract requires element symbols for reliable atom typing.
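
If you want to inspect a PDB yourself before running add-elem-info, the element field lives in columns 77-78 of each ATOM/HETATM record (PDB format v3.3). A minimal standalone check, independent of mlmm (the helper name is illustrative):

```python
def missing_elements(pdb_lines):
    """Return 1-based positions of ATOM/HETATM lines whose element
    columns (77-78, 1-indexed) are blank or absent."""
    bad = []
    for i, line in enumerate(pdb_lines, start=1):
        if line.startswith(("ATOM", "HETATM")):
            elem = line[76:78].strip() if len(line) >= 78 else ""
            if not elem:
                bad.append(i)
    return bad

pdb = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00",
]
print(missing_elements(pdb))  # [2] -- the second record has no element column
```

If this reports any lines, run add-elem-info before extract.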


“[multi] Atom count mismatch…” or “[multi] Atom order mismatch…”

Typical messages:

[multi] Atom count mismatch between input #1 and input #2:...
[multi] Atom order mismatch between input #1 and input #2.

Fix:

  • Regenerate all structures with the same preparation workflow (same protonation tool, same settings).

  • If you add hydrogens, do it in a way that produces consistent ordering across all frames.

Tip:

  • For ensembles generated by MD, prefer extracting frames from the same trajectory/topology rather than mixing PDBs produced by different tools.
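
A quick way to verify consistency before handing multiple PDBs to mlmm is to compare everything except the coordinates. A minimal standalone sketch (helper names are illustrative, not part of mlmm):

```python
def atom_signature(pdb_lines):
    """(atom name, residue name, chain, residue number) for each
    ATOM/HETATM record -- everything except the coordinates."""
    sig = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            sig.append((line[12:16].strip(), line[17:20].strip(),
                        line[21].strip(), line[22:26].strip()))
    return sig

def check_consistent(frames):
    """Compare every frame's signature against the first frame."""
    ref = atom_signature(frames[0])
    for k, frame in enumerate(frames[1:], start=2):
        sig = atom_signature(frame)
        if len(sig) != len(ref):
            return f"atom count mismatch in input #{k}: {len(sig)} vs {len(ref)}"
        for i, (a, b) in enumerate(zip(ref, sig)):
            if a != b:
                return f"atom order mismatch in input #{k} at atom {i + 1}: {a} vs {b}"
    return "consistent"

frame_a = ["ATOM      1  N   ALA A   1      11.104   6.134  -6.504",
           "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147"]
frame_b = ["ATOM      1  N   ALA A   1      10.998   6.201  -6.488",
           "ATOM      2  CA  ALA A   1      11.702   6.123  -5.090"]
print(check_consistent([frame_a, frame_b]))  # consistent
```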


“My pocket is empty / missing important residues”

Symptoms:

  • The extracted pocket is unexpectedly small.

  • Key catalytic residues are missing.

Fixes to try:

  • Increase --radius (e.g., 2.6 -> 3.5 Angstrom).

  • Use --selected-resn to force-include residues (e.g., --selected-resn 'A:123,B:456').

  • If backbone trimming is too aggressive, set --no-exclude-backbone.
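
To see why a small radius change can add or drop whole residues, here is a toy distance-based selection (an illustrative sketch, not mlmm's actual extraction code, which may use additional criteria):

```python
import math

def pocket_residues(protein_atoms, ligand_atoms, radius):
    """Residue IDs with at least one atom within `radius` Angstrom of
    any ligand atom. Atoms are (residue_id, (x, y, z)) tuples."""
    selected = set()
    for res_id, p in protein_atoms:
        if any(math.dist(p, q) <= radius for q in ligand_atoms):
            selected.add(res_id)
    return selected

ligand = [(0.0, 0.0, 0.0)]
protein = [("A:10", (2.0, 0.0, 0.0)),   # inside 2.6 Angstrom
           ("A:11", (3.0, 0.0, 0.0)),   # only inside 3.5 Angstrom
           ("A:12", (6.0, 0.0, 0.0))]   # outside both
print(pocket_residues(protein, ligand, 2.6))  # {'A:10'}
print(pocket_residues(protein, ligand, 3.5))  # {'A:10', 'A:11'}
```

A residue sitting just past the cutoff vanishes from the pocket entirely, which is why bumping --radius (or force-including residues with --selected-resn) is the first thing to try.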


Charge / spin problems

“Charge is required…” (non-GJF inputs)

Calculation subcommands require an explicit -q/--charge. When running all, the charge is resolved in this order: -q/--charge override -> extraction summary -> --ligand-charge fallback (used only when extraction is skipped).

Fix:

  • Provide charge and multiplicity explicitly:

mlmm path-search -i R.pdb P.pdb --parm real.parm7 --model-pdb model.pdb -q 0 -m 1
  • Or, when using extraction, provide a residue-name mapping and run through all:

mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
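
The precedence above can be summarized as a small sketch (illustrative only; this mirrors the documented order, not the actual implementation):

```python
def resolve_charge(cli_charge=None, summary_charge=None,
                   ligand_fallback=None, extraction_skipped=False):
    """Documented precedence: -q/--charge override -> extraction
    summary -> --ligand-charge fallback (only if extraction is skipped)."""
    if cli_charge is not None:
        return cli_charge
    if summary_charge is not None:
        return summary_charge
    if extraction_skipped and ligand_fallback is not None:
        return ligand_fallback
    raise ValueError("Charge is required: pass -q/--charge explicitly")

print(resolve_charge(cli_charge=0, summary_charge=-2))              # 0
print(resolve_charge(summary_charge=-2))                            # -2
print(resolve_charge(ligand_fallback=-3, extraction_skipped=True))  # -3
```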

AmberTools / mm-parm problems

tleap not found

Typical message:

FileNotFoundError: tleap not found on PATH

or

mm-parm requires AmberTools (tleap, antechamber, parmchk2).

Fix:

  • Install AmberTools via conda:

conda install -c conda-forge ambertools -y
  • Or load the appropriate module on HPC:

module load ambertools
  • Verify availability:

which tleap
which antechamber
which parmchk2
  • Note: without AmberTools, you can still run opt, tsopt, path-search, etc. if you supply --parm manually.


antechamber fails for a ligand

Symptoms:

  • mm-parm fails during ligand parameterization.

  • Errors about atom type assignment or charge calculation.

Fixes to try:

  • Check that the ligand has correct element symbols and bond connectivity in the PDB.

  • Ensure --ligand-charge is specified correctly: -l 'GPP:-3,SAM:1'.

  • Use --keep-temp to preserve intermediate files and inspect <resname>.antechamber.log:

mlmm mm-parm -i input.pdb -l 'LIG:-1' --keep-temp
  • Check that hydrogen atoms are correctly added and TER records are appropriate.

  • Try running antechamber manually on the extracted ligand PDB to diagnose the issue:

antechamber -i ligand.pdb -fi pdb -o ligand.mol2 -fo mol2 -c bcc -nc -3 -at gaff2

parm7/rst7 mismatch errors

Typical messages:

Atom count in parm7 (...) does not match input PDB (...)

or

RuntimeError: parm7 topology does not match the input structure

or

Coordinate shape mismatch for... got (N, 3), expected (M, 3)

Fix:

  • The parm7 file must correspond to exactly the same atoms (in the same order) as the input PDB.

  • Re-run mm-parm to regenerate the parm7 from the current PDB.

  • Do not edit or reorder PDB atoms after running mm-parm.

  • When re-running mm-parm, use the output PDB (<prefix>.pdb) as the input for subsequent calculations, since tleap may add or remove hydrogens.
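
A quick standalone way to confirm that a parm7 and a PDB agree on atom count: NATOM is the first integer in the parm7 POINTERS section. A minimal sketch (illustrative helpers, independent of mlmm):

```python
def parm7_natom(parm7_text):
    """Read NATOM, the first integer in the POINTERS section of an
    AMBER parm7 file."""
    lines = parm7_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("%FLAG POINTERS"):
            j = i + 1
            while lines[j].startswith("%"):  # skip %FORMAT/%COMMENT lines
                j += 1
            return int(lines[j].split()[0])
    raise ValueError("POINTERS section not found")

def pdb_natom(pdb_lines):
    return sum(line.startswith(("ATOM", "HETATM")) for line in pdb_lines)

parm7 = """%VERSION  VERSION_STAMP = V0001.000
%FLAG POINTERS
%FORMAT(10I8)
       3       2       1       0       0       0       0       0       0       0
"""
pdb = ["ATOM      1  N   ALA A   1",
       "ATOM      2  CA  ALA A   1",
       "ATOM      3  C   ALA A   1"]
print(parm7_natom(parm7) == pdb_natom(pdb))  # True
```

This only checks counts, not ordering; for ordering, the safe rule remains to use the mm-parm output PDB unchanged.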


parm7 element order does not match PDB

Symptoms:

  • oniom-export reports “Element sequence mismatch at atom index…”

Fix:

  • Use --no-element-check to disable the element check (verify results manually).

  • The correct fix is to use the same PDB for -i that was used when generating the parm7.


hessian_ff build problems

Build fails (“make” errors)

Typical symptoms:

  • make in hessian_ff/native/ produces compilation errors.

  • ImportError: cannot import name 'ForceFieldTorch' from 'hessian_ff'.

  • RuntimeError: hessian_ff build attempts failed: ...

Fixes to try:

  • Ensure you have a C++ compiler (g++ >= 9) installed:

g++ --version
  • Ensure PyTorch headers are available:

python -c "import torch; print(torch.utils.cmake_prefix_path)"
  • On HPC, load a compiler module:

module load gcc/11
  • Clean and rebuild:

conda install -c conda-forge ninja -y
cd hessian_ff/native && make clean && make

hessian_ff import errors

Typical message:

ImportError: cannot import name 'ForceFieldTorch' from 'hessian_ff'

or:

RuntimeError: hessian_ff build attempts failed: ...
To rebuild hessian_ff native extensions in this environment:
  conda install -c conda-forge ninja -y
  cd $(python -c "import hessian_ff; print(hessian_ff.__path__[0])")/native && make clean && make

Fix:

  • The C++ native extension needs to be built first:

cd hessian_ff/native && make
  • Ensure the hessian_ff package is in your Python path (it should be if you installed mlmm-toolkit with pip install -e .).


B-factor layer assignment problems

Wrong layer assignments

Symptoms:

  • Atoms are assigned to unexpected layers.

  • ML region is too small or too large.

Fixes to try:

  • Inspect the layer-assigned PDB visually (color by B-factor in your molecular viewer).

  • Check that --model-pdb correctly defines the ML region atoms.

  • Adjust the distance cutoffs in define-layer:

  • --radius-freeze (default 8.0 Angstrom): controls Movable-MM/Frozen boundary.

  • If needed, control Hessian-target MM separately in calc options (hess_cutoff, hess_mm_atoms).

  • If using use_bfactor_layers: true in YAML, verify that B-factor values match the expected encoding (0.0, 10.0, 20.0 with tolerance 1.0).


B-factor values are not recognized

Typical symptoms:

  • Calculator treats all atoms as frozen or all as ML.

  • B-factor values are not one of {0.0, 10.0, 20.0}.

Fix:

  • Re-run define-layer to ensure correct B-factor encoding.

  • A tolerance of 1.0 is applied: B-factors near 0/10/20 map to ML/Movable/Frozen.

  • Do not manually edit B-factors to arbitrary values.
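
The documented encoding and tolerance behave like this sketch (illustrative; the toolkit's own parser may differ in details):

```python
def layer_from_bfactor(b, tol=1.0):
    """Map a B-factor to a layer using the documented encoding:
    0.0 -> ML, 10.0 -> Movable, 20.0 -> Frozen, within +/- tol."""
    for target, layer in ((0.0, "ML"), (10.0, "Movable"), (20.0, "Frozen")):
        if abs(b - target) <= tol:
            return layer
    raise ValueError(f"Unrecognized B-factor {b}: re-run define-layer")

print(layer_from_bfactor(0.3))   # ML
print(layer_from_bfactor(10.9))  # Movable
print(layer_from_bfactor(19.2))  # Frozen
# layer_from_bfactor(5.0) raises: 5.0 is outside every tolerance window
```

This is why arbitrary hand-edited B-factors (e.g., 5.0) are not recognized as any layer.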


--detect-layer does not work as expected

Symptoms:

  • Automatic layer detection from B-factors produces unexpected ML/Movable/Frozen splits.

  • Running with --detect-layer without --model-pdb fails.

Fixes to try:

  • Ensure the input is a PDB (or an XYZ with --ref-pdb).

  • Re-run define-layer to explicitly assign B-factors, then use the generated PDB.

  • For distance-based control, specify hess_cutoff / movable_cutoff and switch to --no-detect-layer if needed.

  • Note that supplying --movable-cutoff disables --detect-layer.


Installation / environment problems

UMA download/authentication errors

Symptoms:

  • Errors about missing Hugging Face authentication or being unable to download model weights.

Fix:

  • Log in once per environment/machine:

huggingface-cli login
  • On HPC, ensure your home directory (or HF cache directory) is writable from compute nodes.


CUDA / PyTorch mismatch

Symptoms:

  • torch.cuda.is_available() returns False even though the machine has a GPU.

  • CUDA runtime errors at import time.

Fixes:

  • Install a PyTorch build matching your cluster CUDA runtime.

  • Confirm GPU visibility:

nvidia-smi
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"

DMF mode fails (cyipopt missing)

If you use DMF (--mep-mode dmf) and see errors importing IPOPT/cyipopt:

Fix:

  • Install cyipopt from conda-forge (recommended) before installing mlmm:

conda install -c conda-forge cyipopt

Plot export fails (Chrome missing)

If figure export fails and you see Plotly/Chrome-related errors:

Fix:

  • Install a headless Chrome once:

plotly_get_chrome -y

Calculation / convergence problems

CUDA out of memory (VRAM)

Symptoms:

  • torch.cuda.OutOfMemoryError: CUDA out of memory

  • System hangs or crashes during Hessian calculation.

ML/MM systems are typically larger than pure cluster models, so VRAM pressure is higher.

Fixes to try:

  • Reduce ML region size: use a smaller extraction radius or manually trim --model-pdb.

  • Use FiniteDifference ML Hessian: set --hessian-calc-mode FiniteDifference (uses less VRAM but is slower).

  • Keep MM on CPU: make sure mm_device: cpu is set in YAML (this is the default).

  • Reduce Hessian-target MM region: decrease hess_cutoff (YAML/CLI where available).

  • Use 3-layer + Hessian-target control: set hess_cutoff and movable_cutoff in YAML to limit the number of atoms included in the Hessian:

calc:
  hess_cutoff: 3.6
  movable_cutoff: 8.0
  • Pre-define layers with define-layer and use use_bfactor_layers: true.

  • Use a GPU with more VRAM: 24 GB+ recommended for systems with 500+ ML atoms; 48 GB+ for 1000+ ML atoms.

  • Reduce pocket size: use a smaller --radius during extraction.


TS optimization fails to converge

Symptoms:

  • TS optimization runs for many cycles without converging.

  • Multiple imaginary frequencies remain after optimization.

Fixes to try:

  • Switch optimizer modes: --opt-mode grad (Dimer) or --opt-mode hess (RS-I-RFO).

  • Enable flattening of extra imaginary modes: --flatten.

  • Increase max cycles: --max-cycles 20000.

  • Use tighter convergence: --thresh baker or --thresh gau_tight.

  • Adjust hess_cutoff to expand the range of atoms included in the Hessian calculation.


IRC does not terminate properly

Symptoms:

  • IRC stops before reaching a clear minimum.

  • Energy oscillates or gradient remains high.

Fixes to try:

  • Reduce step size: --step-size 0.05 (default is 0.10).

  • Increase max cycles: --max-cycles 200.

  • Check if the TS candidate has only one imaginary frequency before running IRC.


MEP search (GSM/DMF) fails or gives unexpected results

Symptoms:

  • Path search terminates with no valid MEP.

  • Bond changes are not detected correctly.

Fixes to try:

  • Increase --max-nodes (e.g., 15 or 20) for complex reactions.

  • Enable endpoint pre-optimization: --preopt.

  • Try the alternative MEP method: --mep-mode dmf (if GSM fails) or vice versa.

  • Adjust bond detection parameters in YAML (bond.bond_factor, bond.delta_fraction).


Performance / stability tips

  • Out of memory (VRAM): reduce ML region size, reduce Hessian-target MM region, reduce nodes (--max-nodes), or use lighter optimizer settings (--opt-mode grad).

  • Analytical ML Hessian is slow or OOM: use --hessian-calc-mode FiniteDifference for the ML region. Only use Analytical if you have ample VRAM (24 GB+ recommended for 300+ ML atoms).

  • MM Hessian: mm_fd: true (default) uses finite-difference for MM Hessian. Analytical MM Hessian (mm_fd: false) is faster for small systems but may require more memory.

  • MM Hessian is slow: set hess_cutoff to limit the number of Hessian-target MM atoms.

  • Large systems (2000+ atoms): ensure frozen atoms are properly set (Frozen layer, B=20) to reduce the movable DOF count. Use define-layer with appropriate cutoffs.

  • Multi-GPU: place ML on one GPU (ml_cuda_idx: 0) and MM on another (mm_device: cuda, mm_cuda_idx: 1) if available.

  • ML and MM parallel execution: by default, ML (GPU) and MM (CPU) run in parallel. Tune CPU thread count with mm_threads.
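
Putting the multi-GPU and threading tips together, a hedged example of a calc section (the key names are taken from this page; the exact YAML layout in your setup may differ):

```yaml
calc:
  ml_cuda_idx: 0     # ML potential on GPU 0
  mm_device: cuda    # move MM off the CPU...
  mm_cuda_idx: 1     # ...onto GPU 1
  mm_threads: 8      # CPU thread count (relevant when mm_device: cpu)
```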


Backend-specific issues

ImportError when using --backend orb/mace/aimnet2

Symptom: ImportError: orb-models is required for the ORB backend

Fix: Install the optional dependency for the chosen backend:

pip install "mlmm-toolkit[orb]"      # ORB backend
pip install "mlmm-toolkit[aimnet2]"  # AIMNet2 backend
# MACE: pip uninstall fairchem-core && pip install mace-torch (separate env required)

CUDA out of memory with non-UMA backends

Symptom: RuntimeError: CUDA out of memory when using ORB, MACE, or AIMNet2.

Fix: Non-UMA backends use finite-difference Hessians, which require more VRAM. Options:

  • Reduce --radius-partial-hessian to limit Hessian-target atoms

  • Use --hessian-calc-mode FiniteDifference explicitly with a smaller hess_cutoff

  • Use ml_device: cpu in YAML (slower but avoids VRAM limits)


xTB not found when using --embedcharge

Symptom: FileNotFoundError: xtb command not found

Fix: Install xTB and ensure it’s on $PATH:

conda install -c conda-forge xtb

How to report an issue

When asking for help, include:

  • The exact command line you ran

  • summary.log (or console output)

  • The smallest input files that reproduce the problem (if possible)

  • Your environment: OS, Python, CUDA, PyTorch versions

  • Whether AmberTools and hessian_ff are properly installed
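
To gather the environment details above in one go, a small standalone script (torch is treated as optional so it also runs where PyTorch is absent):

```python
import platform
import sys

def environment_report():
    """Collect OS, Python, and (if available) PyTorch/CUDA versions."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import torch
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda          # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        info["torch"] = "not installed"
    return info

for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Paste the output alongside your command line and summary.log when reporting an issue.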