Concepts & Workflow¶
This page explains the key terms in mlmm-toolkit – pockets, templates, segments, images, and the ML/MM 3-layer system – and how the all command ties together the subcommands.
Workflow at a glance¶
Most workflows follow this flow:
Full system(s) (PDB/XYZ/GJF)
|
+- (optional) pocket extraction [extract] <- requires PDB when you use --center/-c
| |
| Pocket/cluster model(s) (PDB)
| |
| +- Amber topology [mm-parm] <- generates parm7/rst7 via AmberTools
| | |
| +- 3-layer assignment [define-layer] <- B-factor encoding for ML/MM layers
| | |
| | v
| | Layered ML/MM system (PDB with B-factors)
| | |
| +- (optional) staged scan [scan] <- single-structure workflows
| | |
| | v
| | Ordered intermediates
| |
| +- MEP search [path-search] or [path-opt]
| |
| MEP trajectory (mep_trj.xyz) + energy diagrams
|
+- (optional) TS optimization + IRC [tsopt] -> [irc]
+- (optional) thermo [freq]
+- (optional) single-point DFT [dft]
Each stage is available as an individual subcommand. The mlmm all command runs many stages end-to-end.
ML/MM 3-layer system¶
A central concept in mlmm-toolkit is the 3-layer ML/MM partitioning of the system. Each atom belongs to one of three layers, encoded in the PDB B-factor column:
Layer |
B-factor |
Description |
|---|---|---|
ML (Layer 1) |
0.0 |
The reactive region. Full MLIP energy, forces, and Hessian. |
Movable-MM (Layer 2) |
10.0 |
MM atoms allowed to move during optimization. |
Frozen (Layer 3) |
20.0 |
Coordinates are fixed; no optimization. |
B-factor values are encoded in PDB columns 61-66 (the temperature factor column).
The define-layer subcommand assigns these B-factors based on distance from the ML region:
Atoms/residues within
--radius-freeze(default 8.0 Å) are assigned to Movable-MM.Atoms/residues beyond
--radius-freezeare Frozen.
Hessian-target MM atoms are controlled by calculator options (hess_cutoff, explicit hess_mm_atoms, etc.), not by a dedicated B-factor layer.
Tip
The B-factor encoding allows you to visually inspect layer assignments in any molecular viewer that can color by B-factor.
ONIOM-like energy decomposition¶
mlmm-toolkit uses an ONIOM-like scheme to combine ML and MM energies:
E_total = E_REAL_low + E_MODEL_high - E_MODEL_low
where:
REAL = the full system (all atoms)
MODEL = the ML region (subset of atoms)
high = the selected MLIP backend (default: UMA; ORB, MACE, AIMNet2 also supported)
low = hessian_ff (Amber-based classical force field)
This means:
The full system is evaluated at the MM level (hessian_ff).
The ML region is evaluated at both the MLIP level and the MM level.
The MM contribution of the ML region is subtracted to avoid double-counting.
The same decomposition applies to forces and (where applicable) Hessians. Link-hydrogen contributions are redistributed to the ML and MM host atoms via a Jacobian.
The MLIP backend is selected via -b/--backend (default: uma). Alternative backends (orb, mace, aimnet2) are installed as optional dependencies (e.g., pip install "mlmm-toolkit[orb]").
When --embedcharge is enabled, an xTB point-charge embedding correction is applied to account for the electrostatic influence of the MM environment on the ML region.
hessian_ff: the MM engine¶
hessian_ff is a C++ native extension that evaluates Amber force field energies, forces, and analytical Hessians. It reads Amber parm7/rst7 topology files and supports:
Bond, angle, dihedral, and improper terms
Van der Waals (Lennard-Jones) interactions
Electrostatic interactions
Analytical second derivatives (Hessian)
CPU execution (GPU memory is reserved for MLIP inference)
Unlike OpenMM, hessian_ff is designed specifically for providing the MM Hessian needed by the ONIOM-like coupling and vibrational analysis.
Amber parm7/rst7 topology¶
The MM calculation requires Amber topology files:
parm7 (parameter/topology file): Contains atom types, charges, bonding connectivity, and force field parameters.
rst7 (restart/coordinate file): Contains atomic coordinates.
These are generated by the mm-parm subcommand using AmberTools (tleap, antechamber, parmchk2). The command automatically:
Identifies non-standard residues (substrates, cofactors)
Parameterizes them with GAFF2 (General Amber Force Field 2)
Assigns AM1-BCC partial charges
Builds the full topology with ff19SB for protein residues
Key objects and terms¶
Full system vs. pocket (cluster model)¶
Full system: your original structure(s). In enzyme use-cases this is typically a protein-ligand complex.
Pocket / cluster model: a truncated structure around the substrate(s) used to reduce system size for MEP/TS search.
Pocket extraction is controlled by:
-c/--center: how to locate the substrate (residue IDs, residue names, or a substrate-only PDB).-r/--radius,--radius-het2het,--include-H2O,--exclude-backbone,--add-linkH,--selected-resn.
Real system vs. Model system (ONIOM terminology)¶
Real system: the entire set of atoms (all 3 layers). Evaluated at the MM (low) level.
Model system: the ML region (Layer 1 only). Evaluated at both the MLIP (high) and MM (low) levels.
Images and segments¶
Image: a single geometry (one “node”) along a chain-of-states path.
Segment: an MEP between two adjacent endpoints (e.g., R -> I1, I1 -> I2,…). A multi-structure run is decomposed into segments.
Templates and file conversion (--convert-files)¶
mlmm-toolkit often writes a trajectory (e.g., mep_trj.xyz, irc_trj.xyz). When you supply topology-aware inputs (PDB templates or Gaussian inputs), it can optionally write companion files:
.pdbcompanions when a PDB template exists.gjfcompanions when a Gaussian template exists
This behavior is controlled globally by --convert-files/--no-convert-files (default: True).
Three common workflow modes¶
1) Multi-structure MEP search (R ->… -> P)¶
Use this when you already have two or more full structures along a reaction coordinate.
Typical command:
mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
2) Single-structure staged scan -> MEP¶
Use this when you only have one structure, but you can define a scan that generates endpoints.
Typical command:
mlmm -i holo.pdb -c '308,309' \
--scan-lists '[("TYR,285,CA","MMT,309,C10",2.20)]'
3) TSOPT-only mode (pocket TS optimization)¶
Use this when you already have a TS candidate (or want a quick TS optimization on one structure).
Typical command:
mlmm -i ts_guess.pdb -c 'SAM,GPP' --tsopt
When to use all vs individual subcommands¶
Prefer mlmm all when…¶
You want an end-to-end run (extract -> mm-parm -> define-layer -> MEP -> TSOPT/IRC -> freq/DFT).
You are still exploring the workflow and want a single command to manage outputs.
Prefer subcommands when…¶
You want to debug a specific stage (e.g., only
extract, onlymm-parm, onlypath-search).You want to mix-and-match a custom workflow (e.g., your own endpoint preparation).
You already have parm7/rst7 and layer-assigned PDB files from a previous run.
You want to generate Gaussian/ORCA ONIOM input files via
oniom-export --mode g16|orca.
A few CLI conventions worth knowing¶
Important
Boolean options accept both
--flag/--no-flagand value style--flag True/False(yes/no,1/0are also accepted). Prefer toggle style.With multiple PDB inputs, all files should have the same atoms in the same order (only coordinates differ).
For enzyme use-cases, you usually want hydrogens present in the input PDB.
Most subcommands require
--parmand--model-pdbfor ML/MM calculations.
Next steps¶
Getting started¶
Getting Started – installation and first run
Common Error Recipes – symptom-first failure routing
Troubleshooting – common errors and fixes
Core subcommands¶
Subcommand |
Purpose |
Documentation |
|---|---|---|
|
End-to-end workflow |
|
|
Pocket extraction |
|
|
Amber topology generation |
|
|
3-layer assignment |
|
|
Recursive MEP search |
|
|
TS optimization |
|
|
Vibrational analysis |
|
|
Single-point DFT |
|
|
Gaussian ONIOM / ORCA QM/MM input generation ( |
Reference¶
YAML Reference – complete YAML configuration options
Glossary – terminology reference