all
mlmm all runs the end-to-end ML/MM enzymatic-reaction workflow on full-system layered PDBs in one command, instead of chaining extract → mm-parm → define-layer → scan / path-search → tsopt → irc / freq / dft by hand. It chains active-site extraction, MM topology preparation, ML/MM layer assignment, an optional staged scan, MEP search (single-pass path-opt by default; recursive path-search with --refine-path), and optional post-processing (TS optimization, EulerPC IRC, thermochemistry, single-point DFT, and DFT//MLIP diagrams). The default MLIP backend for the ML region is UMA; choose an alternative with -b/--backend.
all runs in one of three modes, chosen by what you pass:
- Multi-structure ensemble: give ≥ 2 full PDBs in reaction order to drive a GSM MEP search across the supplied structures.
- Single-structure staged scan: give one PDB plus --scan-lists; each literal is a scan stage and the relaxed endpoints become the MEP endpoints.
- TSOPT-only: give a single PDB and set --tsopt (no --scan-lists) to run TS optimization directly, with no MEP search.
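The mode choice described above can be sketched as a small decision function. This is an illustration of the documented rules, not the tool's own code: the function name, the mode labels, and the choice to raise an error for a single input with neither --scan-lists nor --tsopt are assumptions.

```python
from typing import Optional, Sequence


def select_mode(
    inputs: Sequence[str],
    scan_lists: Optional[Sequence[str]] = None,
    tsopt: bool = False,
) -> str:
    """Classify a run into one of the three documented modes (illustrative)."""
    if len(inputs) >= 2:
        # Two or more full PDBs in reaction order: GSM MEP search across them.
        return "multi-structure"
    if len(inputs) == 1 and scan_lists:
        # One PDB plus scan literals: staged scan, then MEP between endpoints.
        return "staged-scan"
    if len(inputs) == 1 and tsopt:
        # One PDB with --tsopt and no scan: TS optimization only.
        return "tsopt-only"
    # Assumption: anything else is rejected in this sketch.
    raise ValueError("a single input needs --scan-lists or --tsopt")


print(select_mode(["R.pdb", "P.pdb"]))
print(select_mode(["A.pdb"], scan_lists=["[(12,45,1.35)]"]))
print(select_mode(["A.pdb"], tsopt=True))
```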
Important
--tsopt produces TS candidates. all runs IRC and freq automatically for validation, but always inspect the results (imaginary mode count + endpoint connectivity) before mechanistic interpretation.
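As a rough aid for the imaginary-mode check mentioned above, the sketch below counts imaginary modes in a list of vibrational frequencies. It assumes the common convention that imaginary frequencies are reported as negative wavenumbers; how you obtain the frequency list from the freq output is not covered here, and the function names are hypothetical.

```python
from typing import Sequence


def count_imaginary(frequencies_cm1: Sequence[float], tol: float = 0.0) -> int:
    """Count modes reported as negative wavenumbers (assumed convention)."""
    return sum(1 for freq in frequencies_cm1 if freq < -tol)


def looks_like_first_order_saddle(frequencies_cm1: Sequence[float]) -> bool:
    """A TS candidate should carry exactly one imaginary mode."""
    return count_imaginary(frequencies_cm1) == 1


# Made-up frequencies for illustration only.
candidate = [-512.3, 35.1, 120.0, 1650.4]
print(count_imaginary(candidate), looks_like_first_order_saddle(candidate))
```

A single imaginary mode is necessary but not sufficient; the IRC endpoints still have to connect the intended reactant and product.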
Examples
Command form:
mlmm all -i INPUT1 [INPUT2 ...] -c SUBSTRATE [options]
mlmm all --help shows core options; mlmm all --help-advanced shows the full option list.
Multi-structure MEP with full post-processing:
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
--tsopt True --thermo True --dft True --out-dir ./result_all
Single-structure staged scan (two stages):
mlmm all -i A.pdb -c '308,309' --scan-lists '[(12,45,1.35)]' '[(10,55,2.20)]' \
--multiplicity 1 --out-dir ./result_scan_all
# a single literal can drive several bonds at once: '[(10,55,2.20),(23,34,1.80)]'
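The scan literals above have the shape of Python list-of-tuple literals, so they can be sanity-checked before a run. The sketch below parses them with ast.literal_eval; reading each tuple as (atom index, atom index, target bond length in Å) is an assumption based on the scan being bond-length-driven, and parse_scan_stage is a hypothetical helper, not part of mlmm.

```python
import ast
from typing import List, Tuple

ScanTarget = Tuple[int, int, float]


def parse_scan_stage(literal: str) -> List[ScanTarget]:
    """Parse one --scan-lists literal into (i, j, target) tuples."""
    stage = ast.literal_eval(literal)  # safe: evaluates literals only
    targets: List[ScanTarget] = []
    for item in stage:
        i, j, target = item  # raises ValueError if the tuple is malformed
        targets.append((int(i), int(j), float(target)))
    return targets


# One list per stage, in the order the stages would be run.
stages = [
    parse_scan_stage(text)
    for text in ("[(12,45,1.35)]", "[(10,55,2.20),(23,34,1.80)]")
]
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {len(stage)} bond target(s)")
```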
TSOPT-only validation (single input, no MEP search):
mlmm all -i A.pdb -c 'GPP,MMT' -l 'GPP:-3,MMT:-1' \
--tsopt True --thermo True --dft True --out-dir result_tsopt_only
ORB backend with xTB point-charge embedding:
mlmm all -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
--backend orb --embedcharge --out-dir ./result_all_orb
PDB companion files are generated when reference templates are available; control with --convert-files (on by default).
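For scripted batches, the command forms above can be assembled as an argument list and handed to subprocess. The sketch below only builds and prints the command; build_all_command is a hypothetical helper that uses nothing beyond the flags shown in the examples (-i, -c, -l, --tsopt, --thermo, --dft, --out-dir).

```python
import shlex
from typing import List, Optional, Sequence


def build_all_command(
    inputs: Sequence[str],
    center: str,
    ligand_charge: Optional[str] = None,
    out_dir: Optional[str] = None,
    tsopt: bool = False,
    thermo: bool = False,
    dft: bool = False,
) -> List[str]:
    """Assemble an argv list for `mlmm all` from the documented flags."""
    argv = ["mlmm", "all", "-i", *inputs, "-c", center]
    if ligand_charge is not None:
        argv += ["-l", ligand_charge]
    for flag, enabled in (("--tsopt", tsopt), ("--thermo", thermo), ("--dft", dft)):
        if enabled:
            argv += [flag, "True"]
    if out_dir is not None:
        argv += ["--out-dir", out_dir]
    return argv


argv = build_all_command(
    ["R.pdb", "P.pdb"],
    "SAM,GPP",
    ligand_charge="SAM:1,GPP:-3",
    out_dir="./result_all",
    tsopt=True,
    thermo=True,
    dft=True,
)
print(shlex.join(argv))  # pass argv to subprocess.run(argv, check=True) to execute
```

Passing a list to subprocess avoids shell quoting issues with the comma- and colon-separated arguments.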
Workflow
1. Active-site extraction and ML-region definition (multi-structure union when multiple inputs)
   - Define the substrate via -c/--center (PDB path, residue IDs, or residue names) and optionally --ligand-charge as a total number (distributed) or a mapping such as GPP:-3,MMT:-1.
   - The extractor writes per-input pocket PDBs under <out-dir>/_work/pockets/. The first pocket is copied to <out-dir>/ml_region.pdb (a reusable deliverable you can pass back as --model-pdb) and defines the ML region for all subsequent ML/MM calculations.
   - The first-model net ML-region charge becomes the net ML-region charge for later steps.
   - Omitting -c/--center skips extraction and uses the full input structures directly.
2. ML/MM preparation (parm7 + layer assignment)
   - mm_parm runs once on the first full input PDB and writes <out-dir>/mm_parm/<input_basename>.parm7/.rst7 (a reusable deliverable you can pass back as --parm), which are passed automatically as --parm.
   - define-layer runs on each full-system PDB and assigns 3-layer B-factors (ML = 0.0, Movable-MM = 10.0, Frozen-MM = 20.0) based on the ML-region definition. The layered full-system PDBs are written under <out-dir>/layered/.
3. Optional staged scan (single-structure only)
   - When exactly one input PDB is provided and --scan-lists is given, the tool performs a staged, bond-length-driven scan on the layered full-system PDB using the ML/MM calculator.
   - Each stage's relaxed structure (stage_XX/result.pdb) is collected as an intermediate / product candidate. The ordered input series for the path search becomes [initial layered PDB, stage_01/result.pdb, stage_02/result.pdb, ...].
4. MEP search on full-system layered PDBs
   - All MEP calculations run on full-system layered PDBs (with --parm and --detect-layer), not on pockets.
   - --refine-path runs recursive path_search with automatic refinement, detecting multistep reactions and building a detailed MEP per elementary step. Complex multistep mechanisms may need manual trial-and-error to obtain a converged pathway.
   - --no-refine-path (default) runs path-opt GSM per adjacent pair, then concatenates trajectories, extracts the HEI per segment, detects bond changes, and writes summary.json. Both modes support Stage 5 post-processing.
   - For multi-input runs, the original full PDBs are supplied as merge references automatically. In the scan-derived series (single-structure case), the single original full PDB is reused as the reference template.
5. Summary and optional post-processing
   - The raw MEP-engine output (per-segment trajectories, the full MEP trajectory, and the engine summary.json) is written under <out-dir>/_work/path_opt/ (or <out-dir>/_work/path_search/ with --refine-path); the merged products (mep.pdb, mep_trj.xyz, mep_plot.png, energy_diagram_MEP.png) are moved to <out-dir>/ and summary.{json,log} are copied there.
   - --tsopt runs TS optimization on each HEI, follows with EulerPC IRC, and emits segment energy diagrams.
   - --thermo computes ML/MM thermochemistry on (R, TS, P) and adds a Gibbs diagram.
   - --dft runs DFT single-point on (R, TS, P) and adds a DFT diagram. With --thermo, a DFT//MLIP Gibbs diagram is also produced.
   - When VRAM allows, set --hessian-calc-mode Analytical (strongly recommended over the FiniteDifference default).
6. TSOPT-only mode (single input, --tsopt, no --scan-lists)
   - Skips steps 4–5 and runs tsopt on the layered full-system PDB, performs EulerPC IRC, minimizes both ends, builds ML/MM energy diagrams for R-TS-P, and optionally adds Gibbs, DFT, and DFT//MLIP diagrams.
   - In this mode only, the IRC endpoint with higher energy is adopted as the reactant (R).
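The three-layer B-factor encoding written by define-layer in step 2 (ML = 0.0, Movable-MM = 10.0, Frozen-MM = 20.0) is easy to inspect in a layered PDB. The sketch below counts atoms per layer by reading the B-factor field (columns 61 to 66 of ATOM / HETATM records in the standard PDB layout). The helper name and the "unassigned" bucket are hypothetical, not part of mlmm.

```python
from collections import Counter
from typing import Iterable

# Layer encoding taken from the define-layer description above.
LAYER_BY_BFACTOR = {0.0: "ML", 10.0: "Movable-MM", 20.0: "Frozen-MM"}


def count_layers(pdb_lines: Iterable[str]) -> Counter:
    """Count atoms per ML/MM layer from the B-factor column of a PDB."""
    counts: Counter = Counter()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip TER, END, REMARK, CRYST1, ...
        bfactor = float(line[60:66])  # columns 61-66 (1-based)
        layer = LAYER_BY_BFACTOR.get(round(bfactor, 1), "unassigned")
        counts[layer] += 1
    return counts


def count_layers_in_file(path: str) -> Counter:
    """Convenience wrapper for a PDB file on disk."""
    with open(path) as handle:
        return count_layers(handle)
```

For example, count_layers_in_file on one of the files under <out-dir>/layered/ gives a quick check that the ML region has the expected number of atoms.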
Outputs
The tree has three zones: deliverables at the root, per-segment deliverables under segments/seg_NN/, and pipeline scratch under _work/ (safe to remove once you have the results). The three you check first are summary.log, summary.json, and mep.pdb (the concatenated reaction path, moved to the root; raw engine output stays under _work/path_opt/ by default, or _work/path_search/ with --refine-path).
<out-dir>/
  summary.json                              # mirrored top-level summary (when the MEP stage runs)
  summary.log
  mep.pdb                                   # concatenated MEP path (moved to the root)
  mep_trj.xyz
  mep_plot.png                              # smooth MEP energy profile
  energy_diagram_MEP.png                    # all-segment MEP barriers
  energy_diagram_UMA_all.png                # aggregated post-processing diagrams (when enabled)
  energy_diagram_G_UMA_all.png
  energy_diagram_DFT_all.png
  energy_diagram_G_DFT_plus_UMA_all.png
  irc_plot_all.png
  ml_region.pdb                             # ML-region definition (reusable as --model-pdb for follow-up runs)
  mm_parm/<input1>.parm7,.rst7              # MM topology from the first full-enzyme input (reusable as --parm)
  layered/                                  # Layered full-system PDBs (B-factor annotated; reusable inputs)
  segments/                                 # per-reactive-segment deliverables
    seg_NN/                                 # 1-based 2-digit index, e.g. seg_01, seg_02
      reactant.pdb · ts.pdb · product.pdb   # canonical R/TS/P
      ts/, irc/                             # TS optimisation + EulerPC IRC (--tsopt)
      freq/ (--thermo), dft/ (--dft)
      structures/{reactant,ts,product}.pdb  # nested copy + raw IRC endpoints
      energy_diagram_{UMA,G_UMA,DFT,G_DFT_plus_UMA}.png
  _work/                                    # pipeline scratch (safe to delete)
    pockets/                                # Per-input pocket PDBs (multi-structure union)
    scan/                                   # present only in single-structure + scan mode (stage_01/result.pdb …)
    path_opt/                               # raw MEP-engine output (path_search/ with --refine-path)
      summary.{json,log} · seg_NN_mep/      # raw per-segment GSM trajectories (merged MEP products are moved to the root)
In TSOPT-only mode (single input + --tsopt, no --scan-lists) there is no MEP stage: the optimized R/TS/P plus ts/, irc/, freq/, and dft/ land under segments/seg_01/, and _work/path_opt/ is absent.
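A quick post-run check of the output tree can be scripted. The sketch below looks for the three first-look files at the root and lists the seg_NN directories; the helper names are hypothetical, and only paths named in the tree above are used. In TSOPT-only mode mep.pdb is expected to be missing.

```python
from pathlib import Path
from typing import Dict, List

# The three files the text above recommends checking first.
FIRST_LOOK = ("summary.log", "summary.json", "mep.pdb")


def check_deliverables(out_dir: str) -> Dict[str, bool]:
    """Report which first-look files exist at the root of the output tree."""
    root = Path(out_dir)
    return {name: (root / name).is_file() for name in FIRST_LOOK}


def list_segments(out_dir: str) -> List[str]:
    """Return the seg_NN directory names under <out-dir>/segments/, sorted."""
    seg_root = Path(out_dir) / "segments"
    if not seg_root.is_dir():
        return []
    return sorted(p.name for p in seg_root.glob("seg_*") if p.is_dir())


if __name__ == "__main__":
    print(check_deliverables("./result_all"))
    print(list_segments("./result_all"))
```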
At -v 2 the console summarises extraction, MM preparation, scan stages, MEP progress (GSM), and per-stage timing; see Verbosity levels.
Reading summary.log
The log is organised into numbered sections:
[1] Global MEP overview — image / segment counts, MEP trajectory plot paths, aggregate MEP energy diagram.
[2] Segment-level MEP summary (MLIP path) — per-segment barriers, reaction energies, bond-change summaries.
[3] Per-segment post-processing (TSOPT / Thermo / DFT) — TS imaginary-frequency checks, IRC outputs, energy tables.
[4] Energy diagrams (overview) — diagram tables for MEP / MLIP / Gibbs / DFT plus an optional cross-method summary.
[5] Output directory structure — a compact tree of generated files with inline annotations.
Reading summary.json
Top-level keys: out_dir, n_images, n_segments (run metadata and counts); segments (per-segment entries with index, tag, kind, barrier_kcal, delta_kcal, bond_changes); energy_diagrams (optional payloads with labels, energies_kcal, energies_au, ylabel, image paths).
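The per-segment entries can be pulled out of summary.json with a few lines of Python. The sketch below uses only the keys listed above (segments, index, tag, barrier_kcal, delta_kcal); the example payload is made up for illustration, and the value types of kind and bond_changes are assumptions.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

SegmentRow = Tuple[int, str, float, float]


def load_summary(path: str) -> Dict[str, Any]:
    """Read <out-dir>/summary.json into a dictionary."""
    with open(path) as handle:
        return json.load(handle)


def segment_barriers(summary: Dict[str, Any]) -> List[SegmentRow]:
    """Return (index, tag, barrier_kcal, delta_kcal) for each segment."""
    return [
        (seg["index"], seg["tag"], seg["barrier_kcal"], seg["delta_kcal"])
        for seg in summary.get("segments", [])
    ]


def highest_barrier(summary: Dict[str, Any]) -> Optional[SegmentRow]:
    """Pick the segment with the largest barrier, or None if there are none."""
    rows = segment_barriers(summary)
    return max(rows, key=lambda row: row[2]) if rows else None


# Hypothetical payload shaped after the documented keys; all values are invented.
example = {
    "out_dir": "result_all",
    "n_images": 24,
    "n_segments": 2,
    "segments": [
        {"index": 1, "tag": "seg_01", "kind": "seg", "barrier_kcal": 18.4,
         "delta_kcal": -3.2, "bond_changes": "hypothetical"},
        {"index": 2, "tag": "seg_02", "kind": "seg", "barrier_kcal": 9.7,
         "delta_kcal": -11.0, "bond_changes": "hypothetical"},
    ],
}
for index, tag, barrier, delta in segment_barriers(example):
    print(f"{tag}: barrier {barrier:.1f} kcal/mol, reaction energy {delta:.1f} kcal/mol")
```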
CLI options
Defaults shown are used when the option is not specified. The full flag list is in the generated command reference; the tables below cover the options that need explanation.
Input / output
| Option | Description | Default |
|---|---|---|
| | Two or more full PDBs in reaction order (single input allowed with | Required |
| | Substrate specification (PDB path, residue IDs, or residue names). Omit to skip extraction. | None |
| | Total charge or residue-specific mapping (e.g. | None |
| | Force net system charge (highest-priority override). | None |
| | Top-level output directory. | |
| | AMBER parm7 topology for the full (real) system. Auto-generated by | None |
| | Pre-built ML-region PDB. When provided, ML-region determination is skipped. | None |
| | Reference PDB for XYZ input (required so PDB metadata can be recovered). | None |
| | Global toggle for XYZ / TRJ → PDB companions. | |
| | Save optimizer dumps. Always forwarded to | |
| | Base YAML applied first. | None |
| | Print resolved configuration before execution. | |
| | Validate and print plan without running stages (shown in | |
Extraction
| Option | Description | Default |
|---|---|---|
| | Pocket inclusion cutoff (Å). | |
| | Independent hetero-hetero cutoff (Å). | |
| | Include water molecules (HOH / WAT / H2O / DOD / TIP / TIP3 / SOL). | |
| | Remove backbone atoms on non-substrate amino acids. | |
| | Add link hydrogens for severed bonds. | |
| | Residues to force include. | |
| | Comma-separated residue names (with optional charge) to treat as amino acids for backbone truncation and charge assignment (e.g. | |
MM preparation
| Option | Description | Default |
|---|---|---|
| | Force-field set for | |
| | Control TER insertion around ligand / water / ion blocks. | |
| | Keep the | |
| | Spin multiplicity mapping forwarded to | None |
MEP search
| Option | Description | Default |
|---|---|---|
| | Spin multiplicity (2S+1). | |
| | Internal nodes for segment GSM. | |
| | Maximum GSM macro-cycles. | |
| | Enable TS refinement for segment GSM. | |
| | Optimizer preset for scan / path-search and single optimizations ( | |
| | Optimizer preset override for TSOPT / post-IRC endpoint optimizations ( | |
| | Convergence preset ( | None |
| | Convergence preset for post-IRC endpoint optimizations. | |
| | Pre-optimize endpoints before segmentation. | |
| | | |
| | MLIP backend for the ML region: | |
| | xTB point-charge embedding correction for MM-to-ML environmental effects (experimental). | |
| | Cutoff radius (Å) for embed-charge MM atoms. | |
| | Enable CMAP (backbone cross-map dihedral correction) in the model parm7. Disabled by default, consistent with Gaussian ONIOM. | |
| | ML/MM Hessian mode ( | |
| | Detect ML/MM layers from input PDB B-factors (B = 0 / 10 / 20). If disabled, downstream tools require | |
TSOPT optimizer selection order: --opt-mode-post (if set) → --opt-mode (only when explicitly provided) → TSOPT default (hess → RS-I-RFO).
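The selection order above amounts to a first-match fallback chain. The sketch below models "not explicitly provided" as None; the function name is hypothetical, and "hess" is the only preset name taken from the text.

```python
from typing import Optional

# TSOPT default preset named in the text (hess selects RS-I-RFO).
TSOPT_DEFAULT = "hess"


def resolve_tsopt_opt_mode(
    opt_mode_post: Optional[str] = None,
    opt_mode: Optional[str] = None,
) -> str:
    """Pick the TSOPT optimizer preset following the documented order."""
    if opt_mode_post is not None:
        return opt_mode_post  # --opt-mode-post wins when set
    if opt_mode is not None:
        return opt_mode  # --opt-mode counts only when explicitly provided
    return TSOPT_DEFAULT  # otherwise fall back to the TSOPT default


print(resolve_tsopt_opt_mode())
```

Keeping None distinct from a default string matters here: a defaulted --opt-mode must not override the TSOPT default, only an explicit one does.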
Scan (single-input runs)
| Option | Description | Default |
|---|---|---|
| | Staged scans: | None |
| | Override the scan output directory. | None |
| | Override scan indexing (True = 1-based, False = 0-based). | None |
| | Maximum step size (Å). | Default |
| | Harmonic bias strength (eV / Ų). | Default |
| | Relaxation max cycles per step. | Default |
| | Override scan pre-optimization toggle. | None |
| | Override scan end-of-stage optimization. | None |
Post-processing + freq / DFT overrides
| Option | Description | Default |
|---|---|---|
| | Run TS optimization + EulerPC IRC per reactive segment. | |
| | Run vibrational analysis ( | |
| | Run single-point DFT on R / TS / P. | |
| | Surplus-imaginary-mode flattening in | |
| | Override | Default |
| | Custom tsopt subdirectory. | None |
| | Base directory override for freq outputs. | None |
| | Maximum modes to write. | Default |
| | Mode animation amplitude (Å). | Default |
| | Frames per mode animation. | Default |
| | Mode sorting behavior. | Default |
| | Thermochemistry temperature (K). | Default |
| | Thermochemistry pressure (atm). | Default |
| | Base directory override for DFT outputs. | None |
| | Functional / basis pair. | Default |
| | Maximum SCF iterations. | Default |
| | SCF convergence tolerance. | Default |
| | PySCF grid level. | Default |
| | DFT engine (GPU or CPU PySCF). | None |
YAML configuration
all supports layered YAML — --config FILE for base settings, with the precedence defaults < config < CLI < override-yaml. The effective YAML is forwarded to downstream subcommands, and each tool reads the sections described in its own documentation:
Subcommand |
YAML sections |
|---|---|
|
|
|
|
|
|
|
|
|
# Minimal example
calc:
  charge: 0
  spin: 1
mlmm:
  real_parm7: real.parm7
  model_pdb: ml_region.pdb
  backend: uma                      # uma | orb | mace | aimnet2
  embedcharge: false                # xTB point-charge embedding correction
  uma_model: uma-s-1p1              # uma-s-1p1 | uma-m-1p1
  hessian_calc_mode: Analytical     # recommended when VRAM permits
gs:
  max_nodes: 12
  climb: true
dft:
  grid_level: 6
Full schema: YAML Reference.
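The precedence defaults < config < CLI < override-yaml can be pictured as successive merges in which later layers win. The sketch below uses plain dictionaries in place of parsed YAML; merging nested sections key by key (rather than replacing whole sections) is an assumption about the tool's behaviour, and the default values shown are invented.

```python
from copy import deepcopy
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a copy of base with override applied, recursing into sections."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(defaults: Config, config: Config, cli: Config, override_yaml: Config) -> Config:
    """Apply the layers in precedence order; the last layer has the final say."""
    result: Config = {}
    for layer in (defaults, config, cli, override_yaml):
        result = deep_merge(result, layer)
    return result


resolved = resolve_config(
    defaults={"gs": {"max_nodes": 10, "climb": True}},       # invented defaults
    config={"gs": {"max_nodes": 12}, "dft": {"grid_level": 6}},
    cli={"gs": {"max_nodes": 16}},
    override_yaml={"dft": {"grid_level": 4}},
)
print(resolved)
```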
Notes
Input format depends on extraction:
- Extraction enabled (-c/--center): inputs must be PDB so residues can be located.
- Extraction skipped: inputs may be PDB / XYZ.
Multi-structure runs require ≥ 2 structures.
Charge is resolved in order of priority — -q/--charge (explicit CLI override) → pocket extraction (when -c is provided, summing amino acids + ions + --ligand-charge) → -l, --ligand-charge fallback (when extraction is skipped) → default (unresolved charge is an error). Spin resolution: --multiplicity (CLI) → default (1). Always provide --ligand-charge for non-standard substrates so the correct net charge propagates downstream. The first-model net ML-region charge is cast to the nearest integer, with a console note if rounding occurs.
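The charge resolution order above can be summarised as a fallback chain. The sketch below is an illustration under stated assumptions: the function and argument names are hypothetical, the extraction sum is assumed to arrive as a float, and Python's round() stands in for "nearest integer" (it rounds exact halves to the even neighbour).

```python
from typing import Optional


def resolve_charge(
    cli_charge: Optional[int] = None,
    extraction_charge: Optional[float] = None,
    ligand_charge_total: Optional[float] = None,
) -> int:
    """Resolve the net charge following the documented priority order."""
    if cli_charge is not None:
        return int(cli_charge)  # -q/--charge: explicit override
    if extraction_charge is not None:
        value = extraction_charge  # amino acids + ions + --ligand-charge
    elif ligand_charge_total is not None:
        value = ligand_charge_total  # fallback when extraction is skipped
    else:
        raise ValueError("net charge could not be resolved")
    rounded = round(value)
    if rounded != value:
        print(f"note: charge {value} rounded to {rounded}")
    return int(rounded)


print(resolve_charge(extraction_charge=-2.98))
```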
See Also
extract (called internally by all) · mm_parm (called internally by all) · path-search · tsopt · freq · dft · trj2fig · Common Error Recipes (symptom-first failure routing) · Troubleshooting (common errors and fixes) · YAML Reference · Glossary.