all
Overview
Summary: End-to-end enzymatic reaction workflow – active-site extraction, ML/MM layer assignment, MM topology preparation, optional staged scan, MEP search (GSM) on full-system layered PDBs, with optional TS optimization, pseudo-IRC, thermochemistry, DFT, and DFT//MLIP diagrams.
mlmm all runs a one-shot pipeline that operates on full-system layered PDBs with ML/MM. It supports three modes:
Multi-structure ensemble – Provide two or more full PDBs in reaction order. The tool extracts the active-site region (for ML-region definition), builds MM topology, assigns ML/MM layers, runs GSM MEP search on the layered full-system PDBs, and optionally runs per-segment post-processing (TSOPT/freq/DFT).
Single-structure + staged scan – Provide one PDB plus --scan-lists. The scan generates intermediate/product candidates that become MEP endpoints. One --scan-lists literal runs a single scan stage; multiple stages are passed as multiple values after a single --scan-lists flag.
TSOPT-only – Provide a single PDB, omit --scan-lists, and set --tsopt. The tool runs TS optimization on the layered full-system PDB, performs a pseudo-IRC, minimizes both ends, and builds energy diagrams.
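The mode selection above can be sketched as a small Python function. This is a hypothetical helper, not part of mlmm. Rejecting a single input that has neither --scan-lists nor --tsopt is an assumption, based on the stated requirement that multi-structure runs need at least two structures.

```python
from typing import Optional, Sequence


def select_mode(inputs: Sequence[str], scan_lists: Optional[Sequence[str]], tsopt: bool) -> str:
    """Return which of the three documented modes an argument set implies.

    Hypothetical helper mirroring the rules described above; not mlmm's own code.
    """
    if len(inputs) >= 2:
        return "multi-structure"  # GSM MEP search over the ordered inputs
    if len(inputs) == 1 and scan_lists:
        return "scan"  # staged scan supplies the MEP endpoints (--tsopt then acts per segment)
    if len(inputs) == 1 and tsopt:
        return "tsopt-only"  # TS optimization + pseudo-IRC, no path_search
    # Assumption: a single input with neither --scan-lists nor --tsopt is rejected.
    raise ValueError("a single input needs --scan-lists or --tsopt")
```

In this sketch, a single input that carries both --scan-lists and --tsopt resolves to the scan route. This matches the single-structure scan example, which also passes --tsopt.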
Important
--tsopt produces TS candidates. all automatically runs IRC and freq for validation, but always inspect the results (imaginary mode + endpoint connectivity) before mechanistic interpretation.
Minimal example
mlmm all -i R.pdb P.pdb -c "SAM,GPP" -l "SAM:1,GPP:-3" --out-dir ./result_all
Output checklist
result_all/summary.log
result_all/summary.yaml
result_all/path_search/mep.pdb (or result_all/path_search/seg_*/)
Common examples
Run full post-processing in one command.
mlmm all -i R.pdb P.pdb -c "SAM,GPP" -l "SAM:1,GPP:-3" \
--tsopt --thermo --dft --out-dir ./result_all
Single-structure staged scan route.
mlmm all -i A.pdb -c "308,309" --scan-lists "[(12,45,1.35)]" "[(10,55,2.20)]" \
--multiplicity 1 --out-dir ./result_scan_all
Validate parsing and plan only.
mlmm all -i R.pdb P.pdb -c "SAM,GPP" -l "SAM:1,GPP:-3" --dry-run
Use the ORB backend with xTB point-charge embedding.
mlmm all -i R.pdb P.pdb -c "SAM,GPP" -l "SAM:1,GPP:-3" \
--backend orb --embedcharge --out-dir ./result_all_orb
PDB companions for XYZ/TRJ outputs are generated when templates are available; generation is controlled by --convert-files/--no-convert-files (enabled by default).
Usage
mlmm all -i INPUT1 [INPUT2...] -c SUBSTRATE [options]
Run mlmm all --help for core options or mlmm all --help-advanced for the full option list.
Examples
# Minimal end-to-end run with explicit substrate and ligand charges (multi-structure)
mlmm all -i reactant.pdb product.pdb -c "GPP,MMT" -l "GPP:-3,MMT:-1"
# Full ensemble with an intermediate, residue-ID substrate spec, and full post-processing
mlmm all -i A.pdb B.pdb C.pdb -c "308,309" -l "-1" \
--multiplicity 1 --max-nodes 10 --max-cycles 100 --climb \
--opt-mode grad --no-dump --config params.yaml --preopt \
--out-dir result_all --tsopt --thermo --dft
# Single-structure + scan to build an ordered series
mlmm all -i A.pdb -c "308,309" --scan-lists "[(10,55,2.20),(23,34,1.80)]" \
--multiplicity 1 --out-dir result_scan_all --tsopt --thermo --dft
# Single-structure TSOPT-only mode (no path_search)
mlmm all -i A.pdb -c "GPP,MMT" -l "GPP:-3,MMT:-1" \
--tsopt --thermo --dft --out-dir result_tsopt_only
Workflow
Active-site extraction and ML-region definition (multi-structure union when multiple inputs)
Define the substrate (-c/--center) by PDB, residue IDs, or residue names.
Optionally provide --ligand-charge as a total number (distributed) or a mapping (e.g., GPP:-3,MMT:-1).
The extractor writes per-input pocket PDBs under <out-dir>/pockets/. The first pocket is copied as <out-dir>/ml_region.pdb and defines the ML region for all subsequent ML/MM calculations.
The extractor's first-model total pocket charge is used as the total charge in later steps. It is rounded to the nearest integer, with a console note if rounding occurs.
Additional extractor toggles: --radius, --radius-het2het, --include-H2O/--no-include-H2O, --exclude-backbone/--no-exclude-backbone, --add-linkH/--no-add-linkH, --selected-resn, --verbose/--no-verbose.
If -c/--center is omitted, extraction is skipped and the full input structures are used directly.
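Here is a minimal sketch of how a --ligand-charge value could be interpreted. It assumes only the two documented forms: a bare total, or a comma-separated NAME:CHARGE mapping. The function name parse_ligand_charge is hypothetical, and mlmm's actual parser may accept other variants.

```python
from typing import Dict, Union


def parse_ligand_charge(spec: str) -> Union[int, Dict[str, int]]:
    """Parse a --ligand-charge value: a bare total ("-1") or a mapping ("GPP:-3,MMT:-1").

    Hypothetical parser for illustration; mlmm's own parsing may differ in details.
    """
    spec = spec.strip()
    if ":" not in spec:
        return int(spec)  # total charge, distributed over the substrate residues
    mapping: Dict[str, int] = {}
    for item in spec.split(","):
        name, _, value = item.partition(":")
        mapping[name.strip()] = int(value)  # int() tolerates surrounding whitespace
    return mapping
```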
ML/MM preparation (parm7 + layer assignment)
Run mm_parm once on the first full input PDB to build <out-dir>/mm_parm/<input_basename>.parm7/.rst7, which is automatically passed as --parm.
Run define-layer on each full-system PDB to assign 3-layer B-factors (ML=0.0, MovableMM=10.0, FrozenMM=20.0) based on the ML-region definition. The layered full-system PDBs are written under <out-dir>/layered/.
Tune this stage with --auto-mm-ff-set, --auto-mm-add-ter, and --auto-mm-keep-temp.
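To check which layer each atom landed in, read the B-factor column of a layered PDB directly. The sketch below assumes standard PDB fixed columns (tempFactor in columns 61-66) and snaps each value to the nearest of the three documented codes. classify_layers is an illustrative name, not an mlmm function, and it is not the --detect-layer implementation.

```python
from collections import Counter
from typing import Iterable

# Layer codes written by define-layer into the B-factor column (per the text above).
LAYER_BY_BFACTOR = {0.0: "ML", 10.0: "MovableMM", 20.0: "FrozenMM"}


def classify_layers(pdb_lines: Iterable[str]) -> Counter:
    """Count atoms per ML/MM layer from the B-factor column of a layered PDB.

    Sketch only: reads PDB columns 61-66 (tempFactor) and matches the nearest
    documented code (0/10/20).
    """
    counts: Counter = Counter()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, TER, END, etc.
        b = float(line[60:66])
        nearest = min(LAYER_BY_BFACTOR, key=lambda code: abs(code - b))
        counts[LAYER_BY_BFACTOR[nearest]] += 1
    return counts
```

Usage: `classify_layers(open("layered/<file>.pdb"))` returns a Counter such as `Counter({"FrozenMM": ..., "MovableMM": ..., "ML": ...})`.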
Optional staged scan (single-structure only)
If exactly one full input PDB is provided and --scan-lists is given, the tool performs a staged, bond-length-driven scan on the layered full-system PDB using the ML/MM calculator.
For each stage, the final relaxed structure (stage_XX/result.pdb) is collected as an intermediate/product candidate.
The ordered input series for the path search becomes: [initial layered PDB, stage_01/result.pdb, stage_02/result.pdb, ...].
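The staged-scan inputs and the resulting path-search series can be illustrated as follows. The sketch assumes each --scan-lists value is a Python-style list of (atom_i, atom_j, target distance in Å) tuples, as in the examples above. The helper names are hypothetical, and the index base (0 or 1) is left to the scan indexing option rather than handled here.

```python
import ast
from typing import List, Sequence, Tuple

Stage = List[Tuple[int, int, float]]


def parse_scan_lists(values: Sequence[str]) -> List[Stage]:
    """Turn the values following --scan-lists into stages of (i, j, target Å) tuples.

    Illustrative sketch: each value is one stage literal such as "[(12,45,1.35)]".
    """
    stages: List[Stage] = []
    for literal in values:
        stage = ast.literal_eval(literal)  # parses Python literals only, no code execution
        stages.append([(int(i), int(j), float(r)) for i, j, r in stage])
    return stages


def mep_series(initial_pdb: str, n_stages: int) -> List[str]:
    """Ordered path-search inputs: the initial layered PDB, then each stage result."""
    return [initial_pdb] + [f"stage_{k:02d}/result.pdb" for k in range(1, n_stages + 1)]
```

Two values after one flag, such as `"[(12,45,1.35)]" "[(10,55,2.20)]"`, become two stages. A single value holding two tuples is one stage that drives two distances together.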
MEP search on full-system layered PDBs
All MEP calculations run on full-system layered PDBs (with --parm and --detect-layer), not on pockets.
--refine-path (default): runs recursive path_search with kink detection and refinement.
--no-refine-path: runs single-pass path-opt GSM per adjacent pair. It then concatenates the trajectories, extracts the highest-energy image (HEI) of each segment, detects bond changes, and writes summary.yaml, so post-processing (TSOPT, thermo, DFT) is available in both modes.
For multi-input runs, the original full PDBs are supplied automatically as merge references. In the scan-derived series (single-structure case), the single original full PDB is reused (repeated) as the reference template.
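For each segment, the HEI and the barrier and reaction energies reported in summary.yaml can be reproduced roughly as below. The sketch assumes energies in hartree, ordered along the segment. It measures both quantities relative to the segment's first image, which may not match mlmm's exact reference choice. segment_barrier is a hypothetical name.

```python
from typing import Sequence, Tuple

HARTREE_TO_KCAL = 627.5094740631  # 1 Eh in kcal/mol


def segment_barrier(energies_au: Sequence[float]) -> Tuple[int, float, float]:
    """Locate the highest-energy image (HEI) of one GSM segment.

    Returns (hei_index, barrier_kcal, delta_kcal), both relative to the first image.
    """
    e0 = energies_au[0]
    hei = max(range(len(energies_au)), key=energies_au.__getitem__)
    barrier = (energies_au[hei] - e0) * HARTREE_TO_KCAL
    delta = (energies_au[-1] - e0) * HARTREE_TO_KCAL
    return hei, barrier, delta
```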
Summary and optional post-processing
Per-segment trajectories, the full MEP trajectory, and a summary.yaml are written under <out-dir>/path_search/.
--tsopt: runs TS optimization on each HEI, follows with EulerPC IRC, and emits segment energy diagrams.
--thermo: computes ML/MM thermochemistry on (R, TS, P) and adds a Gibbs diagram.
--dft: runs DFT single points on (R, TS, P) and adds a DFT diagram. Combined with --thermo, it also generates a DFT//MLIP Gibbs diagram.
Shared overrides include --opt-mode, --opt-mode-post (overrides the TSOPT and post-IRC endpoint optimization modes), --flatten/--no-flatten, --hessian-calc-mode, --tsopt-max-cycles, --tsopt-out-dir, --freq-*, and --dft-*.
When ample VRAM is available, setting --hessian-calc-mode to Analytical is strongly recommended.
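A DFT//MLIP Gibbs diagram is commonly built by adding the MLIP thermal correction to the DFT single-point electronic energy. The sketch below assumes that composite scheme, G = E_DFT + (G_MLIP - E_MLIP). mlmm's exact bookkeeping is not confirmed here, and the function name and dictionary layout are hypothetical.

```python
from typing import Dict

HARTREE_TO_KCAL = 627.5094740631  # 1 Eh in kcal/mol


def dft_mlip_gibbs(e_dft: Dict[str, float], e_mlip: Dict[str, float],
                   g_mlip: Dict[str, float]) -> Dict[str, float]:
    """Relative DFT//MLIP free energies (kcal/mol) for states such as R, TS, P.

    Assumed composite: the DFT single point supplies the electronic energy and the
    ML/MM frequency run supplies the thermal correction (G_MLIP - E_MLIP).
    Inputs are in hartree; the first state in e_dft is the reference.
    """
    g = {s: e_dft[s] + (g_mlip[s] - e_mlip[s]) for s in e_dft}
    ref = g[next(iter(e_dft))]
    return {s: (value - ref) * HARTREE_TO_KCAL for s, value in g.items()}
```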
TSOPT-only mode (single input, --tsopt, no --scan-lists)
Skips the MEP search and the per-segment summary/post-processing stages. Instead, it runs tsopt on the layered full-system PDB, performs a pseudo-IRC and minimizes both ends, builds ML/MM energy diagrams for R-TS-P, and optionally adds Gibbs, DFT, and DFT//MLIP diagrams.
In this mode only, the IRC endpoint with the higher energy is adopted as the reactant (R).
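The endpoint-assignment rule for TSOPT-only mode reduces to a single comparison. The (label, energy) tuples and the function name below are hypothetical.

```python
from typing import Tuple


def orient_irc_endpoints(end_a: Tuple[str, float], end_b: Tuple[str, float]) -> Tuple[str, str]:
    """Return (reactant, product) labels for the two minimized IRC endpoints.

    Mirrors the TSOPT-only rule above: the higher-energy endpoint becomes R.
    """
    high, low = (end_a, end_b) if end_a[1] >= end_b[1] else (end_b, end_a)
    return high[0], low[0]
```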
Charge and spin precedence
Charge resolution (highest to lowest priority):
| Priority | Source | When Used |
|---|---|---|
| 1 | | Explicit CLI override |
| 2 | Pocket extraction | When |
| 3 | | Fallback when extraction is skipped |
| 4 | Default | None (unresolved charge is an error) |
Spin resolution: --multiplicity (CLI) -> default (1)
Tip: Always provide --ligand-charge for non-standard substrates to ensure correct charge propagation.
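The priority table can be read as a first-match chain. Because the table's priority-3 source cell is empty, this sketch uses a generic fallback_charge parameter for it. The rounding step mirrors the nearest-integer cast described in the workflow; note that Python's round() sends exact halves to the even integer. All names are hypothetical.

```python
from typing import Optional


def resolve_total_charge(cli_charge: Optional[int], pocket_charge: Optional[float],
                         fallback_charge: Optional[int]) -> int:
    """Resolve the total charge following the priority table above (sketch).

    cli_charge: explicit CLI override (priority 1).
    pocket_charge: extractor's first-model total pocket charge, or None if
        extraction was skipped (priority 2).
    fallback_charge: the priority-3 source used when extraction is skipped.
    """
    if cli_charge is not None:
        return cli_charge
    if pocket_charge is not None:
        rounded = round(pocket_charge)  # nearest integer
        if rounded != pocket_charge:
            print(f"note: pocket charge {pocket_charge} rounded to {rounded}")
        return int(rounded)
    if fallback_charge is not None:
        return fallback_charge
    raise ValueError("total charge could not be resolved")  # priority 4: error
```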
Input expectations
Extraction enabled (-c/--center): inputs must be PDB files so residues can be located.
Extraction skipped: inputs may be PDB or XYZ.
Multi-structure runs require at least 2 structures.
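These input rules amount to a format check. Below is a minimal sketch that assumes the format is judged by file suffix; mlmm may inspect file contents instead. check_inputs is a hypothetical name.

```python
from pathlib import Path
from typing import Sequence


def check_inputs(paths: Sequence[str], extraction_enabled: bool) -> None:
    """Raise ValueError if an input does not match the expectations listed above."""
    # Extraction needs residue information, so only PDB is accepted in that case.
    allowed = {".pdb"} if extraction_enabled else {".pdb", ".xyz"}
    for p in paths:
        suffix = Path(p).suffix.lower()
        if suffix not in allowed:
            raise ValueError(f"{p}: expected one of {sorted(allowed)}")
```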
CLI options
Note: Default values shown are used when the option is not specified.
Input/Output Options
| Option | Description | Default |
|---|---|---|
| | Two or more full PDBs in reaction order (single input allowed with | Required |
| | Substrate specification (PDB path, residue IDs, or residue names). Omit to skip extraction. | None |
| | Total charge or residue-specific mapping (e.g., | None |
| | Force total system charge (highest priority override). | None |
| | Top-level output directory. | |
| | AMBER parm7 topology file for the full (real) system. When omitted, | None |
| | Pre-built ML-region PDB. When provided, pocket extraction is skipped and this file defines the ML region directly. | None |
| | Reference PDB for XYZ input. Required when the input is XYZ so that PDB metadata (residues, chains, B-factors) can be recovered. | None |
| | Global toggle for XYZ/TRJ to PDB companions when templates are available. | |
| | Save optimizer dumps. Always forwarded to | |
| | Base YAML applied first. | None |
| | Print resolved configuration before execution. | |
| | Validate and print plan without running stages. Shown in | |
Extraction Options
| Option | Description | Default |
|---|---|---|
| | Pocket inclusion cutoff (Å). | |
| | Independent hetero-hetero cutoff (Å). | |
| | Include water molecules (HOH/WAT/TIP3/SOL). | |
| | Remove backbone atoms on non-substrate amino acids. | |
| | Add link hydrogens for severed bonds. | |
| | Residues to force include. | |
| | Enable INFO-level extractor logging. | |
MM Preparation Options
| Option | Description | Default |
|---|---|---|
| | Force field set for | |
| | Control TER insertion around ligand/water/ion blocks. | |
| | Keep the | |
| | Spin multiplicity mapping forwarded to | None |
MEP Search Options
| Option | Description | Default |
|---|---|---|
| | Spin multiplicity (2S+1). | |
| | Internal nodes for segment GSM. | |
| | Max GSM macro-cycles. | |
| | Enable TS refinement for segment GSM. | |
| | Optimizer preset for scan/path-search and single optimizations ( | |
| | Optimizer preset override for TSOPT/post-IRC endpoint optimizations ( | |
| | Convergence preset ( | |
| | Convergence preset for post-IRC endpoint optimizations. | |
| | Pre-optimize endpoints before segmentation. | |
| | If True, run recursive | |
| | MLIP backend for the ML region: | |
| | Enable xTB point-charge embedding correction for MM-to-ML environmental effects. | |
| | Cutoff radius (Å) for embed-charge MM atoms. | |
| | ML/MM Hessian mode ( | |
| | Detect ML/MM layers from input PDB B-factors (B=0/10/20) in downstream tools. If disabled, downstream tools require | |
TSOPT optimizer selection order: --opt-mode-post (if set) -> --opt-mode (only when explicitly provided) -> TSOPT default (hess -> heavy).
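The TSOPT optimizer selection order can be expressed as a short fallback chain. The sketch takes the TSOPT default as a parameter instead of hard-coding it. The opt_mode_explicit flag stands in for however the CLI records whether --opt-mode was actually passed. Both choices are assumptions, and the function name is hypothetical.

```python
from typing import Optional


def select_tsopt_optimizer(opt_mode_post: Optional[str], opt_mode: Optional[str],
                           opt_mode_explicit: bool, tsopt_default: str) -> str:
    """Apply the documented TSOPT optimizer selection order (sketch)."""
    if opt_mode_post is not None:
        return opt_mode_post  # highest priority when set
    if opt_mode is not None and opt_mode_explicit:
        return opt_mode  # counts only when given on the command line
    return tsopt_default  # otherwise fall back to the TSOPT default
```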
Scan Options (Single-Input Runs)
| Option | Description | Default |
|---|---|---|
| | Staged scans: | None |
| | Override the scan output directory. | None |
| | Override scan indexing interpretation (True = 1-based, False = 0-based). | None |
| | Maximum step size (Å). | Default |
| | Harmonic bias strength (eV/Å²). | Default |
| | Relaxation max cycles per step. | Default |
| | Override scan pre-optimization toggle. | None |
| | Override scan end-of-stage optimization. | None |
Post-Processing Options
| Option | Description | Default |
|---|---|---|
| | Run TS optimization + pseudo-IRC per reactive segment. | |
| | Run vibrational analysis ( | |
| | Run single-point DFT on R/TS/P. | |
| | Enable extra-imaginary-mode flattening in | |
| | Override | Default |
| | Custom tsopt subdirectory. | None |
Freq Overrides
| Option | Description | Default |
|---|---|---|
| | Base directory override for freq outputs. | None |
| | Maximum modes to write. | Default |
| | Mode animation amplitude (Å). | Default |
| | Frames per mode animation. | Default |
| | Mode sorting behavior. | Default |
| | Thermochemistry temperature (K). | Default |
| | Thermochemistry pressure (atm). | Default |
DFT Overrides
| Option | Description | Default |
|---|---|---|
| | Base directory override for DFT outputs. | None |
| | Functional/basis pair. | Default |
| | Maximum SCF iterations. | Default |
| | SCF convergence tolerance. | Default |
| | PySCF grid level. | Default |
Outputs
<out-dir>/
  ml_region.pdb                        # ML-region definition (copy of the first pocket)
  pockets/
    pocket_<input1_basename>.pdb
    pocket_<input2_basename>.pdb
    ...
  mm_parm/
    <input1_basename>.parm7            # Generated from the first full-enzyme input PDB
    <input1_basename>.rst7
  layered/                             # Layered full-system PDBs (B-factor annotated)
  scan/                                # present only in single-structure+scan mode
    stage_01/result.pdb
    stage_02/result.pdb
    ...
  summary.yaml                         # mirrored top-level summary (when path_search runs)
  summary.log
  mep_plot.png
  energy_diagram_MEP.png
  energy_diagram_UMA_all.png           # aggregated post-processing diagrams (when enabled)
  energy_diagram_G_UMA_all.png
  energy_diagram_DFT_all.png
  energy_diagram_G_DFT_plus_UMA_all.png
  irc_plot_all.png
  path_search/                         # present when path_search is executed
    mep_trj.xyz
    mep.pdb
    summary.yaml
    summary.log
    mep_plot.png
    energy_diagram_MEP.png
    post_seg_XX/                       # when post-processing is enabled
      ts/...
      irc/...
      freq/...                         # with --thermo
      dft/...                          # with --dft
      energy_diagram_UMA.png
      energy_diagram_G_UMA.png
      energy_diagram_DFT.png
      energy_diagram_G_DFT_plus_UMA.png
  tsopt_single/                        # present only in single-structure TSOPT-only mode
    ts/...
    irc/...
    structures/
      reactant.pdb
      ts.pdb
      product.pdb
    freq/...                           # with --thermo
    dft/...                            # with --dft
    energy_diagram_UMA.png
    energy_diagram_G_UMA.png
    energy_diagram_DFT.png
    energy_diagram_G_DFT_plus_UMA.png
Reading summary.log
The log is organized into numbered sections:
[1] Global MEP overview – image/segment counts, MEP trajectory plot paths, and the aggregate MEP energy diagram.
[2] Segment-level MEP summary (MLIP path) – per-segment barriers, reaction energies, and bond-change summaries.
[3] Per-segment post-processing (TSOPT / Thermo / DFT) – per-segment TS imaginary frequency checks, IRC outputs, and energy tables.
[4] Energy diagrams (overview) – diagram tables for MEP/MLIP/Gibbs/DFT series plus an optional cross-method summary table.
[5] Output directory structure – a compact tree of generated files with inline annotations.
Reading summary.yaml
The YAML is a compact, machine-readable summary. Common top-level keys include:
out_dir, n_images, n_segments – run metadata and total counts.
segments – list of per-segment entries with index, tag, kind, barrier_kcal, delta_kcal, and bond_changes.
energy_diagrams (optional) – diagram payloads with labels, energies_kcal, energies_au, ylabel, and image paths.
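Here is a short sketch that turns a parsed summary.yaml into one line per segment, using only the keys listed above. It expects the mapping produced by a YAML loader such as PyYAML's yaml.safe_load. The value types are assumed (numbers for the *_kcal fields), and describe_segments is a hypothetical name.

```python
from typing import Any, Dict, List


def describe_segments(summary: Dict[str, Any]) -> List[str]:
    """One human-readable line per segment from a parsed summary.yaml mapping."""
    lines = []
    for seg in summary.get("segments", []):
        lines.append(
            f"seg {seg['index']} [{seg.get('kind', '?')}] {seg.get('tag', '')}: "
            f"barrier {seg['barrier_kcal']:.2f}, dE {seg['delta_kcal']:.2f} kcal/mol"
        )
    return lines
```

For example, after `import yaml`, running `describe_segments(yaml.safe_load(open("result_all/summary.yaml")))` lists the barrier and reaction energy of every segment.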
YAML configuration
all supports layered YAML:
--config FILE: base settings.
defaults < config < CLI < override-yaml (override-yaml values are applied after CLI values).
The resulting effective YAML is forwarded to downstream subcommands; each tool reads the sections described in its own documentation.
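The layering order defaults < config < CLI < override-yaml can be illustrated with a recursive dictionary merge. This sketch merges nested sections key by key; whether mlmm does the same is an assumption, and the function names are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base updated by override; nested mappings merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)  # scalars and lists replace earlier values
    return merged


def effective_config(defaults: Dict[str, Any], config: Dict[str, Any],
                     cli: Dict[str, Any], override_yaml: Dict[str, Any]) -> Dict[str, Any]:
    """Layer the sources in the documented order; later layers win."""
    result: Dict[str, Any] = {}
    for layer in (defaults, config, cli, override_yaml):
        result = deep_merge(result, layer)
    return result
```

With this merge style, a config file that sets only gs.max_nodes leaves other gs keys (such as climb) at their default or CLI values instead of dropping them.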
Minimal example:
calc:
  charge: 0
  spin: 1
mlmm:
  real_parm7: real.parm7
  model_pdb: ml_region.pdb
  backend: uma                 # MLIP backend: uma | orb | mace | aimnet2
  embedcharge: false           # xTB point-charge embedding correction
  uma_model: uma-s-1p1         # uma-s-1p1 | uma-m-1p1
  ml_hessian_mode: Analytical  # recommended when VRAM permits
gs:
  max_nodes: 12
  climb: true
dft:
  grid_level: 6
For a complete reference of all YAML options, see YAML Configuration Reference.
See Also
extract – Standalone pocket extraction (called internally by all)
mm_parm – Build AMBER topology (called internally by all)
path-search – Standalone recursive MEP search
tsopt – Standalone TS optimization
freq – Vibrational analysis and thermochemistry
dft – Single-point DFT calculations
trj2fig – Plot energy profiles from trajectories
Common Error Recipes – Symptom-first failure routing
Troubleshooting – Common errors and fixes
YAML Reference – Complete YAML configuration options
Glossary – Definitions of MEP, TS, IRC, GSM