scan
mlmm scan drives a reaction coordinate on a layered enzyme PDB to generate a coarse reaction trajectory from a single starting structure, providing intermediate/product candidates for downstream MEP refinement. It performs a staged, bond-length-driven scan with the ML/MM calculator (mlmm.backends.mlmm_calc.mlmm), driving one or more interatomic distances toward target values under harmonic restraints. At each step the temporary targets are updated, restraint wells are applied, and the structure is relaxed with LBFGS. The ML/MM calculator couples an MLIP backend (selected via -b/--backend; default: UMA) and mlmm-toolkit’s MM force field. Use -s/--scan-lists to define targets as a YAML/JSON spec file (recommended) or as inline Python literals.
Examples
Command form:
mlmm scan -i INPUT.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q CHARGE [-m MULT] \
[-s scan.yaml | -s "[(I,J,TARGET_ANG)]"] [options]
Spec-file scan (add --print-parsed to validate the parsed scan spec and exit without running the GPU calculation):
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -s scan.yaml -o ./result_scan
Inline Python literal:
# Inline Python literal
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -s "[(12,45,2.20)]"
Dump trajectories for stage-by-stage inspection:
# Dump trajectories for stage-by-stage inspection
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -s scan.yaml --dump -o ./result_scan_dump
Workflow
1. Load the structure through geom_loader, resolving charge/spin from the CLI or defaults. Provide --parm, --model-pdb, -q/--charge, and optionally -m/--multiplicity for the ML/MM calculator.
2. Optionally run an unbiased preoptimization (--preopt) before any biasing so the starting point is relaxed.
3. Parse stage targets from -s/--scan-lists (YAML/JSON spec file or inline literal), then normalize the (i, j) indices (1-based by default). When the input is a PDB, each entry may be either an integer index or an atom selector string like 'TYR,285,CA'; selector fields can be separated by spaces, commas, slashes, backticks, or backslashes and may be in any order.
4. Compute the per-bond displacement and split it into steps:
   - For scan tuples [(i, j, target_A)], compute the per-pair displacement delta_k = target_k - current_distance_A_k.
   - With --max-step-size = h, the stage takes N = ceil(max(|delta_k|) / h) biased relaxations.
   - Each pair's incremental change is step_k = delta_k / N (Å). At step s, the temporary target is r_k(s) = r_k(0) + s * step_k.
5. March through all steps, applying the harmonic wells E_bias = sum 1/2 * k * (|r_i - r_j| - target_k)^2 and minimizing with LBFGS. k comes from --bias-k (eV/Å²) and is converted once to Hartree/Bohr². Coordinates are stored in Bohr for PySisyphus and converted internally for reporting.
6. After the last step of each stage, optionally run an unbiased relaxation (--endopt) before reporting covalent bond changes and writing the result.* files.
7. Repeat for every stage; optional trajectories are dumped only when --dump is True.
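The step-splitting arithmetic in step 4 can be sketched in a few lines of Python. This is a minimal illustration of the formulas above, not the tool's own code: the function name `plan_stage` and the example distances are hypothetical.

```python
import math


def plan_stage(current, targets, max_step_size):
    """Split one stage into biased steps (all distances in Å).

    current / targets: per-pair current and target distances.
    Returns (N, per-pair step sizes, list of per-step temporary targets).
    """
    deltas = [t - c for c, t in zip(current, targets)]
    largest = max(abs(d) for d in deltas)
    n_steps = math.ceil(largest / max_step_size)  # N = ceil(max|delta_k| / h)
    if n_steps == 0:
        return 0, [0.0] * len(deltas), []
    steps = [d / n_steps for d in deltas]  # step_k = delta_k / N
    schedule = [
        [c + s * step for c, step in zip(current, steps)]  # r_k(s)
        for s in range(1, n_steps + 1)
    ]
    return n_steps, steps, schedule


# Hypothetical stage: two pairs, h = 0.20 Å.
n, steps, schedule = plan_stage([3.10, 1.50], [2.20, 1.80], 0.20)
for s, row in enumerate(schedule, start=1):
    print(s, [round(r, 3) for r in row])
```

Because N is set by the pair with the largest displacement, every pair reaches its target on the same final step, and pairs with smaller displacements move by less than h per step.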
Outputs
Each stage writes its final geometry and biased-step trajectory under stage_XX/, with a combined trajectory at the root. Check the per-stage result.pdb (or result.xyz) and the always-generated scan_trj.xyz / scan.pdb first.
out_dir/ (default: ./result_scan/)
├─ scan_trj.xyz # Combined trajectory across all stages (always written)
├─ scan.pdb # Combined PDB companion (PDB inputs only; always written)
├─ preopt/ # Present when --preopt is True
│ ├─ result.xyz
│ └─ result.pdb # Only for PDB inputs
└─ stage_XX/ # One folder per stage (XX = 01..K)
   ├─ result.xyz # Final (possibly endopt) geometry
   ├─ result.pdb # If input was PDB
   ├─ scan_trj.xyz # Per-stage biased step frames (always written)
   └─ scan.pdb # PDB version of scan_trj.xyz (PDB inputs only; always written)
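For post-processing, the layout above can be walked with the standard library. The sketch below assumes only the file and folder names shown in the tree; the helper name `collect_stage_results` is hypothetical.

```python
from pathlib import Path


def collect_stage_results(out_dir):
    """Return one final-geometry file per stage, in stage order.

    Prefers result.pdb (written for PDB inputs) and falls back to result.xyz.
    """
    results = []
    for stage_dir in sorted(Path(out_dir).glob("stage_*")):
        if not stage_dir.is_dir():
            continue
        for name in ("result.pdb", "result.xyz"):
            candidate = stage_dir / name
            if candidate.exists():
                results.append(candidate)
                break
    return results


if __name__ == "__main__":
    # ./result_scan is the documented default output directory.
    for path in collect_stage_results("./result_scan"):
        print(path)
```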
CLI options
The full flag list is in the generated command reference; the table below covers the options that need explanation.
| Option | Description | Default |
|---|---|---|
|  | Input PDB (or XYZ with | Required |
|  | Amber prmtop for the full REAL system. | Required |
|  | PDB defining the ML region (atom IDs). Optional when | None |
|  | Comma-separated ML-region atom indices (ranges allowed). | None |
|  | Interpret |  |
|  | Detect ML/MM layers from input PDB B-factors. |  |
|  | Net ML-region charge. | None (required unless |
|  | Per-resname charge mapping (e.g., | None |
|  | Spin multiplicity (2S+1). |  |
|  | Comma-separated 1-based atom indices to freeze (merged with YAML | None |
|  | Distance cutoff (Å) from ML region for MM atoms to include in Hessian calculation. Can be combined with | None |
|  | Movable-MM distance cutoff (Å); providing this disables | None |
|  | Scan targets: a YAML/JSON spec file path (auto-detected) or inline Python literal(s) with | Required |
|  | Interpret atom indices as 1-based (default) or 0-based. |  |
|  | Print parsed stage tuples after |  |
|  | Maximum change in any scanned bond per step (Å). Controls the number of biased relaxation steps. |  |
|  | Harmonic bias strength |  |
|  | Compatibility option for | None |
|  | Maximum LBFGS cycles per biased step and per pre/end optimization stage. |  |
|  | Compatibility alias of | None |
|  | Run an unbiased optimization before scanning. |  |
|  | Run an unbiased optimization after each stage. |  |
|  | Dump per-step optimizer trajectory files. Note: |  |
|  | Output directory root. |  |
|  | Convergence preset ( | None (inherits |
|  | Base YAML configuration file (applied first). | None |
|  | Reference PDB topology when | None |
|  | MLIP backend for the ML region: |  |
|  | Enable xTB point-charge embedding correction for MM-to-ML environmental effects (experimental). |  |
|  | Cutoff radius (Å) for embed-charge MM atoms. |  |
|  | Enable CMAP (backbone cross-map dihedral correction) in model parm7. Default: disabled (consistent with Gaussian ONIOM). |  |
|  | MM backend (analytical Hessian vs OpenMM finite-difference). |  |
|  | Link-atom placement: scaled ($g$-factor) or fixed 1.09/1.01 Å. |  |
|  | Write machine-readable |  |
|  | Validate options and print the execution plan without running the scan. Shown in |  |
|  | Toggle XYZ/TRJ to PDB companions when a PDB template is available. |  |
Scan target syntax
YAML/JSON spec format (recommended)
-s/--scan-lists auto-detects YAML/JSON files. Pass a file path to use the spec format:
one_based: true # optional; defaults to CLI --one-based/--zero-based
stages:
- [[12, 45, 2.20]]
- [[10, 55, 1.35], [23, 34, 1.80]]
- stages is required.
- Each stage is a list of (i, j, target_A) triples.
- Indices may be integers or PDB selectors (for PDB input), same as inline literals.
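When stage lists are produced by a script, the spec can be generated rather than typed by hand. The sketch below builds the same two-stage spec and serializes it as JSON, which the option is documented to accept. The helper name `build_scan_spec` and the file name `scan.json` are hypothetical, and how the CLI recognizes a spec file (for example by extension) is not stated here, so check the result with --print-parsed.

```python
import json


def build_scan_spec(stages, one_based=True):
    """Build a spec dict with the documented keys: one_based and stages.

    stages: list of stages, each a list of (i, j, target_A) triples.
    """
    return {
        "one_based": one_based,
        "stages": [[list(triple) for triple in stage] for stage in stages],
    }


spec = build_scan_spec(
    [
        [(12, 45, 2.20)],
        [(10, 55, 1.35), (23, 34, 1.80)],
    ]
)
text = json.dumps(spec, indent=2)
print(text)
# To use it, write `text` to a file (for example scan.json) and pass it via -s.
```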
Inline literal format
When -s/--scan-lists receives a value that is not a file path, it is treated as a Python literal string evaluated by the CLI. Shell quoting matters.
Each literal is a Python list of triples (atom1, atom2, target_A):
-s '[(atom1, atom2, target_A),...]'
- Wrap the entire literal in single quotes so the shell does not interpret parentheses or spaces.
- Each triple drives the distance between atom1–atom2 toward target_A.
- One literal = one stage. For multiple stages, pass multiple literals after a single -s/--scan-lists flag (do not repeat the flag).
Atoms can be given as integer indices or PDB selector strings:
| Method | Example | Notes |
|---|---|---|
| Integer index |  | 1-based by default ( |
| PDB selector |  | Residue name, residue number, atom name |
PDB selector tokens can be separated by any of: comma ,, space, slash /, backtick `, or backslash \. Token order is flexible.
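The separator handling can be sketched as a small tokenizer. This is an illustration only, not the CLI's parser: the function name `split_selector` is hypothetical, and the rule used to tell the residue name from the atom name (the first non-numeric token is the residue name) is an assumption made for this sketch. The real tool may resolve the order differently, for example against the PDB topology.

```python
import re

# Comma, whitespace, slash, backtick, or backslash, possibly repeated.
_SEPARATORS = re.compile(r"[,\s/`\\]+")


def split_selector(selector):
    """Split a selector string into (resname, resnum, atomname).

    ASSUMPTION: the integer token is the residue number; of the remaining
    two tokens, the first is the residue name and the second the atom name.
    """
    tokens = [t for t in _SEPARATORS.split(selector.strip()) if t]
    if len(tokens) != 3:
        raise ValueError(f"expected 3 fields, got {tokens!r}")
    numeric = [t for t in tokens if t.isdigit()]
    if len(numeric) != 1:
        raise ValueError(f"expected exactly one residue number in {tokens!r}")
    names = [t for t in tokens if not t.isdigit()]
    return names[0], int(numeric[0]), names[1]


for text in ("TYR,285,CA", "TYR 285 CA", "TYR/285/CA", "285,TYR,CA"):
    print(split_selector(text))
```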
# All of these specify the same atom:
"TYR,285,CA"
"TYR 285 CA"
"TYR/285/CA"
"285,TYR,CA" # order is flexible
Quoting rules:
# Correct: single-quote the list, double-quote selector strings inside
-s '[("TYR,285,CA","MMT,309,C10",1.35)]'
# Correct: integer indices need no inner quotes
-s '[(1, 5, 2.0)]'
# Avoid: double-quoting the outer literal requires escaping inner quotes
-s "[(\"TYR,285,CA\",\"MMT,309,C10\",1.35)]"
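A literal can be checked in Python before it is passed on the command line. The sketch below uses `ast.literal_eval` from the standard library; whether the CLI uses the same function internally is an assumption, and `parse_stage_literal` is a hypothetical helper name.

```python
import ast


def parse_stage_literal(text):
    """Parse one stage literal into a list of tuples.

    Accepts 3-tuples (i, j, target) and 4-tuples (i, j, start, end).
    """
    value = ast.literal_eval(text)  # evaluates literals only, never code
    if not isinstance(value, (list, tuple)):
        raise ValueError("a stage literal must be a list of tuples")
    stage = []
    for entry in value:
        if not isinstance(entry, (list, tuple)) or len(entry) not in (3, 4):
            raise ValueError(f"bad scan entry: {entry!r}")
        stage.append(tuple(entry))
    return stage


print(parse_stage_literal('[("TYR,285,CA","MMT,309,C10",1.35)]'))
print(parse_stage_literal("[(1, 5, 2.0)]"))
```

If the string fails to parse here, the shell quoting is not the problem; if it parses here but fails on the command line, check the outer quotes first.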
Pass multiple literals after a single -s/--scan-lists flag. Each literal becomes one stage:
# Stage 1: drive one bond to 1.35 Å
# Stage 2: drive two bonds simultaneously
-s \
'[("TYR,285,CA","MMT,309,C10",1.35)]' \
'[("TYR,285,CA","MMT,309,C10",2.20),("TYR,285,CB","MMT,309,C11",1.80)]'
Stages run sequentially; each starts from the previous stage’s relaxed result. Do not repeat the -s/--scan-lists flag – supply all stage literals after a single flag.
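When the scan is launched from a script, building the argument list in Python sidesteps shell quoting entirely. The sketch below uses only flags shown on this page and places every stage literal after one -s flag. The function name `scan_command` and the file names are placeholders.

```python
import shlex


def scan_command(input_pdb, parm, model_pdb, charge, stages, out_dir):
    """Build an argv list with all stage literals after a single -s flag."""
    argv = [
        "mlmm", "scan",
        "-i", input_pdb,
        "--parm", parm,
        "--model-pdb", model_pdb,
        "-q", str(charge),
        "-s",
    ]
    # repr() of a list of tuples is itself a valid Python literal.
    argv += [repr(list(stage)) for stage in stages]
    argv += ["-o", out_dir]
    return argv


cmd = scan_command(
    "pocket.pdb", "real.parm7", "ml_region.pdb", 0,
    stages=[
        [("TYR,285,CA", "MMT,309,C10", 1.35)],
        [("TYR,285,CA", "MMT,309,C10", 2.20), ("TYR,285,CB", "MMT,309,C11", 1.80)],
    ],
    out_dir="./result_scan",
)
print(shlex.join(cmd))  # shell-safe rendering for logs
# To run it: subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` means no shell is involved, so the inner quotes need no escaping.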
Concerted versus staged scans
The number of literals you pass decides whether the coordinates are driven together (concerted) or in sequence (staged):
| Form | How to invoke | Meaning | Mechanism needed up front? |
|---|---|---|---|
| Concerted | a single literal containing several | all coordinates driven together within one stage | No |
| Staged | several literals, one per stage | each stage is its own restrained relaxation, written to | Yes: define the mechanism per stage |
# Concerted: one stage, two distances driven together
mlmm scan -i r.pdb --parm enzyme.parm7 -l 'LIG:Q' \
-s '[(1,5,1.40),(7,9,1.60)]' -o result_concerted
# Staged: two sequential stages
mlmm scan -i r.pdb --parm enzyme.parm7 -l 'LIG:Q' \
-s '[(1,5,1.40)]' \
'[(7,9,0.95)]' -o result_staged
A concerted scan needs no mechanism breakdown — path-search performs the multistep auto-segmentation for you. A staged scan needs the mechanism defined up front, but when the mechanism is known, staged scans give cleaner per-step control and are generally preferred. (A four-tuple expands into two stages for a bidirectional scan.)
Bidirectional scan (4-tuple)
Instead of a 3-tuple (i, j, target), you can pass a 4-tuple (i, j, start, end) to scan in both directions from the current geometry. The CLI automatically expands each 4-tuple into two stages:
- Pass 1: Drive i–j from the current distance toward start.
- Pass 2: Restore the initial geometry and drive i–j toward end.
The concatenated trajectory is assembled as start → initial → end, giving a continuous path through the starting structure.
# Bidirectional scan: drive bond 12--45 from current geometry
# toward 1.35 Å (pass 1) and toward 2.50 Å (pass 2)
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
-q 0 -s '[(12, 45, 1.35, 2.50)]'
This is equivalent to two manual stages with a geometry reset between them, but avoids the need to script it yourself. Mixed 3-tuples and 4-tuples are accepted in the same literal.
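The expansion and the start → initial → end ordering can be sketched as follows. Both function names are hypothetical. The sketch handles 4-tuples only, because how 3-tuples in a mixed literal are distributed over the two passes is not specified here. Whether the initial frame is stored in either pass is also an assumption, so the example frames below leave it out.

```python
def expand_bidirectional(stage):
    """Split a stage of 4-tuples (i, j, start, end) into two 3-tuple stages."""
    pass1, pass2 = [], []
    for entry in stage:
        if len(entry) != 4:
            raise ValueError("this sketch handles 4-tuples only")
        i, j, start, end = entry
        pass1.append((i, j, start))  # current geometry -> start
        pass2.append((i, j, end))    # restored geometry -> end
    return pass1, pass2


def assemble_path(pass1_frames, pass2_frames):
    """Order frames as start -> initial -> end.

    Pass 1 is recorded moving away from the initial geometry, so it is
    reversed before pass 2 is appended.
    """
    return list(reversed(pass1_frames)) + list(pass2_frames)


p1, p2 = expand_bidirectional([(12, 45, 1.35, 2.50)])
print(p1, p2)
# Frames stood in for by their i-j distance (Å), illustrative values only.
print(assemble_path([1.90, 1.60, 1.35], [2.20, 2.50]))
```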
Reading the barrier direction
The barrier you read depends on which endpoint the scan started from. If the scan (or the path it seeds) starts at the product, the raw reported barrier is the reverse direction.
| Quantity | Formula |
|---|---|
| Forward barrier | E(TS) − E(reactant) |
| Reverse barrier (the raw product-start number) | E(TS) − E(product) |
This is a read-time interpretation, not a CLI flag. Always confirm which endpoint is the reactant versus the product by reading segments/seg_NN/{reactant,product}.pdb from the IRC, rather than trusting the scan direction. For a product-start campaign, the forward barrier you want is E(TS) − E(reactant), not the number printed against the product start.
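The bookkeeping is simple enough to script once the three energies are known. A minimal sketch follows; the function names and the example energies are hypothetical, and any consistent energy unit works.

```python
def barriers(e_reactant, e_ts, e_product):
    """Return forward/reverse barriers and the reaction energy."""
    return {
        "forward": e_ts - e_reactant,       # E(TS) - E(reactant)
        "reverse": e_ts - e_product,        # E(TS) - E(product)
        "reaction": e_product - e_reactant,
    }


def forward_from_product_start(reverse_barrier, reaction_energy):
    """Recover the forward barrier from a product-start (reverse) number."""
    return reverse_barrier + reaction_energy


# Illustrative energies only.
result = barriers(e_reactant=0.0, e_ts=15.0, e_product=-5.0)
print(result)
print(forward_from_product_start(result["reverse"], result["reaction"]))
```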
YAML configuration
The scan reads the shared geom (coord_type, freeze_atoms), calc / mlmm (ML/MM calculator setup), and opt / lbfgs (optimizer) sections, plus bias (k, harmonic strength in eV/Ų) and a bond section for MLIP-based bond-change detection.
Full schema (every key and default): YAML Reference.
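Since the bias strength is given in eV/Å² while coordinates are handled in Bohr, it can help to see the conversion and the restraint energy side by side. The sketch below is an illustration under stated assumptions, not the tool's implementation: the constants are CODATA 2018 values, the function names are hypothetical, and the value k = 100 eV/Å² is purely illustrative, not the documented default.

```python
import math

EV_PER_HARTREE = 27.211386245988   # CODATA 2018
ANGSTROM_PER_BOHR = 0.529177210903  # CODATA 2018


def bias_k_to_atomic_units(k_ev_per_ang2):
    """Convert a force constant from eV/Å² to Hartree/Bohr²."""
    return k_ev_per_ang2 / EV_PER_HARTREE * ANGSTROM_PER_BOHR ** 2


def bias_energy(coords_bohr, pairs, targets_bohr, k_au):
    """E_bias = sum over pairs of 1/2 * k * (|r_i - r_j| - target)^2.

    coords_bohr: list of (x, y, z) in Bohr; pairs: 0-based (i, j) indices.
    """
    energy = 0.0
    for (i, j), target in zip(pairs, targets_bohr):
        distance = math.dist(coords_bohr[i], coords_bohr[j])
        energy += 0.5 * k_au * (distance - target) ** 2
    return energy


k_au = bias_k_to_atomic_units(100.0)  # illustrative value
print(f"{k_au:.6f} Hartree/Bohr^2")
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(bias_energy(coords, [(0, 1)], [2.0], k_au))
```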
See Also
Common Error Recipes: Symptom-first failure routing
Troubleshooting: Detailed troubleshooting guide
scan2d: 2D distance grid scan
scan3d: 3D distance grid scan
opt: Single-structure geometry optimization
all: End-to-end workflow with --scan-lists for single-structure inputs
path-search: MEP search using scan endpoints as intermediates