scan2d

Perform a two-distance (d1, d2) grid scan with harmonic restraints and ML/MM relaxations on a layered enzyme PDB. Use it to map a 2D potential energy surface across two reactive distances (e.g., bond-forming and bond-breaking) to locate saddle points and bifurcation features that a 1D scan would miss. mlmm scan2d constructs linear grids for two bond distances using --max-step-size, relaxes each grid point with the appropriate restraints active, and records unbiased ML/MM energies for visualization. Pass -s/--scan-lists a YAML/JSON spec file (recommended) or an inline Python literal; both forms accept exactly two scan axes. The 3D scan2d_landscape.html includes a bottom contour projection.

Examples

Command form:

mlmm scan2d -i INPUT.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q CHARGE [-m MULT] \
 [-s scan2d.yaml | -s "[(I1,J1,LOW1,HIGH1),(I2,J2,LOW2,HIGH2)]"] \
 [--one-based|--zero-based] [--max-step-size FLOAT] [--bias-k FLOAT] \
 [--freeze-atoms "1,3,5"] [--relax-max-cycles INT] [--thresh PRESET] \
 [--dump/--no-dump] [--out-dir DIR] \
 [--preopt/--no-preopt] [--baseline {min|first}] [--zmin FLOAT] [--zmax FLOAT]

Recommended: YAML/JSON spec.

# Recommended: YAML/JSON spec
cat > scan2d.yaml << 'YAML'
one_based: true
pairs:
 - [12, 45, 1.30, 3.10]
 - [10, 55, 1.20, 3.20]
YAML
mlmm scan2d -i input.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s scan2d.yaml --print-parsed

Alternative: inline Python literal.

# Alternative: inline Python literal
mlmm scan2d -i input.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s "[(12,45,1.30,3.10),(10,55,1.20,3.20)]"

LBFGS scan with TRJ dumps and fixed color scale for the contour plot.

# LBFGS scan with TRJ dumps and fixed color scale for the contour plot
mlmm scan2d -i input.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s "[(12,45,1.30,3.10),(10,55,1.20,3.20)]" \
 --max-step-size 0.20 --dump -o ./result_scan2d/ --preopt --baseline min \
 --zmin 0.0 --zmax 40.0

Add --print-parsed to validate the parsed scan spec and exit without running the GPU calculation.

Workflow

  1. Input & preoptimization – Load the enzyme PDB, resolve charge/spin, build the ML/MM calculator (MLIP backend + hessian_ff; backend selected via -b/--backend, default uma), and optionally run an unbiased pre-optimization when --preopt. When --embedcharge is enabled, xTB point-charge embedding is applied for MM-to-ML environmental corrections.

  2. Grid construction – Parse targets from -s/--scan-lists (YAML/JSON spec file or inline literal) into two quadruples, normalize indices (1-based by default or PDB atom selectors like "TYR,285,CA"). Build linear grids with ceil(|high - low| / h) + 1 points where h = --max-step-size.

  3. Outer loop (d1) – For each d1 value, relax the system with only the d1 restraint active.

  4. Inner loop (d2) – For each d2 value at the current d1, relax with both restraints active starting from the nearest previously converged structure.

  5. Energy evaluation – At each (i, j) pair, evaluate the ML/MM energy without bias and record to surface.csv.

  6. Visualization – Write scan2d_map.png (2D contour) and scan2d_landscape.html (3D surface). Use --zmin/--zmax to clamp the color scale. Baselines: --baseline min zeroes the minimum energy; --baseline first zeroes the (i=0, j=0) grid point.

Outputs

Check surface.csv (the PES grid), scan2d_map.png (2D contour), and scan2d_landscape.html (3D landscape) first; per-point geometries land under grid/ (the i### / j### filename tags are integer hundredths of an ångström, not step indices).

out_dir/ (default: ./result_scan2d/)
├── surface.csv # PES grid: i, j, d1_A, d2_A, energy_hartree, bias_converged, is_preopt, energy_kcal, d1_label, d2_label
├── scan2d_map.png # 2D contour map
├── scan2d_landscape.html # 3D surface visualization (Plotly)
├── grid/
│ ├── point_i###_j###.xyz # Relaxed geometry for every (i, j) pair
│ ├── point_i###_j###.pdb # PDB companion (when input is PDB)
│ ├── preopt_i###_j###.xyz # reference / pre-optimized structure (written whenever reference distances are finite)
│ └── inner_path_d1_###_trj.xyz # Inner d2 trajectory per d1 slice (when --dump)
└── (stdout) # Progress and energy summaries

CLI options

The full flag list is in the generated command reference; do not hand-duplicate it. The table below covers the options that need explanation.

Option

Description

Default

-i, --input PATH

Input enzyme complex PDB (required).

Required

--parm PATH

Amber parm7 topology for the enzyme (required).

Required

--model-pdb PATH

PDB defining the ML region. Optional when --detect-layer is enabled.

None

--model-indices TEXT

Comma-separated atom indices for the ML region (ranges allowed).

None

--model-indices-one-based / --model-indices-zero-based

Interpret --model-indices as 1-based or 0-based.

True (1-based)

--detect-layer / --no-detect-layer

Detect ML/MM layers from input PDB B-factors.

True

-q, --charge INT

ML-region net charge.

None (required unless -l is given)

-l, --ligand-charge TEXT

Per-resname charge mapping (e.g., GPP:-3,SAM:1). Derives total charge when -q is omitted.

None

-m, --multiplicity INT

Spin multiplicity (2S+1).

1

--freeze-atoms TEXT

Comma-separated 1-based indices to freeze.

None

--hess-cutoff FLOAT

Distance cutoff (Å) from ML region for MM atoms to include in Hessian calculation. Can be combined with --detect-layer.

None

--movable-cutoff FLOAT

Distance cutoff (Å) from ML region for movable MM atoms. Providing this disables --detect-layer.

None

-s, --scan-lists TEXT

Scan targets: a YAML/JSON spec file path (auto-detected, with pairs containing 2 quadruples) or an inline Python literal "[(i1,j1,low1,high1),(i2,j2,low2,high2)]". Indices can be integers or PDB atom selectors.

Required

--one-based / --zero-based

Interpret (i,j) indices in -s/--scan-lists as 1-based or 0-based.

True (1-based)

--print-parsed/--no-print-parsed

Print parsed pair tuples after -s/--scan-lists resolution.

False

--max-step-size FLOAT

Maximum distance increment per step (Å). Determines grid density.

0.20

--bias-k FLOAT

Harmonic well strength k (eV/Ų).

300.0

--relax-max-cycles INT

Maximum LBFGS cycles per biased relaxation.

10000

--dump/--no-dump

Write inner d2 scan TRJs per d1 slice.

False

-o, --out-dir TEXT

Base output directory.

./result_scan2d/

--thresh TEXT

Convergence preset (gau_loose|gau|gau_tight|gau_vtight|baker|never).

baker

--config FILE

Base YAML configuration file (applied first).

None

--ref-pdb FILE

Reference PDB topology when --input is XYZ.

None

--preopt/--no-preopt

Run an unbiased pre-optimization before scanning.

False

--baseline {min,first}

Reference for relative energy (kcal/mol).

min

--zmin FLOAT

Lower bound of the contour color scale (kcal/mol).

Autoscaled

--zmax FLOAT

Upper bound of the contour color scale (kcal/mol).

Autoscaled

-b, --backend CHOICE

MLIP backend for the ML region: uma, orb, mace, aimnet2.

uma

--embedcharge/--no-embedcharge

Enable xTB point-charge embedding correction for MM-to-ML environmental effects (experimental).

False

--embedcharge-cutoff FLOAT

Cutoff radius (Å) for embed-charge MM atoms.

12.0

--cmap/--no-cmap

Enable CMAP (backbone cross-map dihedral correction) in model parm7. Default: disabled (consistent with Gaussian ONIOM).

--no-cmap

--mm-backend [hessian_ff|openmm]

MM backend (analytical Hessian vs OpenMM finite-difference).

hessian_ff

--link-atom-method [scaled|fixed]

Link-atom placement: scaled ($g$-factor) or fixed 1.09/1.01 Å.

scaled

--out-json/--no-out-json

Write machine-readable result.json to out_dir.

False

--convert-files/--no-convert-files

Toggle XYZ/TRJ to PDB companions when a PDB template is available.

True

Scan spec formats

Inline literal format

When -s/--scan-lists receives a value that is not a file path, it is treated as a single Python literal string. Shell quoting matters.

The literal is a Python list of exactly two quadruples (atom1, atom2, low_A, high_A):

-s '[(atom1, atom2, low_A, high_A), (atom3, atom4, low_A, high_A)]'
  • Wrap the entire literal in single quotes so the shell does not interpret parentheses or spaces.

  • Each quadruple defines one scan axis: the distance between atom1atom2 is scanned from low_A to high_A.

  • Unlike scan, only one literal is accepted (no multi-stage support).

Atoms can be given as integer indices or PDB selector strings:

Method

Example

Notes

Integer index

(1, 5, 1.30, 3.10)

1-based by default (--one-based)

PDB selector

("TYR,285,CA", "MMT,309,C10", 1.30, 3.10)

Residue name, residue number, atom name

PDB selector tokens can be separated by any of: comma ,, space, slash /, backtick `, or backslash \. Token order is flexible.

# All of these specify the same atom:
"TYR,285,CA"
"TYR 285 CA"
"TYR/285/CA"
"285,TYR,CA" # order is flexible

Quoting rules:

# Correct: single-quote the list, double-quote selector strings inside
-s '[("TYR,285,CA","MMT,309,C10",1.30,3.10),("TYR,285,CB","MMT,309,C11",1.20,3.20)]'

# Correct: integer indices need no inner quotes
-s '[(1, 5, 1.30, 3.10), (2, 8, 1.20, 3.20)]'

# Avoid: double-quoting the outer literal requires escaping inner quotes
-s "[(\"TYR,285,CA\",\"MMT,309,C10\",1.30,3.10),...]"

YAML configuration

geom:
 coord_type: cart
 freeze_atoms: []
calc:
 charge: 0
 spin: 1
mlmm:
 real_parm7: real.parm7
 model_pdb: ml_region.pdb
opt:
 thresh: baker
 max_cycles: 10000
 dump: false
 out_dir: ./result_scan2d/
lbfgs:
 max_step: 0.3
 out_dir: ./result_scan2d/
bias:
 k: 300.0

Full schema (every key and default): YAML Reference.

See Also