mlmm-toolkit Documentation

Version: v0.2.4

mlmm-toolkit is a Python CLI toolkit for automated enzymatic reaction-path modeling using ML/MM (machine-learning / molecular mechanics) methods.

Quick Start by Goal

Objectives	Command	Guide
First run (end-to-end)	`mlmm all`	Quickstart: all
Single-structure staged scan (`-s`)	`mlmm scan`	Quickstart: scan with spec
TS validation (`tsopt` -> `freq`)	`mlmm tsopt`, `mlmm freq`	Quickstart: tsopt -> freq
Run complete reaction path search from PDB	`mlmm all`	all.md
View current configuration	`mlmm opt --show-config`	YAML Reference
Extract QM region from protein-ligand complex	`mlmm extract`	extract.md
Build MM topology (parm7/rst7)	`mlmm mm-parm`	mm_parm.md
Define ML/MM layers	`mlmm define-layer`	define_layer.md
Optimize a single structure	`mlmm opt`	opt.md
Find and optimize a transition state	`mlmm tsopt`	tsopt.md
Search for minimum energy path	`mlmm path-search`	path_search.md
Run IRC from a transition state	`mlmm irc`	irc.md
Visualize energy profile	`mlmm trj2fig`	trj2fig.md
Export to Gaussian ONIOM / ORCA QM/MM	`mlmm oniom-export --mode g16` or `--mode orca`	oniom_export.md
Rebuild XYZ/layered PDB from ONIOM input	`mlmm oniom-import`	oniom_import.md
Draw state energy diagram from numeric values	`mlmm energy-diagram`	energy_diagram.md
Follow worked tutorials	–	Tutorial
Diagnose failures by symptom	–	Common Error Recipes
Understand the big picture (concepts & terms)	–	Concepts & Workflow
Resolve common errors	–	Troubleshooting
Look up abbreviations and terms	–	Glossary

Documentation Guide

Topic	Page
Installation & first run	Getting Started
Key terms & workflow overview	Concepts & Workflow
Symptom-first failure routing	Common Error Recipes
Common errors & fixes	Troubleshooting
CLI conventions & input requirements	CLI Conventions

CLI Subcommands

Main Workflow

Subcommand	Description
`all`	End-to-end workflow: extraction -> MM parm -> MEP -> TS optimization -> IRC -> freq -> DFT
`init`	(Removed) Previously generated a starter YAML template

Structure Preparation

Subcommand	Description
`extract`	Extract active-site pocket (cluster model) from protein-ligand complex
`add-elem-info`	Repair PDB element columns (77-78)
`mm-parm`	Build AMBER topology (parm7/rst7) with tleap + GAFF2
`define-layer`	Define 3-layer ML/MM regions via B-factor annotation
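The element-column repair performed by `add-elem-info` can be approximated as follows. This is a minimal sketch, not the toolkit's implementation: `guess_element`, `fill_element_columns`, and the `TWO_LETTER` set are hypothetical names, and the heuristic assumes the standard PDB alignment rule (one-letter element symbols right-justified so they start in column 14 of the atom name, two-letter symbols starting in column 13). Files that ignore this convention will be misread.

```python
# Hypothetical helper illustrating PDB element inference; heuristic only,
# not mlmm's add-elem-info logic.
TWO_LETTER = {"FE", "ZN", "MG", "MN", "CL", "BR", "NA", "CA", "CU", "CO", "NI", "SE"}


def guess_element(line: str) -> str:
    """Guess the element from the atom-name field (columns 13-16) of an ATOM/HETATM record."""
    name = line[12:16]
    if name[0] in " 0123456789":
        # One-letter symbol right-justified into column 14 (e.g. " CA " = alpha carbon).
        return name[1].upper()
    pair = name[:2].upper()
    # Two-letter element starting in column 13 (e.g. "CA  " = calcium, "ZN  " = zinc);
    # otherwise assume a 4-character name such as "HD21" (hydrogen).
    return pair if pair in TWO_LETTER else name[0].upper()


def fill_element_columns(line: str) -> str:
    """Return the record with the element symbol right-justified in columns 77-78."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave TER, END, REMARK, ... untouched
    padded = line.rstrip("\n").ljust(80)
    elem = guess_element(padded)
    return padded[:76] + elem.rjust(2) + padded[78:]
```

The lookup set is deliberately small; a real repair tool would also consult residue names (for example to distinguish ions from protein atoms) rather than rely on column alignment alone.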

Geometry Optimization

Subcommand	Description
`opt`	Single-structure geometry optimization (L-BFGS / RFO)
`tsopt`	Transition state optimization (Dimer / RS-I-RFO)

Path Search & Optimization

Subcommand	Description
`path-opt`	MEP optimization via GSM or DMF (two structures)
`path-search`	Recursive MEP search with automatic refinement (2+ structures)

Scans

Subcommand	Description
`scan`	1D bond-length driven scan with restraints
`scan2d`	2D distance grid scan
`scan3d`	3D distance grid scan
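To make the difference between the three scan types concrete, the snippet below enumerates the target-distance combinations a 1D, 2D, or 3D distance scan would visit. It is a sketch of the grid concept only: `linspace` and `distance_grid` are hypothetical helpers, the start/end distances and point counts are made-up values, and how `scan2d`/`scan3d` actually space points, order them, or apply restraints is not modeled here.

```python
from itertools import product


def linspace(start: float, stop: float, n: int) -> list[float]:
    """Return n evenly spaced values from start to stop, inclusive."""
    if n < 2:
        raise ValueError("need at least two points per axis")
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def distance_grid(*axes: tuple[float, float, int]) -> list[tuple[float, ...]]:
    """Cartesian product of per-coordinate target distances (angstrom).

    One axis gives a 1D scan, two axes a 2D grid, three axes a 3D grid.
    """
    return list(product(*(linspace(a, b, n) for a, b, n in axes)))


# Hypothetical 2D scan: bond 1 shortened 3.0 -> 1.5 A, bond 2 stretched 1.5 -> 3.0 A.
grid_2d = distance_grid((3.0, 1.5, 4), (1.5, 3.0, 4))
print(len(grid_2d))  # 16 grid points
print(grid_2d[0])    # (3.0, 1.5)
```

The point count grows as the product of the per-axis counts, which is why 2D and 3D grids become expensive much faster than a 1D scan.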

Analysis & Post-processing

Subcommand	Description
`irc`	Intrinsic Reaction Coordinate calculation
`freq`	Vibrational frequency analysis & thermochemistry
`dft`	Single-point DFT calculations (GPU4PySCF / PySCF)
`trj2fig`	Plot energy profiles from XYZ trajectories
`energy-diagram`	Build an energy diagram from numeric input values
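`trj2fig` and `energy-diagram` both present energies relative to a reference state. The sketch below shows that arithmetic; it is not the toolkit's code, and it assumes (hypothetically) that each XYZ frame's comment line begins with the energy in hartree. `read_xyz_energies` and `relative_kcal` are illustrative names. The conversion factor 1 hartree = 627.509 kcal/mol is the standard value.

```python
HARTREE_TO_KCAL = 627.5094740631  # kcal/mol per hartree


def read_xyz_energies(text: str) -> list[float]:
    """Collect one energy per frame from a multi-frame XYZ string.

    Assumes each frame's comment line starts with the energy in hartree;
    real trajectory files may store energies differently.
    """
    lines = text.splitlines()
    energies, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i])
        energies.append(float(lines[i + 1].split()[0]))
        i += natoms + 2  # skip count line, comment line, and atom lines
    return energies


def relative_kcal(energies: list[float], ref: int = 0) -> list[float]:
    """Energies relative to frame `ref`, converted to kcal/mol."""
    e0 = energies[ref]
    return [(e - e0) * HARTREE_TO_KCAL for e in energies]


traj = "1\n-100.000\nH 0 0 0\n1\n-99.980\nH 0 0 0.1\n1\n-100.010\nH 0 0 0.2\n"
print([round(x, 2) for x in relative_kcal(read_xyz_energies(traj))])
```

Choosing the reactant frame as `ref=0` gives the usual barrier-height reading, where the highest value is the apparent activation energy along the path.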

Export

Subcommand	Description
`oniom-export`	Export to Gaussian ONIOM / ORCA QM/MM (`--mode g16` or `--mode orca`)
`oniom-import`	Import Gaussian/ORCA ONIOM input and reconstruct XYZ + layered PDB

Configuration & Reference

Topic	Page
CLI command reference	Command Reference
YAML schema	YAML Schema
YAML configuration options	YAML Reference
ML/MM calculator architecture	ML/MM Calculator
Terminology	Glossary

System Requirements

Hardware

OS: Linux (tested on Ubuntu 20.04+ and CentOS 8+)
GPU: NVIDIA GPU supported by CUDA 12.x
VRAM: Minimum 8 GB (16 GB+ recommended for 1000+ atoms)
RAM: 16 GB+ recommended

Software

Python >= 3.11
PyTorch with CUDA support
CUDA 12.x toolkit
AmberTools (for mm-parm)
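Before a first run, a short script can confirm the software requirements above. This is a hypothetical pre-flight helper, not part of mlmm-toolkit: `check_environment` only checks the Python version, whether the AmberTools `tleap` executable is on `PATH`, and whether PyTorch is installed and reports a CUDA device. It does not verify the CUDA toolkit version or the available VRAM.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Rough pre-flight check against the listed requirements (not an official tool)."""
    status = {
        "python>=3.11": sys.version_info >= (3, 11),
        # AmberTools provides the `tleap` executable used by mm-parm.
        "tleap_on_path": shutil.which("tleap") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if status["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        status["cuda_available"] = bool(torch.cuda.is_available())
    return status


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK' if ok else 'MISSING':<8}{item}")
```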

Quick Examples

Basic ML/MM MEP search

```bash
mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3'
```

Full workflow with TS optimization

```bash
mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
  --tsopt --thermo --dft
```

Single-structure scan mode

```bash
mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
  -q 0 -s scan.yaml --print-parsed
```

TS-only optimization

```bash
mlmm -i TS_candidate.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
  --tsopt
```

Key Concepts

ML/MM 3-Layer System

mlmm uses a 3-layer partitioning scheme encoded via PDB B-factors:

ML region (B=0.0): Treated with the selected MLIP backend (default: UMA)
Movable-MM (B=10.0): MM atoms that move during optimization
Frozen (B=20.0): Fixed MM atoms

Hessian-target MM atoms are selected by calculator options (hess_cutoff / explicit lists), not by a dedicated B-factor layer.
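The layer assignment can be read back from a PDB file with a few lines of Python. This is an illustrative sketch, not the toolkit's parser: `classify_layers` and `LAYER_BY_BFACTOR` are hypothetical names, the column slices follow the standard fixed-width PDB layout (serial in columns 7-11, B-factor in columns 61-66), and any B-factor other than 0, 10, or 20 is treated as an error.

```python
# B-factor codes for the three layers, as listed in this section.
LAYER_BY_BFACTOR = {0.0: "ML", 10.0: "movable-MM", 20.0: "frozen"}


def classify_layers(pdb_text: str, tol: float = 1e-3) -> dict[str, list[int]]:
    """Group atom serial numbers by layer using the B-factor field (columns 61-66)."""
    layers: dict[str, list[int]] = {name: [] for name in LAYER_BY_BFACTOR.values()}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        serial = int(line[6:11])
        bfac = float(line[60:66])
        for code, name in LAYER_BY_BFACTOR.items():
            if abs(bfac - code) < tol:
                layers[name].append(serial)
                break
        else:
            raise ValueError(f"atom {serial}: unexpected B-factor {bfac}")
    return layers
```

Counting the entries per layer is a quick sanity check after `define-layer`, for example to confirm the ML region has the expected number of atoms. Hessian-target atoms do not appear here, since they are not encoded in the B-factor.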

Charge and spin

Use `-l/--ligand-charge` to specify charges for residues whose charge cannot be determined automatically, such as ligands and cofactors: 'SAM:1,GPP:-3'
Use `-q/--charge` to set the total charge of the ML region
Set the spin multiplicity with `-m/--multiplicity` (default: 1)
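Two ideas in this section can be sketched in code: splitting a `--ligand-charge` style string into per-residue charges, and checking that a multiplicity (2S + 1) is compatible with the number of electrons in the ML region. Both functions are hypothetical illustrations of the input format and the electron-parity rule; they are not mlmm's own validation code, and mlmm may accept input forms this parser rejects.

```python
def parse_ligand_charge(spec: str) -> dict[str, int]:
    """Parse a 'RES:charge,RES:charge' string such as 'SAM:1,GPP:-3'."""
    charges: dict[str, int] = {}
    for item in spec.split(","):
        resname, sep, value = item.strip().partition(":")
        if not sep or not resname:
            raise ValueError(f"expected RES:charge, got {item!r}")
        charges[resname] = int(value)
    return charges


def multiplicity_is_consistent(n_electrons: int, multiplicity: int) -> bool:
    """Check that multiplicity = 2S + 1 is compatible with the electron count.

    With (multiplicity - 1) unpaired electrons, the remaining electrons must pair up,
    so an even electron count needs an odd multiplicity and vice versa.
    """
    unpaired = multiplicity - 1
    return multiplicity >= 1 and unpaired <= n_electrons and (n_electrons - unpaired) % 2 == 0


print(parse_ligand_charge("SAM:1,GPP:-3"))  # {'SAM': 1, 'GPP': -3}
print(multiplicity_is_consistent(10, 1), multiplicity_is_consistent(10, 2))  # True False
```

The electron count here would be the sum of atomic numbers in the ML region minus its total charge; a parity mismatch usually means the charge or the multiplicity is off by one.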

Boolean options

Boolean CLI options use a toggle form (`--flag` / `--no-flag`), for example:

`--tsopt --thermo --no-dft`
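The same toggle convention is available in Python's standard library as `argparse.BooleanOptionalAction` (Python 3.9+). The snippet below reproduces the example flags purely to show how each `--flag` / `--no-flag` form maps to `True` / `False`; it is not mlmm's argument parser, and the `False` defaults are arbitrary choices for the demo, not mlmm's defaults.

```python
import argparse

# Illustration of the --flag / --no-flag convention with the standard library.
# mlmm's own parser, and its default for each flag, may differ.
parser = argparse.ArgumentParser(prog="toggle-demo")
for flag in ("tsopt", "thermo", "dft"):
    parser.add_argument(f"--{flag}", action=argparse.BooleanOptionalAction, default=False)

args = parser.parse_args(["--tsopt", "--thermo", "--no-dft"])
print(args.tsopt, args.thermo, args.dft)  # True True False
```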

YAML configuration

See the YAML Reference for all options.

Output Structure

Typical `mlmm all` output:

```text
result_all/
├── ml_region.pdb            # ML-region definition
├── summary.log              # Human-readable summary
├── summary.yaml             # Machine-readable summary
├── pockets/                 # Extracted cluster models
├── mm_parm/                 # AMBER topology files
├── scan/                    # (Optional) scan results
└── path_search/             # MEP trajectories and diagrams
    ├── mep_trj.xyz          # MEP trajectory
    ├── mep.pdb              # MEP in PDB format
    ├── seg_*/               # Per-segment details
    └── post_seg_*/          # Post-processing outputs
        ├── tsopt/           # TS optimization results
        ├── irc/             # IRC trajectories
        ├── freq/            # Vibrational modes
        └── dft/             # DFT results
```
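To see at a glance which post-processing stages produced output for each segment, the directory tree can be scanned with `pathlib`. This is a hypothetical helper that assumes the directory names shown in the tree; it only checks that the stage directories exist, not that the calculations inside them finished successfully.

```python
from pathlib import Path

# Stage subdirectories expected under each post_seg_* directory.
STAGES = ("tsopt", "irc", "freq", "dft")


def post_seg_status(root: Path = Path("result_all")) -> dict[str, list[str]]:
    """Map each path_search/post_seg_* directory to the stage subdirectories it contains."""
    base = Path(root) / "path_search"
    return {
        seg.name: [stage for stage in STAGES if (seg / stage).is_dir()]
        for seg in sorted(base.glob("post_seg_*"))
        if seg.is_dir()
    }


if __name__ == "__main__":
    for seg, stages in post_seg_status().items():
        print(seg, ", ".join(stages) or "(no stage output)")
```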

Citation

If you use this software in your research, please cite:

[1] Ohmura, T., Inoue, S., Terada, T. (2025). ML/MM toolkit – Towards Accelerated Mechanistic Investigation of Enzymatic Reactions. ChemRxiv. https://doi.org/10.26434/chemrxiv-2025-jft1k

License

mlmm-toolkit is distributed under the GNU General Public License version 3 (GPL-3.0).

Getting Help

```bash
# General help
mlmm --help

# Command help
mlmm <subcommand> --help
```

Note: This documentation is under active development. Some sections may be incomplete or subject to change.