mlmm-toolkit Documentation

Version: v0.2.4

mlmm-toolkit is a Python CLI toolkit for automated enzymatic reaction-path modeling using ML/MM (machine-learning / molecular mechanics) methods.


Quick Start by Goal

  • First run (end-to-end): mlmm all (see Quickstart: all)

  • Single-structure staged scan (-s): mlmm scan (see Quickstart: scan with spec)

  • TS validation (tsopt -> freq): mlmm tsopt, mlmm freq (see Quickstart: tsopt -> freq)

  • Run a complete reaction-path search from PDB: mlmm all (see all.md)

  • View the current configuration: mlmm opt --show-config (see YAML Reference)

  • Extract the active-site pocket (cluster model) from a protein-ligand complex: mlmm extract (see extract.md)

  • Build the MM topology (parm7/rst7): mlmm mm-parm (see mm_parm.md)

  • Define ML/MM layers: mlmm define-layer (see define_layer.md)

  • Optimize a single structure: mlmm opt (see opt.md)

  • Find and optimize a transition state: mlmm tsopt (see tsopt.md)

  • Search for a minimum energy path: mlmm path-search (see path_search.md)

  • Run IRC from a transition state: mlmm irc (see irc.md)

  • Visualize an energy profile: mlmm trj2fig (see trj2fig.md)

  • Export to Gaussian ONIOM / ORCA QM/MM: mlmm oniom-export --mode g16|orca (see oniom_export.md)

  • Rebuild XYZ/layered PDB from ONIOM input: mlmm oniom-import (see oniom_import.md)

  • Draw a state energy diagram from numeric values: mlmm energy-diagram (see energy_diagram.md)

  • Follow worked tutorials: see Tutorial

  • Diagnose failures by symptom: see Common Error Recipes

  • Understand the big picture (concepts & terms): see Concepts & Workflow

  • Resolve common errors: see Troubleshooting

  • Look up abbreviations and terms: see Glossary


Documentation Guide

  • Installation & first run: Getting Started

  • Key terms & workflow overview: Concepts & Workflow

  • Symptom-first failure routing: Common Error Recipes

  • Common errors & fixes: Troubleshooting

  • CLI conventions & input requirements: CLI Conventions


CLI Subcommands

Main Workflow

  • all: End-to-end workflow (extraction -> MM parm -> MEP -> TS optimization -> IRC -> freq -> DFT)

  • init: (Removed) Previously generated a starter YAML template

Structure Preparation

  • extract: Extract the active-site pocket (cluster model) from a protein-ligand complex

  • add-elem-info: Repair PDB element columns (77-78)

  • mm-parm: Build AMBER topology (parm7/rst7) with tleap + GAFF2

  • define-layer: Define 3-layer ML/MM regions via B-factor annotation
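The element symbol lives right-justified in columns 77-78 of PDB ATOM/HETATM records, and the atom name occupies columns 13-16. How add-elem-info infers missing elements is not described here, so the following is only a minimal heuristic sketch of the idea. The function names and the TWO_LETTER_ELEMENTS set are hypothetical and illustrative, not the toolkit's implementation.

```python
# Hypothetical, deliberately incomplete set of two-letter elements this sketch recognizes
TWO_LETTER_ELEMENTS = {"FE", "ZN", "MG", "MN", "CA", "CU", "CO", "NI", "NA", "CL", "BR"}


def guess_element(name_field: str) -> str:
    """Guess an element symbol from the 4-character atom-name field (columns 13-16)."""
    letters = "".join(ch for ch in name_field if ch.isalpha()).upper()
    if not letters:
        raise ValueError(f"cannot guess element from {name_field!r}")
    # PDB convention: two-letter elements start in column 13, one-letter ones in
    # column 14, so " CA " is an alpha carbon while "CA  " is calcium.
    two_letter = (
        name_field[:1].isalpha()
        and letters[:2] in TWO_LETTER_ELEMENTS
        and not any(ch.isalpha() for ch in name_field[2:])  # e.g. "CAB1" stays carbon
    )
    return letters[:2] if two_letter else letters[0]


def fill_element_columns(line: str) -> str:
    """Write a guessed element into columns 77-78 (returned without a trailing newline)."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave REMARK, TER, END, ... untouched
    padded = line.rstrip("\n").ljust(80)
    element = guess_element(padded[12:16])
    return padded[:76] + element.rjust(2) + padded[78:]
```

Because the heuristic depends on correct column alignment of atom names, it can misassign elements in PDB files that left-justify every atom name.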

Geometry Optimization

  • opt: Single-structure geometry optimization (L-BFGS / RFO)

  • tsopt: Transition state optimization (Dimer / RS-I-RFO)

Path Search & Optimization

  • path-opt: MEP optimization via GSM or DMF (two structures)

  • path-search: Recursive MEP search with automatic refinement (2+ structures)

Scans

  • scan: 1D bond-length-driven scan with restraints

  • scan2d: 2D distance grid scan

  • scan3d: 3D distance grid scan
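A 2D or 3D distance grid scan visits every combination of target distances along each driven coordinate. The sketch below only shows how such a grid of targets can be enumerated. It assumes evenly spaced targets given as (start, end, step) in Å. The helper names `targets` and `distance_grid` are hypothetical and are not mlmm's API or its scan-spec format.

```python
import math
from itertools import product


def targets(start: float, end: float, step: float) -> list[float]:
    """Evenly spaced target distances from start toward end (end included when reachable)."""
    if step <= 0:
        raise ValueError("step must be positive")
    n = math.floor(abs(end - start) / step + 1e-9)  # small epsilon absorbs float error
    direction = 1.0 if end >= start else -1.0
    return [round(start + direction * i * step, 6) for i in range(n + 1)]


def distance_grid(*axes: tuple[float, float, float]) -> list[tuple[float, ...]]:
    """Cartesian product of per-axis targets: one axis gives a 1D scan, two a 2D grid, three a 3D grid."""
    return list(product(*(targets(*axis) for axis in axes)))


# Hypothetical example: stretch one bond 1.5 -> 2.5 Å while shortening another 3.0 -> 2.0 Å
grid = distance_grid((1.5, 2.5, 0.5), (3.0, 2.0, 0.5))
print(len(grid), grid[0], grid[-1])  # 9 (1.5, 3.0) (2.5, 2.0)
```

The number of grid points is the product of the per-axis counts, which is why a 3D scan grows much faster in cost than a 1D scan at the same spacing.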

Analysis & Post-processing

  • irc: Intrinsic reaction coordinate (IRC) calculation

  • freq: Vibrational frequency analysis & thermochemistry

  • dft: Single-point DFT calculations (GPU4PySCF / PySCF)

  • trj2fig: Plot energy profiles from XYZ trajectories

  • energy-diagram: Build an energy diagram from numeric input values
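Energy profiles and diagrams are usually plotted relative to a reference state such as the reactant. If you start from absolute electronic energies in hartree, the conversion is a subtraction followed by a unit change (1 Eh ≈ 627.5095 kcal/mol). This is a minimal sketch of that arithmetic only. It does not assume anything about the input format that trj2fig or energy-diagram actually reads, and the function name is hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # 1 hartree in kcal/mol (CODATA, rounded)


def relative_energies_kcal(energies_hartree: list[float], reference_index: int = 0) -> list[float]:
    """Return energies in kcal/mol relative to the state at reference_index."""
    if not energies_hartree:
        raise ValueError("need at least one energy")
    ref = energies_hartree[reference_index]
    return [(e - ref) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


# Hypothetical reactant / TS / product energies in hartree
print([round(x, 2) for x in relative_energies_kcal([-100.0, -99.99, -100.02])])  # [0.0, 6.28, -12.55]
```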

Export

  • oniom-export: Export to Gaussian ONIOM / ORCA QM/MM (--mode g16|orca)

  • oniom-import: Import Gaussian/ORCA ONIOM input and reconstruct XYZ + layered PDB


Configuration & Reference

  • CLI command reference: Command Reference

  • YAML schema: YAML Schema

  • YAML configuration options: YAML Reference

  • ML/MM calculator architecture: ML/MM Calculator

  • Terminology: Glossary


System Requirements

Hardware

  • OS: Linux (Ubuntu 20.04+ or CentOS 8+ tested)

  • GPU: CUDA 12.x compatible

  • VRAM: Minimum 8 GB (16 GB+ recommended for 1000+ atoms)

  • RAM: 16 GB+ recommended

Software

  • Python >= 3.11

  • PyTorch with CUDA support

  • CUDA 12.x toolkit

  • AmberTools (for mm-parm)
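A quick way to check the software side of these requirements before a long run is sketched below. The sketch makes two assumptions. It treats `tleap` on PATH as evidence that AmberTools is installed, since mm-parm builds topologies with tleap. It queries CUDA only through PyTorch when PyTorch is importable. The function and report keys are hypothetical, and the sketch is not an mlmm command.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Report which of the listed software requirements appear to be satisfied."""
    report = {
        "python>=3.11": sys.version_info >= (3, 11),
        "tleap on PATH (AmberTools)": shutil.which("tleap") is not None,
        "torch installed": importlib.util.find_spec("torch") is not None,
        "CUDA available": False,
    }
    if report["torch installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["CUDA available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK ' if ok else 'MISSING'}  {item}")
```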


Quick Examples

Full workflow with TS optimization

mlmm -i R.pdb P.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
 --tsopt --thermo --dft

Single-structure scan mode

mlmm scan -i pocket.pdb --parm real.parm7 --model-pdb ml_region.pdb \
 -q 0 -s scan.yaml --print-parsed

TS-only optimization

mlmm -i TS_candidate.pdb -c 'SAM,GPP' -l 'SAM:1,GPP:-3' \
 --tsopt

Key Concepts

ML/MM 3-Layer System

mlmm uses a 3-layer partitioning scheme encoded via PDB B-factors:

  • ML region (B=0.0): Treated with the selected MLIP backend (default: UMA)

  • Movable-MM (B=10.0): MM atoms that move during optimization

  • Frozen (B=20.0): Fixed MM atoms

Hessian-target MM atoms are selected by calculator options (hess_cutoff / explicit lists), not by a dedicated B-factor layer.
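In a PDB file, the B-factor (temperature factor) occupies columns 61-66 of each ATOM/HETATM record, so the layer assignment above can be read back with plain string slicing. The sketch below maps each atom serial number to a layer label. The labels, the tolerance, and the choice to raise an error on any other B-factor are assumptions of this sketch, and mlmm's own parser may behave differently.

```python
# Assumed mapping from the B-factor convention described above to layer labels
LAYER_BY_BFACTOR = {0.0: "ml", 10.0: "movable_mm", 20.0: "frozen"}


def classify_layers(pdb_lines, tol: float = 1e-3) -> dict[int, str]:
    """Map atom serial number -> layer label using the B-factor field (columns 61-66)."""
    layers: dict[int, str] = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, TER, END, ...
        serial = int(line[6:11])  # columns 7-11
        bfactor = float(line[60:66])  # columns 61-66
        for ref, label in LAYER_BY_BFACTOR.items():
            if abs(bfactor - ref) < tol:
                layers[serial] = label
                break
        else:
            raise ValueError(f"atom {serial}: B-factor {bfactor} matches no layer")
    return layers
```

For example, `collections.Counter(classify_layers(lines).values())` gives the number of atoms in each layer, which is a quick sanity check on an annotated PDB before an optimization.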

Charge and spin

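  • Use -l/--ligand-charge to specify the charges of residues whose charges the toolkit does not know (typically ligands), as comma-separated RESNAME:CHARGE pairs, e.g. 'SAM:1,GPP:-3'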

  • Use -q/--charge to set the ML-region total charge

  • Spin multiplicity is set with -m/--multiplicity (default 1)
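The ligand-charge value is a comma-separated list of RESNAME:CHARGE pairs. The hypothetical parser below illustrates that format and shows how a malformed entry can be rejected early. It is a sketch, not mlmm's own argument handling.

```python
def parse_ligand_charges(spec: str) -> dict[str, int]:
    """Parse 'RES1:q1,RES2:q2,...' into {residue name: integer charge}."""
    charges: dict[str, int] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        resname, sep, charge = item.partition(":")
        if not sep or not resname.strip():
            raise ValueError(f"expected RESNAME:CHARGE, got {item!r}")
        charges[resname.strip()] = int(charge)  # int() accepts signs and surrounding spaces
    return charges


charges = parse_ligand_charges("SAM:1,GPP:-3")
print(charges, sum(charges.values()))  # {'SAM': 1, 'GPP': -3} -2
```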

Boolean options

Boolean CLI options use toggle form (--flag / --no-flag):

--tsopt --thermo --no-dft
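The --flag / --no-flag convention is the same one that Python's standard library provides through `argparse.BooleanOptionalAction`. The sketch below reproduces the behavior for illustration only. It does not claim that mlmm is built on argparse, and the `default=False` values are assumptions, not mlmm's defaults.

```python
import argparse


def build_demo_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the --flag / --no-flag toggle convention."""
    parser = argparse.ArgumentParser(prog="toggle-demo")
    for name in ("tsopt", "thermo", "dft"):
        # BooleanOptionalAction registers both --NAME and --no-NAME
        parser.add_argument(f"--{name}", action=argparse.BooleanOptionalAction, default=False)
    return parser


args = build_demo_parser().parse_args(["--tsopt", "--thermo", "--no-dft"])
print(args.tsopt, args.thermo, args.dft)  # True True False
```

With argparse, when both forms are given the last one on the command line wins, so `--dft --no-dft` leaves `dft` set to False.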

YAML configuration

See the YAML Reference for all options.


Output Structure

Typical mlmm all output:

result_all/
├── ml_region.pdb        # ML-region definition
├── summary.log          # Human-readable summary
├── summary.yaml         # Machine-readable summary
├── pockets/             # Extracted cluster models
├── mm_parm/             # AMBER topology files
├── scan/                # (Optional) scan results
└── path_search/         # MEP trajectories and diagrams
    ├── mep_trj.xyz      # MEP trajectory
    ├── mep.pdb          # MEP in PDB format
    ├── seg_*/           # Per-segment details
    └── post_seg_*/      # Post-processing outputs
        ├── tsopt/       # TS optimization results
        ├── irc/         # IRC trajectories
        ├── freq/        # Vibrational modes
        └── dft/         # DFT results
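After a run, a short script can report which post-processing stages produced output for each segment. The sketch below relies only on the directory names in the layout above (path_search/post_seg_*/ with tsopt/, irc/, freq/, dft/). The numbering scheme of the post_seg_* directories is not documented here, so the sketch sorts them naturally (post_seg_2 before post_seg_10) as a precaution. The function names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

STAGES = ("tsopt", "irc", "freq", "dft")


def _natural_key(path: Path) -> list:
    # Split "post_seg_10" into ["post_seg_", 10, ""] so numeric parts compare as integers
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.name)]


def post_segment_stages(result_dir) -> dict[str, list[str]]:
    """Map each path_search/post_seg_* directory to the stage subdirectories it contains."""
    seg_dirs = [p for p in Path(result_dir, "path_search").glob("post_seg_*") if p.is_dir()]
    return {
        seg.name: [stage for stage in STAGES if (seg / stage).is_dir()]
        for seg in sorted(seg_dirs, key=_natural_key)
    }


if __name__ == "__main__":
    # Build a throwaway tree that mimics the layout above, then report it
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("path_search/post_seg_2/tsopt", "path_search/post_seg_2/irc",
                    "path_search/post_seg_10/tsopt"):
            Path(tmp, rel).mkdir(parents=True)
        print(post_segment_stages(tmp))  # {'post_seg_2': ['tsopt', 'irc'], 'post_seg_10': ['tsopt']}
```

In practice you would call `post_segment_stages("result_all")` on a real output directory.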

Citation

If you use this software in your research, please cite:

[1] Ohmura, T., Inoue, S., Terada, T. (2025). ML/MM toolkit – Towards Accelerated Mechanistic Investigation of Enzymatic Reactions. ChemRxiv. https://doi.org/10.26434/chemrxiv-2025-jft1k

License

mlmm-toolkit is distributed under the GNU General Public License version 3 (GPL-3.0).


Getting Help

# General help
mlmm --help

# Command help
mlmm <subcommand> --help

Note: This documentation is under active development. Some sections may be incomplete or subject to change.