Architecture: pdb2reaction
1. Overview
pdb2reaction is a Python CLI that performs pure-MLIP enzymatic reaction-path analysis on an active-site cluster model. From a PDB plus a substrate name, it extracts the active-site cluster, adds cap hydrogens to severed bonds, and runs Hessian-based RS-I-RFO TS optimization on the MLIP potential to produce the reaction path (extract → MEP → tsopt → IRC → freq → dft).
Two bundled forks (pysisyphus/, thermoanalysis/) live at the repo top as repo-internal modules. They are deliberately not the upstream PyPI distributions; reinstalling them from PyPI alongside this package silently breaks the local extensions. See §6.
2. Layered structure (6 physical directories)
2.1 Layer table
| layer | dir | responsibility | may depend on |
|---|---|---|---|
| L1 Interface | cli/ | Click root group, shared option-decorator factories (…) | |
| L2 Application | workflows/ | per-subcommand orchestration; one file per stage runner (…) | |
| L3 Domain | domain/ | chemistry-aware helper logic (bond change detection, bond summary, element-info propagation) | |
| L4a Infra (MLIP) | backends/ | MLIP backend dispatcher + per-backend adapter (UMA / Orb / MACE / AIMNet2) + xTB ALPB delta correction | |
| L4b Infra (I/O) | io/ | output layout, summary, trajectory, PDB fix, energy diagram, Hessian cache | |
| L5 Foundation | core/ | defaults (single source of truth), utils (PDB / XYZ / plot helpers), logging, future | (none) |
| (bundle, not a layer) | pysisyphus/, thermoanalysis/ | repo-internal forks (optimizer / thermochemistry) | (sibling, layer-external) |
Dependency direction (one-way): L1 → L2 → {L3, L4} → L5. The directional rule is enforced by CI marker coverage (.github/scripts/check_engineering_markers.py). Bundled forks sit outside the layer graph and may be imported from any layer via their absolute package path (from pysisyphus.X import Y).
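The one-way rule can be pictured as a small static check over import statements. The sketch below is not the CI script itself; it assumes that "one-way" means a module may never import from a layer above its own, it does not judge imports between sibling layers, and it treats anything outside the six layer directories (including the bundled forks) as layer-external. The names LAYER_TIER, layer_of and find_violations are hypothetical.

```python
import ast

# Assumed tiers read off the rule L1 -> L2 -> {L3, L4} -> L5 (smaller = higher layer).
LAYER_TIER = {"cli": 1, "workflows": 2, "domain": 3, "backends": 3, "io": 3, "core": 4}
PACKAGE = "pdb2reaction"


def layer_of(module_path):
    """Return the layer directory of a dotted module path, or None if layer-external."""
    parts = module_path.split(".")
    if len(parts) >= 2 and parts[0] == PACKAGE and parts[1] in LAYER_TIER:
        return parts[1]
    return None


def find_violations(source, importer):
    """List (importer, target) pairs where `source` imports from a higher layer."""
    src_layer = layer_of(importer)
    if src_layer is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            dst_layer = layer_of(target)
            if dst_layer is None or dst_layer == src_layer:
                continue  # bundled forks, third-party, or same-layer imports
            if LAYER_TIER[dst_layer] < LAYER_TIER[src_layer]:
                violations.append((importer, target))
    return violations
```

Under these assumptions, a module in core/ that imports pdb2reaction.workflows.all is reported, while the same import from cli/app.py is not, and from pysisyphus.X import Y is ignored from any layer.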
2.2 ASCII map of the package tree
pdb2reaction/ [GH: t-0hmura/pdb2reaction]
├── pyproject.toml packages.find = ["pdb2reaction*", ...] (glob, frozen)
├── README.md / CONTRIBUTING.md / CHANGELOG.md
├── docs/
│ ├── architecture.md ← this file
│ └──... (Sphinx site, unchanged)
├── pdb2reaction/ ← package body, 6-layer physical dir
│ ├── __init__.py PEP 562 lazy: _LAZY_SYMBOLS / _LAZY_MODULES + __getattr__
│ ├── __main__.py `from pdb2reaction.cli.app import cli`
│ ├── _version.py / py.typed
│ │
│ ├── cli/ # === L1 Interface ===
│ │ ├── app.py Click group + _LAZY_SUBCOMMANDS registry (absolute paths)
│ │ ├── common_options.py @add_print_every_option / @add_irc_pos_def_option / @add_precision_option / @add_coord_type_option / @add_ml_charge_spin_options
│ │ ├── decorators.py resolve_yaml_sources / load_merged_yaml_cfg / _write_error_json
│ │ ├── help_pages.py --help-advanced pager
│ │ ├── bool_compat.py --flag / --no-flag normalization
│ │ └── default_group.py subcommand resolver, lazy module import
│ │
│ ├── workflows/ # === L2 Application ===
│ │ ├── all.py full pipeline orchestrator (extract → … → DFT)
│ │ ├── path_search.py / path_opt.py MEP search / COS wrapper
│ │ ├── tsopt.py / freq.py / irc.py / dft.py per-stage runners
│ │ ├── opt.py / sp.py / scan.py / scan2d.py /
│ │ │ scan3d.py / scan_common.py geometry opt / single point / scans
│ │ ├── extract.py active-site extraction CLI
│ │ ├── restraints.py restraint helpers
│ │ └── align_freeze.py Kabsch + frozen-subset rmsd
│ │
│ ├── domain/ # === L3 Domain ===
│ │ ├── bond_changes.py R↔P bond detection
│ │ ├── bond_summary.py post-IRC diagnostic
│ │ └── add_elem_info.py PDB element column normalizer
│ │
│ ├── backends/ # === L4a Infra (MLIP) ===
│ │ ├── __init__.py backend dispatch + registry
│ │ ├── base.py MLIPCalculator protocol
│ │ ├── uma.py / orb.py / mace.py / aimnet2.py per-backend adapters
│ │ ├── solvent.py xTB ALPB implicit-solvent helper
│ │ └── xtb_alpb_correction.py xTB ALPB delta correction
│ │
│ ├── io/ # === L4b Infra (I/O) ===
│ │ ├── summary.py summary.json / summary.log writer
│ │ ├── energy_diagram.py Plotly diagram
│ │ ├── trj2fig.py trajectory → PNG / SVG / PDF / HTML / CSV
│ │ ├── pdb_fix.py altloc resolution
│ │ └── hessian_cache.py in-memory Hessian cache
│ │
│ └── core/ # === L5 Foundation ===
│ ├── defaults.py C1 single source of truth for every default
│ └── utils.py PDB / XYZ / plot helpers
│
├── tests/ smoke / unit
├── .github/ workflows/ + scripts/ (docs-quality lint helpers; CI-only)
└── (repo-top sibling, layer-external bundled forks)
pysisyphus/ ~90 files, repo-internal fork (slimmed; CLI driver + QM backends + wavefunction + dead optimisers / IRC / NEB variants removed)
thermoanalysis/ 5 files, repo-internal fork
2.3 Per-layer responsibility detail
L1 cli/ (~6 files). Only this layer constructs Click commands and parses argv. app.py holds the root click.Group plus the _LAZY_SUBCOMMANDS registry. Every registry entry uses an absolute module path (pdb2reaction.workflows.all, pdb2reaction.io.trj2fig, …), so the resolver is independent of where default_group.py itself lives. common_options.py collects the option-decorator factories shared across subcommands (@add_print_every_option, @add_irc_pos_def_option, @add_precision_option, @add_coord_type_option, @add_ml_charge_spin_options); subcommand bodies stack these decorators above @click.pass_context so that --help text stays in lock-step across subcommands.
L2 workflows/ (18 files). One file per subcommand. Each file owns a single @click.command() named cli and its private helpers. Large stage runners (all.py = 5,131 LOC, path_search.py = 2,771 LOC, tsopt.py = 2,121 LOC, extract.py = 2,113 LOC) remain as single files in the current layout.
L3 domain/. Chemistry-aware helper logic that may import torch / numpy / pysisyphus.constants (numeric back-ends), but may not import MLIP runtimes (fairchem, orb_models, mace, aimnet). .github/scripts/check_engineering_markers.py enforces this deny list via an external-library import-scope check across non-backends/ files. (The # DOMAIN_PURE docstring marker itself lives on selected workflow modules — workflows/dft.py, tsopt.py, sp.py — not on domain/.) Domain helpers are reusable by any L2 stage runner.
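A deny-list check of this kind can be sketched with the standard ast module. This is an illustration only, not the contents of check_engineering_markers.py: the function name denied_imports is hypothetical, and the sketch assumes the four runtimes are matched by their top-level import name.

```python
import ast

# Deny list taken from the paragraph above; matched on the top-level package name.
MLIP_RUNTIMES = {"fairchem", "orb_models", "mace", "aimnet"}


def denied_imports(source):
    """Return the sorted list of denied top-level packages imported by `source`."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in MLIP_RUNTIMES:
                hits.add(top)
    return sorted(hits)
```

Because the source is only parsed and never executed, a check like this can run in CI without any of the MLIP runtimes installed.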
L4a backends/ (~8 files). MLIP backend dispatcher (__init__.py + base.py) plus one adapter per supported MLIP (uma.py, orb.py, mace.py, aimnet2.py). solvent.py and xtb_alpb_correction.py carry the xTB ALPB implicit-solvent delta correction (an opt-in MLIP wrapper). pdb2reaction is a pure-MLIP cluster-model package.
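The dispatcher-plus-adapter split can be illustrated with a name-keyed registry and a structural protocol. Everything below is a hypothetical sketch: the method energy_and_forces, the helpers register_backend and get_calculator, and the toy adapter are assumptions for illustration and are not taken from backends/__init__.py or base.py.

```python
from typing import Callable, Dict, List, Protocol, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class MLIPCalculator(Protocol):
    """Hypothetical adapter contract; the real protocol in base.py may differ."""

    def energy_and_forces(
        self, symbols: Sequence[str], coords: Sequence[Vec3]
    ) -> Tuple[float, List[Vec3]]:
        ...


_REGISTRY: Dict[str, Callable[[], MLIPCalculator]] = {}


def register_backend(name: str):
    """Decorator that records a zero-argument factory under a backend name."""
    def decorator(factory):
        _REGISTRY[name.lower()] = factory
        return factory
    return decorator


def get_calculator(name: str) -> MLIPCalculator:
    """Dispatch on the backend name (case-insensitive) and build the adapter."""
    try:
        factory = _REGISTRY[name.lower()]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "(none)"
        raise ValueError(f"unknown backend {name!r}; registered: {known}") from None
    return factory()


@register_backend("toy")
class ToyCalculator:
    """Stand-in adapter: harmonic well around the origin, no ML runtime needed."""

    def energy_and_forces(self, symbols, coords):
        energy = 0.5 * sum(x * x + y * y + z * z for x, y, z in coords)
        forces = [(-x, -y, -z) for x, y, z in coords]
        return energy, forces
```

With this shape, callers only ever see get_calculator and the protocol, so adding a backend means adding one adapter module and one registry entry.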
L4b io/. Output-side I/O concerns: per-stage summary writer, energy diagram, trajectory rendering, PDB altloc fix, in-memory Hessian cache. io/ never depends on workflows/; output format is owned here and consumed by stage runners.
L5 core/. The lowest layer. defaults.py is the single source of truth for every CLI default — grep here before adding a number anywhere else. utils.py is a ~3,200-LOC grab-bag of PDB / XYZ / plotting helpers.
2.4 Lazy-import mechanism (conceptual diagram)
External consumer Package root Layer dir
--------------------------------------- ---------------------- ---------
from pdb2reaction.core.utils import x ──► (direct dotted import) ──► pdb2reaction/core/utils.py
import pdb2reaction.io.trj2fig ──► (direct dotted import) ──► pdb2reaction/io/trj2fig.py
from pdb2reaction import <Symbol> ──► pdb2reaction/__init__.py
__getattr__("<Symbol>")
└─► _LAZY_SYMBOLS["<Symbol>"]
= "pdb2reaction.<layer>.<module>"
└─► importlib.import_module(...)
from pdb2reaction import <module> ──► pdb2reaction/__init__.py
(= module attr) __getattr__("<module>")
└─► _LAZY_MODULES["<module>"]
= "pdb2reaction.<layer>.<module>"
└─► importlib.import_module(...) returns module
pdb2reaction myaction ──► pdb2reaction/cli/app.py
_LAZY_SUBCOMMANDS["myaction"]
= ("pdb2reaction.workflows.myaction", "cli", "...")
└─► importlib.import_module(absolute path)
└─► getattr(module, "cli") → Click command
Two layers of lazy-import compatibility plus CLI dispatch:
1. Root symbol attribute (from pdb2reaction import <Symbol>): handled by pdb2reaction/__init__.py:_LAZY_SYMBOLS + PEP 562 __getattr__. Symbols are loaded on first access from the layer-dir path; import cost stays zero at pdb2reaction import time.
2. Root module attribute (from pdb2reaction import <module>): handled by _LAZY_MODULES. __getattr__ returns the module object itself via importlib.import_module. pdb2reaction currently has 0 consumed module-attr paths (the registry is empty; root attribute access is reserved for future expansion).
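The two root-attribute paths can be sketched with a module-level __getattr__ as defined by PEP 562. The example is self-contained and therefore builds a throwaway module object rather than a real package; the registry contents point at standard-library modules as stand-ins, and make_lazy_package is a hypothetical helper, not part of pdb2reaction.

```python
import importlib
import types

# Stand-in registries: stdlib targets play the role of "pdb2reaction.<layer>.<module>".
_LAZY_SYMBOLS = {"sqrt": "math", "OrderedDict": "collections"}
_LAZY_MODULES = {"fractions": "fractions"}


def make_lazy_package(name):
    """Build a module object whose missing attributes resolve through the registries."""
    pkg = types.ModuleType(name)

    def _lazy_getattr(attr):
        if attr in _LAZY_SYMBOLS:
            module = importlib.import_module(_LAZY_SYMBOLS[attr])
            value = getattr(module, attr)  # symbol path: return the attribute
        elif attr in _LAZY_MODULES:
            value = importlib.import_module(_LAZY_MODULES[attr])  # module path
        else:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        setattr(pkg, attr, value)  # cache so later lookups bypass __getattr__
        return value

    # PEP 562: a module-level __getattr__ is consulted only after normal lookup fails.
    pkg.__getattr__ = _lazy_getattr
    return pkg
```

In a real package the same function would simply be defined as __getattr__ at the top level of __init__.py; the factory form is used here only so the sketch runs on its own.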
The CLI subcommand resolver (cli/app.py:_LAZY_SUBCOMMANDS) uses absolute module paths (e.g. "pdb2reaction.workflows.all") so that moving default_group.py into cli/ does not silently break subcommand discovery (the registry no longer depends on __package__).
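The registry-driven dispatch can be sketched as a lookup followed by importlib.import_module and getattr. The tuple layout (module path, attribute name, help text) follows the diagram above; the entries themselves are standard-library stand-ins, and resolve_subcommand is a hypothetical name rather than the function in default_group.py.

```python
import importlib

# name -> (absolute module path, attribute, short help); stand-in entries only.
_LAZY_SUBCOMMANDS = {
    "dumps": ("json", "dumps", "serialize to JSON"),
    "quote": ("urllib.parse", "quote", "percent-encode a string"),
}


def resolve_subcommand(name):
    """Import the target module on demand and return the named attribute."""
    try:
        module_path, attr, _help = _LAZY_SUBCOMMANDS[name]
    except KeyError:
        raise LookupError(f"no such subcommand: {name!r}") from None
    # The path is absolute, so resolution does not depend on this file's __package__.
    module = importlib.import_module(module_path)
    return getattr(module, attr)
```

Since every path is absolute, the resolver function can be moved between directories without editing the registry.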
4. File index: “where does this concern live?”
4.1 CLI / entry (L1 cli/)
| concern | file |
|---|---|
| Click root group + subcommand dispatch | cli/app.py |
| Subcommand resolver (lazy import) | cli/default_group.py |
| | |
| YAML source resolution + standardized exception handling | cli/decorators.py |
| | |
| Bool flag compat (…) | cli/bool_compat.py |
| Shared option-decorator factories (…) | cli/common_options.py |
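4.2 Workflow stage runners (L2 workflows/)
| concern | file |
|---|---|
| Full pipeline orchestrator | workflows/all.py |
| Geometry optimization (LBFGS / RFO) | workflows/opt.py |
| 1D / 2D / 3D scans + shared … | workflows/scan.py / scan2d.py / scan3d.py / scan_common.py |
| MEP search (GSM) | workflows/path_search.py |
| MEP optimizer core (pysisyphus COS) | workflows/path_opt.py |
| TS optimization (RSIRFO + Bofill + macro/micro) | workflows/tsopt.py |
| Vibrational analysis (PHVA + UMA active block) | workflows/freq.py |
| IRC integration (macro / micro) | workflows/irc.py |
| Single-point DFT (gpu4pyscf subprocess) | workflows/dft.py |
| Active-site extraction (cluster cap) | workflows/extract.py |
| Restraint helpers | workflows/restraints.py |
| Kabsch / frozen-subset alignment | workflows/align_freeze.py |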
4.3 Chemistry helpers (L3 domain/)
| concern | file |
|---|---|
| R↔P bond change detection | domain/bond_changes.py |
| Post-IRC bond summary | domain/bond_summary.py |
| PDB element column normalizer | domain/add_elem_info.py |
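4.4 MLIP backends (L4a backends/)
| concern | file |
|---|---|
| Backend dispatch + registry | backends/__init__.py |
| | |
| Per-backend adapters | backends/uma.py / orb.py / mace.py / aimnet2.py |
| xTB ALPB implicit-solvent helper | backends/solvent.py |
| xTB ALPB delta correction | backends/xtb_alpb_correction.py |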
See Backends for the add-a-backend recipe.
4.5 I/O (L4b io/)
| concern | file |
|---|---|
| | |
| Plotly energy diagram | io/energy_diagram.py |
| Trajectory → PNG / SVG / PDF / HTML / CSV | io/trj2fig.py |
| PDB altloc resolution | io/pdb_fix.py |
| In-memory Hessian cache (per-run TTL) | io/hessian_cache.py |
| Harmonic restraint setup | |
4.6 Foundation (L5 core/)
| concern | file |
|---|---|
| Every CLI default (single source of truth) | core/defaults.py |
| PDB / XYZ / plot helpers | core/utils.py |
| | |
4.7 Repo-internal bundled forks
| dir | role | divergent files (do NOT replace with upstream) |
|---|---|---|
| pysisyphus/ | optimizer / TS / IRC engine | |
| thermoanalysis/ | thermochemistry (ΔG, ZPE, partition functions) | |
See each dir’s README.md for the touch-restriction boundary.
6. Bundled forks (repo-internal)
pdb2reaction ships two repo-internal modules at the repo top:
| dir | upstream PyPI? | purpose | scope of edits allowed |
|---|---|---|---|
| pysisyphus/ | NO: fork, do not … | optimizer, TS, IRC, COS, calculators | annotation-only in this release line (docstring + type hints); logic edits forbidden |
| thermoanalysis/ | NO: fork (branding diff) | ΔG, ZPE, partition functions, … | same as … |
Each dir carries its own README.md listing the divergent files and the touch-restriction boundary. From the layer model these forks live outside the L1..L5 graph: any layer may import them via the absolute package path (from pysisyphus.X import Y) without breaking the L1 → L2 → {L3, L4} → L5 direction.
7. Recommended deeper reading order
After the Fresh-eyes tour (§3), follow this depth-first reading order:
1. pdb2reaction/core/defaults.py: internalise the default-value table; everything downstream reads from here.
2. pdb2reaction/cli/app.py: Click root + _LAZY_SUBCOMMANDS registry.
3. pdb2reaction/workflows/all.py: one full pipeline top-to-bottom.
4. pdb2reaction/workflows/extract.py: active-site cluster cap.
5. pdb2reaction/backends/__init__.py + base.py: MLIP dispatcher and per-backend adapter contract.
6. pdb2reaction/workflows/tsopt.py: RS-I-RFO + Bofill scatter (CHEMISTRY-RULE:7).
7. pdb2reaction/workflows/freq.py: vibrational analysis on the cluster model.
8. pdb2reaction/workflows/irc.py: VRAM hygiene + IRC integration.
9. pdb2reaction/workflows/dft.py: single-point DFT with gpu4pyscf (CHEMISTRY-RULE:4 + :5).
10. pdb2reaction/core/utils.py: shared PDB / XYZ / plot helpers.