Templates
This tutorial explains the FRUST templating protocol from the point of view of
someone reading or modifying the code. A template workflow starts with a
transition-state-like .xyz file and a table of substrates, then turns those
inputs into embedded structures, DataFrame rows, and constrained xTB or ORCA
jobs.
Template geometry is chemical input
A template is not just a convenient starting geometry. Its comment line, atom order, connectivity, and reactive-core atom positions are part of the protocol. If those change, the transformer and constraint assumptions may no longer match the intended chemistry.
The Big Picture
flowchart TD
A["Substrate table<br/>smiles, optional rpos"]
B["TS template XYZ<br/>structures/ts1.xyz"]
C["read_ts_type_from_xyz<br/>TS1, TS2, TS3, TS4, INT3"]
D["create_ts_per_rpos"]
E["transformer_ts1/2/3/4<br/>or transformer_int3"]
F["Generated TS dict<br/>name -> (mol, keep_idxs, smiles)"]
G["embed_ts"]
H["Stepper.build_initial_df"]
I["FRUST DataFrame<br/>constraint_atoms, atoms, coords_embedded, rpos, cid"]
J["Stepper.xtb / Stepper.orca<br/>constraint=True"]
A --> D
B --> C --> D
D --> E --> F --> G --> H --> I --> J
The main code path is:
from frust.utils.io import read_ts_type_from_xyz
from frust.utils.mols import create_ts_per_rpos
from frust.embedder import embed_ts
from frust.stepper import Stepper
The high-level workflow functions call the same pieces for you, but the lower-level functions are useful when you need to understand or debug a new template.
Start With A Substrate Table
The input table must contain a smiles column. The optional rpos column tells
FRUST which aromatic C-H positions to use.
import pandas as pd
ligands = pd.DataFrame(
{
"smiles": ["CN1C=CC=C1"],
"rpos": ["2"],
}
)
ligands
Output:
| smiles | rpos | |
|---|---|---|
| 0 | CN1C=CC=C1 |
2 |
This is the smallest useful substrate table: one substrate and one requested
reactive position. For larger screens, each row can point to a different
substrate, and rpos can contain several positions such as "2;3".
Reactive-position rules:
- If
rposis missing, FRUST finds symmetry-unique aromatic C-H positions. - If
rposis present, FRUST validates each value against the valid aromatic C-H atoms for that SMILES. - Multiple positions are written as a semicolon-separated string, for example
"2;3".
Inspect valid positions before generating templates
Use the visualization helper when you are uncertain about atom numbering:
from frust.vis import DrawUniqueChGrid
DrawUniqueChGrid(ligands, smiles_col="smiles")
Read The Template Type
The template type is read from the second line of the .xyz file. The comment
must contain a token such as TS1, TS2, TS3, TS4, or INT3.
from frust.utils.io import read_ts_type_from_xyz
ts_type = read_ts_type_from_xyz("structures/ts1.xyz")
print(ts_type) # "TS1"
Output:
TS1
Internally, create_ts_per_rpos(...) uses this value to dispatch to the matching
transformer:
| Template type | Transformer |
|---|---|
TS1 |
frust.transformers.transformer_ts1 |
TS2 |
frust.transformers.transformer_ts2 |
TS3 |
frust.transformers.transformer_ts3 |
TS4 |
frust.transformers.transformer_ts4 |
INT3 |
frust.transformers.transformer_int3 |
Template atom order matters
The transformers contain template-specific assumptions. They identify an active-site substructure, align the incoming substrate onto it, remove template-specific bonds, and return a list of reactive-core atom indices. The later constraint blocks interpret those indices by position, so changing template atom order can silently change what gets constrained.
Expand The Template Over Reactive Positions
Use create_ts_per_rpos(...) to combine the substrate table with the .xyz
template.
from frust.utils.mols import create_ts_per_rpos
ts_structs = create_ts_per_rpos(
ligands,
ts_guess_xyz="structures/ts1.xyz",
return_format="dict",
)
print(len(ts_structs))
print(next(iter(ts_structs)))
Representative output:
1
TS1(1-methylpyrrole_rpos(2))
The key contains both the template type and the reactive position used to make that generated structure.
Each dictionary entry is generated by the selected transformer. The value is the structure tuple used by the embedder:
name, value = next(iter(ts_structs.items()))
mol, keep_idxs, smiles = value
print(name)
print(smiles)
print(keep_idxs)
Representative output:
TS1(1-methylpyrrole_rpos(2))
CN1C=CC=C1
[10, 11, 39, 40, 41, 44]
keep_idxs is the important bridge between templating and constraints. It is a
list of atom indices for the reactive core that should be preserved during
embedding and later constrained during selected optimization stages.
In table form, the generated object is easiest to reason about like this:
| Generated key | SMILES | Atoms in generated mol | keep_idxs |
|---|---|---|---|
TS1(1-methylpyrrole_rpos(2)) |
CN1C=CC=C1 |
54 | [10, 11, 39, 40, 41, 44] |
Inspect generated template structures
from frust.vis import MolTo3DGrid
MolTo3DGrid(
[value[0] for value in ts_structs.values()],
legends=list(ts_structs),
highlightAtoms=[value[1] for value in ts_structs.values()],
show_labels=True,
)
Embed And Build The Initial DataFrame
embed_ts(...) generates conformers. During embedding, keep_idxs becomes a
coordinate map (coordMap) so RDKit keeps the template reactive core close to
the template geometry.
from frust.embedder import embed_ts
embedded = embed_ts(
ts_structs,
ts_type=ts_type,
n_confs=1,
optimize=False,
)
Representative shape of embedded:
| Generated key | conformer_ids |
keep_idxs |
smiles |
uff_energies |
|---|---|---|---|---|
TS1(1-methylpyrrole_rpos(2)) |
[0] |
[10, 11, 39, 40, 41, 44] |
CN1C=CC=C1 |
[] |
The exact conformer id and UFF energies depend on the embedding settings, but
the important point is that the same keep_idxs list still travels with the
structure after conformer generation.
The stepper turns embedded conformers into a tidy DataFrame:
from frust.stepper import Stepper
step = Stepper(
step_type=ts_type,
n_cores=4,
memory_gb=20,
debug=True,
)
df = step.build_initial_df(embedded)
df[
[
"custom_name",
"structure_type",
"rpos",
"cid",
"constraint_atoms",
"atoms",
"coords_embedded",
]
].head()
Representative output:
custom_name |
structure_type |
rpos |
cid |
constraint_atoms |
atoms |
coords_embedded |
|
|---|---|---|---|---|---|---|---|
| 0 | TS1(1-methylpyrrole_rpos(2)) |
TS1 |
2 | 0 | [10, 11, 39, 40, 41, 44] |
["C", "N", "C", ...] |
54 x 3 coordinates |
This is the first point where the transformer output becomes normal FRUST table
metadata. From here on, calculation stages do not need the original
ts_structs dictionary; they read constraint_atoms from the DataFrame row.
Inspect constraint atoms
for _, row in df.iterrows():
print(row["custom_name"])
print("constraint_atoms:", row["constraint_atoms"])
Expected output:
TS1(1-methylpyrrole_rpos(2))
constraint_atoms: [10, 11, 39, 40, 41, 44]
The key columns created by Stepper.build_initial_df(...) are:
| Column | Meaning |
|---|---|
custom_name |
Generated structure name, usually containing TS..._rpos(...) |
structure_type |
Parsed type such as TS1, TS2, TS3, TS4, or INT3 |
rpos |
Reactive position used for the generated structure |
cid |
RDKit conformer id |
constraint_atoms |
The keep_idxs list from the transformer |
atoms |
Element symbols for the conformer |
coords_embedded |
Starting coordinates for calculation stages |
Start with one conformer
When developing or checking a new template, use n_confs=1, debug=True,
and a tiny substrate table. First confirm generated structures and
constraint_atoms, then increase conformer coverage.
How Constraints Flow Into Calculations
flowchart LR
A["Transformer returns<br/>keep_idxs"]
B["embed_ts freezes<br/>coordMap atoms"]
C["build_initial_df stores<br/>constraint_atoms"]
D["Stepper._constraint_atoms(row)<br/>validates sequence"]
E["xtb constraint=True<br/>$constrain block"]
F["orca constraint=True<br/>%geom Constraints block"]
A --> B --> C --> D
D --> E
D --> F
constraint=True is opt-in for calculation stages. The stepper first validates
that it has enough information:
Stepper(step_type=...)must be set to a supported constrained type:TS1,TS2,TS3,TS4, orINT3.- The DataFrame must contain
constraint_atoms. - Each row must contain a sequence of atom indices.
Then the backend-specific builder inserts a constraint block.
df = step.xtb(
df,
name="xtb_preopt",
options={"gfnff": None, "opt": None},
constraint=True,
)
Conceptually, the xTB input receives a constraint section derived from the row's
constraint_atoms:
$constrain
atoms: 11,12,40,41,42,45
$end
The numbers above are one-based because the xTB writer adds +1 to the stored
zero-based FRUST indices.
For ORCA, the same DataFrame column is used:
df = step.orca(
df,
name="DFT-pre-Opt",
options={
"wB97X-D3": None,
"6-31G**": None,
"TightSCF": None,
"Opt": None,
"NoSym": None,
},
constraint=True,
lowest=1,
)
The ORCA input follows the same idea but writes ORCA %geom Constraints syntax.
The exact block depends on the step_type because each TS or intermediate type
has a different constraint pattern.
xTB uses one-based indices
In the current implementation, xTB constraint blocks convert
constraint_atoms with x + 1 before writing $constrain. ORCA constraint
blocks use the stored indices as implemented in Stepper.orca(...). This is
why the generated constraint_atoms list should be treated as internal
FRUST metadata rather than manually edited input.
High-Level Entry Points
Most users call the higher-level workflow functions. They still follow the same template protocol internally.
from frust.pipes import run_ts_per_lig
df = run_ts_per_lig(
ligands,
ts_guess_xyz="structures/ts1.xyz",
n_confs=1,
debug=True,
DFT=False,
)
run_ts_per_lig(...) expands the substrate table with
create_ts_per_rpos(...), embeds every generated TS structure, builds the
initial DataFrame, and runs the xTB screening steps.
Use run_ts_per_rpos(...) when you already have one pre-expanded structure,
for example when a cluster workflow has split each reactive position into a
separate job.
from frust.pipes import run_ts_per_rpos
first_name = next(iter(ts_structs))
df = run_ts_per_rpos(
{first_name: ts_structs[first_name]},
n_confs=1,
debug=True,
DFT=False,
)
Use the split workflow for submitted jobs
Local Python workflows usually start with run_ts_per_lig(...). Cluster
submission workflows often expand templates first, then submit one
run_ts_per_rpos(...) job per generated reactive-position structure.
Where To Go Next
- Continue with TS Guess Generation for the workflow-level view.
- Use DataFrame Inspection and Molecular Grids to inspect generated structures.
- Use Transition States to validate the final imaginary mode before trusting a barrier.