Skip to content

TS Guess Generation

FRUST transition-state workflows start from two pieces of information:

  • a ligand or substrate table, usually with a smiles column;
  • a transition-state template geometry such as structures/ts1.xyz or structures/ts2.xyz.

The template tells FRUST what kind of TS-like structure to build. The ligand table tells FRUST which substrates and reactive positions should be expanded.

flowchart TD
    A["Ligand table<br/>smiles, names, optional rpos"]
    B["TS template<br/>ts_guess_xyz"]
    C["Reactive-position mapping"]
    D["Template type detection<br/>TS1, TS2, TS3, TS4, INT3"]
    E["Generated TS structures"]
    F["Conformer embedding"]
    G["Initial FRUST DataFrame<br/>atoms, coords_embedded, rpos, cid"]
    H["Stepper or pipeline stages"]

    A --> C
    B --> D
    C --> E
    D --> E
    E --> F --> G --> H

Template geometry is chemical input

FRUST can generate and screen structures from a TS template, but it cannot prove that the template represents the intended reaction. Inspect the final imaginary mode before using a barrier.

Choosing The TS Entry Point

Use run_ts_per_lig(...) when you want the normal in-process TS workflow from a ligand table. Despite the name, this function expands each ligand over its reactive positions internally with create_ts_per_rpos(...), then runs the resulting TS candidates sequentially in one workflow call.

from frust.pipes import run_ts_per_lig

df = run_ts_per_lig(
    ligands,
    ts_guess_xyz="structures/ts1.xyz",
    n_confs=2,
    DFT=False,
)

Use run_ts_per_rpos(...) when TS structures have already been generated and you want to run one pre-expanded reactive-position structure. This is mainly useful for distribution: the cluster submission layer can split a CSV into multiple ts_struct jobs and submit one job per generated reactive-position structure.

from frust.pipes import run_ts_per_rpos
from frust.utils.mols import create_ts_per_rpos

ts_structs = create_ts_per_rpos(
    ligands,
    ts_guess_xyz="structures/ts2.xyz",
    return_format="dict",
)

first_name = next(iter(ts_structs))
df = run_ts_per_rpos(
    {first_name: ts_structs[first_name]},
    n_confs=2,
    DFT=False,
)

Rule of thumb

Use run_ts_per_lig(...) for a local or single-process Python workflow. Use run_ts_per_rpos(...) when you deliberately want the reactive-position expansion step to happen before execution, usually so each generated structure can become a separate Slurm job.

Use fewer conformers for wiring checks

For a new template or CSV, start with n_confs=1, DFT=False, and a tiny input table. Once the structure generation and reactive-position mapping look correct, increase conformer coverage and launch the expensive stages.

What The Initial DataFrame Contains

After embedding, FRUST builds a dataframe where each row is a generated structure or conformer. The columns that matter first are:

Column Meaning
substrate_name ligand or substrate identity
structure_type TS or intermediate type, for example TS1 or INT3
rpos reactive position used for this generated structure
cid conformer id
atoms element symbols
coords_embedded embedded starting coordinates

Check that reactive positions were generated

df[["substrate_name", "structure_type", "rpos", "cid"]].head()
df.groupby(["substrate_name", "rpos"], dropna=False).size()

What To Inspect Before Running DFT

  • Confirm each ligand generated the expected number of reactive-position rows.
  • Confirm rpos points to the intended atom in the substrate.
  • Inspect a few embedded structures before spending ORCA time.
  • Check whether the lowest conformer after a cheap optimization is chemically sensible.

For the post-run checks, continue with Inspecting Results.