Examples
========

This section shows how to use the `torch_molecule` library in practice. More examples are available in the `examples` folder and the `tests` folder of the repository.

Molecular Property Prediction Usage
-----------------------------------

The following example demonstrates how to use the `GREAMolecularPredictor` with a hyperparameter search run through `autofit`:

.. code-block:: python

    from torch_molecule import GREAMolecularPredictor
    from torch_molecule.utils.search import ParameterType, ParameterSpec

    # Placeholder settings for this example; adjust them for your dataset
    num_task = 1
    BATCH_SIZE = 32
    N_epoch = 100
    N_trial = 100

    # Define search parameters shared by GNN-based predictors
    search_GNN = {
        "gnn_type": ParameterSpec(ParameterType.CATEGORICAL, ["gin-virtual", "gcn-virtual", "gin", "gcn"]),
        "norm_layer": ParameterSpec(ParameterType.CATEGORICAL, ["batch_norm", "layer_norm"]),
        "graph_pooling": ParameterSpec(ParameterType.CATEGORICAL, ["mean", "sum", "max"]),
        "augmented_feature": ParameterSpec(ParameterType.CATEGORICAL, ["maccs,morgan", "maccs", "morgan", None]),
        "num_layer": ParameterSpec(ParameterType.INTEGER, (2, 5)),
        "hidden_size": ParameterSpec(ParameterType.INTEGER, (64, 512)),
        "drop_ratio": ParameterSpec(ParameterType.FLOAT, (0.0, 0.5)),
        "learning_rate": ParameterSpec(ParameterType.LOG_FLOAT, (1e-5, 1e-2)),
        "weight_decay": ParameterSpec(ParameterType.LOG_FLOAT, (1e-10, 1e-3)),
    }

    # GREA searches over the shared GNN parameters plus its own `gamma`
    search_GREA = {
        "gamma": ParameterSpec(ParameterType.FLOAT, (0.25, 0.75)),
        **search_GNN,
    }

    # Initialize the GREA model
    grea_model = GREAMolecularPredictor(
        num_task=num_task,
        task_type="regression",
        model_name="GREA_multitask",
        batch_size=BATCH_SIZE,
        epochs=N_epoch,
        evaluate_criterion='r2',
        evaluate_higher_better=True,
        verbose=True,
    )

    # Toy data: SMILES strings with one regression target per molecule
    X_train = ['C1=CC=CC=C1', 'C1=CC=CC=C1']
    y_train = [[0.5], [1.5]]
    X_val = ['C1=CC=CC=C1', 'C1=CC=CC=C1']
    y_val = [[0.5], [1.5]]

    # Run the hyperparameter search and fit the model.
    # X_train and X_val are already Python lists, so no `.tolist()` is needed.
    grea_model.autofit(
        X_train=X_train,
        y_train=y_train,
        X_val=X_val,
        y_val=y_val,
        n_trials=N_trial,
        search_parameters=search_GREA,
    )

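
The search space above mixes plain ranges (`FLOAT`, `INTEGER`) with log-scaled ranges (`LOG_FLOAT`). The sketch below illustrates, in plain Python and independently of `torch_molecule`, why a log scale suits parameters such as `learning_rate` that span several orders of magnitude. The helper names `sample_uniform` and `sample_log_uniform` are hypothetical and are not part of the library; how `autofit` samples internally is not shown here.

.. code-block:: python

    import math
    import random

    def sample_uniform(low, high, rng):
        """Draw a value uniformly from [low, high]."""
        return rng.uniform(low, high)

    def sample_log_uniform(low, high, rng):
        """Draw a value whose logarithm is uniform over [log(low), log(high)]."""
        return math.exp(rng.uniform(math.log(low), math.log(high)))

    rng = random.Random(0)
    low, high = 1e-5, 1e-2  # same range as `learning_rate` above

    uniform_draws = [sample_uniform(low, high, rng) for _ in range(10_000)]
    log_draws = [sample_log_uniform(low, high, rng) for _ in range(10_000)]

    # Fraction of draws below 1e-3, i.e. in the lower two of the three decades
    frac_uniform = sum(v < 1e-3 for v in uniform_draws) / len(uniform_draws)
    frac_log = sum(v < 1e-3 for v in log_draws) / len(log_draws)
    print(f"uniform: {frac_uniform:.2f}, log-uniform: {frac_log:.2f}")

Uniform sampling places only about a tenth of its draws below `1e-3`, while log-uniform sampling places about two thirds there, so small learning rates get explored far more evenly.
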
Molecular Generator Usage
-------------------------

The following example demonstrates how to use the `GraphDITMolecularGenerator` to generate molecules conditioned on a property, regenerating any outputs that are not valid SMILES:

.. code-block:: python

    import numpy as np
    from rdkit import Chem
    from torch_molecule import GraphDITMolecularGenerator

    property_names = ['logP']
    train_smiles_list = ['C1=CC=CC=C1', 'C1=CC=CC=C1']
    train_property_array = [[1], [2]]
    # A NumPy array, so rows can be selected with a list of indices below
    test_property_array = np.array([[1.5], [2.5]])

    # Initialize the generator model
    model_cond = GraphDITMolecularGenerator(
        task_type=['regression'] * len(property_names),
        batch_size=1024,
        drop_condition=0.1,
        verbose=True,
        epochs=10000,
    )

    # Fit the model
    model_cond.fit(train_smiles_list, train_property_array)

    def is_valid_smiles(smiles):
        """Return True if RDKit can parse the SMILES string."""
        if smiles is None:
            return False
        return Chem.MolFromSmiles(smiles) is not None

    # First generation attempt
    max_retries = 10  # total number of generation attempts, including the first
    generated_smiles_list = model_cond.generate(test_property_array)

    # Check the results after each attempt and regenerate only the invalid ones
    for attempt in range(1, max_retries + 1):
        # Find indices of invalid SMILES
        invalid_indices = [
            i for i, smiles in enumerate(generated_smiles_list)
            if not is_valid_smiles(smiles)
        ]
        if not invalid_indices:
            print(f"All SMILES valid after {attempt} attempts")
            break
        if attempt == max_retries:
            print(f"Reached maximum retries ({max_retries}). {len(invalid_indices)} molecules still invalid.")
            break

        print(f"Retry {attempt}: Regenerating {len(invalid_indices)} invalid molecules")
        # Extract properties for the invalid molecules and regenerate them
        invalid_properties = test_property_array[invalid_indices]
        new_smiles = model_cond.generate(invalid_properties)

        # Replace invalid molecules with the new generations
        for idx, new_idx in enumerate(invalid_indices):
            generated_smiles_list[new_idx] = new_smiles[idx]

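
The retry loop above can also be factored into a reusable helper. The following is a minimal sketch, not part of `torch_molecule`: `generate_with_retries` and the stub `flaky_generate` are hypothetical names, and the stub stands in for a real generator so the logic can be run without training a model.

.. code-block:: python

    from typing import Callable, List, Optional, Sequence

    def generate_with_retries(
        generate: Callable[[list], List[Optional[str]]],
        conditions: Sequence,
        is_valid: Callable[[Optional[str]], bool],
        max_attempts: int = 10,
    ) -> List[Optional[str]]:
        """Call `generate` up to `max_attempts` times, regenerating only invalid outputs."""
        conditions = list(conditions)
        results = list(generate(conditions))  # first attempt covers every condition
        for _ in range(max_attempts - 1):
            invalid = [i for i, item in enumerate(results) if not is_valid(item)]
            if not invalid:
                break
            # Regenerate only for the conditions whose outputs were invalid
            replacements = generate([conditions[i] for i in invalid])
            for i, new_item in zip(invalid, replacements):
                results[i] = new_item
        return results

    # Stub generator for demonstration: returns None the first time it sees
    # a condition and a benzene SMILES on every later call.
    calls = {}

    def flaky_generate(batch):
        out = []
        for cond in batch:
            key = tuple(cond)
            calls[key] = calls.get(key, 0) + 1
            out.append(None if calls[key] == 1 else "C1=CC=CC=C1")
        return out

    smiles = generate_with_retries(
        flaky_generate, [[1.5], [2.5]], is_valid=lambda s: s is not None
    )

With a trained model, one would presumably pass `model_cond.generate` and `is_valid_smiles` in place of the stubs, assuming `generate` accepts a list of property rows.
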
Using Pretrained Checkpoints
----------------------------

`torch_molecule` supports saving models to and loading them from the Hugging Face Hub.

.. code-block:: python

    from torch_molecule import GREAMolecularPredictor
    from sklearn.metrics import mean_absolute_error

    # This snippet assumes the following are already defined:
    #   X_train, X_val, X_test   arrays of SMILES strings
    #   y_train, y_val, y_test   arrays of target values
    #   task_name                a string label for the task
    #   smiles_list              a list of SMILES strings to predict on

    # Hugging Face repo_id, including the user name and repo name
    repo_id = "user/repo_id"

    # Train a model
    model = GREAMolecularPredictor()
    model.autofit(
        X_train=X_train.tolist(),
        y_train=y_train,
        X_val=X_val.tolist(),
        y_val=y_val,
        n_trials=100,
    )

    # Evaluate on the test set and record the metric
    output = model.predict(X_test.tolist())
    mae = mean_absolute_error(y_test, output['prediction'])
    metrics = {'MAE': mae}

    # Push the trained model to Hugging Face
    model.push_to_huggingface(
        repo_id=repo_id,
        task_id=task_name,
        metrics=metrics,
        commit_message=f"Upload GREA_{task_name} model with metrics: {metrics}",
        private=False,
    )

    # Load a pretrained model checkpoint
    model_dir = "local_model_dir_to_save"
    model = GREAMolecularPredictor()
    model.load_model(f"{model_dir}/GREA_{task_name}.pt", repo_id=repo_id)
    model.set_params(verbose=True)
    predictions = model.predict(smiles_list)

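
The snippet above takes training, validation, and test splits as given. As a minimal sketch of one way to produce them, the following plain-Python helper performs a seeded random split. The name `split_dataset` and the toy data are hypothetical and are not part of `torch_molecule`; a scaffold-based split may be more appropriate for real molecular benchmarks.

.. code-block:: python

    import random

    def split_dataset(smiles, targets, val_frac=0.1, test_frac=0.1, seed=42):
        """Shuffle paired (SMILES, target) data and split it into train/val/test lists."""
        if len(smiles) != len(targets):
            raise ValueError("smiles and targets must have the same length")
        indices = list(range(len(smiles)))
        random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

        n_test = int(len(indices) * test_frac)
        n_val = int(len(indices) * val_frac)
        test_idx = indices[:n_test]
        val_idx = indices[n_test:n_test + n_val]
        train_idx = indices[n_test + n_val:]

        def take(idx):
            return [smiles[i] for i in idx], [targets[i] for i in idx]

        return take(train_idx), take(val_idx), take(test_idx)

    # Toy data: ten straight-chain alkanes (C, CC, CCC, ...) with one target each
    all_smiles = ["C" * (i + 1) for i in range(10)]
    all_targets = [[float(i)] for i in range(10)]

    (X_train, y_train), (X_val, y_val), (X_test, y_test) = split_dataset(
        all_smiles, all_targets, val_frac=0.2, test_frac=0.2
    )

The helper returns plain Python lists, so they can be passed to `autofit` and `predict` directly, without calling `.tolist()`.
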