Utility Functions

class torch_molecule.utils.checker.MolecularInputChecker[source]

Bases: object

Class for validating input data used in molecular models.

static validate_inputs(X: List[str], y: List | ndarray | None = None, num_task: int = 0, num_pretask: int = 0, return_rdkit_mol: bool = True) → Tuple[List[str] | List[Mol], ndarray | None][source]

Validate a list of SMILES strings, and optionally validate a target array.

Parameters:
  • X (List[str]) – List of SMILES strings

  • y (Optional[Union[List, np.ndarray]], optional) – Optional target values, by default None

  • num_task (int, optional) – Total number of tasks; used to check dimensions of y, by default 0

  • num_pretask (int, optional) – Number of predefined (pseudo-)tasks in the model; used to check dimensions of y. Primarily used in supervised pretraining, by default 0

  • return_rdkit_mol (bool, optional) – If True, convert SMILES to RDKit Mol objects, by default True

Returns:

A tuple containing:

  • The original SMILES strings, or RDKit Mol objects if return_rdkit_mol=True

  • The target array as a numpy array, or None if y was not provided

Return type:

Tuple[Union[List[str], List[Chem.Mol]], Optional[np.ndarray]]

Raises:

ValueError – If SMILES or target dimensions are invalid
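
The dimension check on y can be pictured with the sketch below. It is an illustration only, not the library's implementation: the helper name check_target_shape, the rule that a flat list is read as a single-task target, and the error messages are all assumptions, and num_pretask is left out because its exact effect on the expected shape is not documented here.

```python
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def check_target_shape(
    X: Sequence[str],
    y: Optional[Sequence],
    num_task: int,
) -> Optional[List[List[Number]]]:
    """Hypothetical helper: normalize y to len(X) rows of num_task values."""
    if y is None:
        return None
    # Assumption: a flat list of scalars is treated as a single-task target.
    rows = [list(r) if isinstance(r, (list, tuple)) else [r] for r in y]
    if len(rows) != len(X):
        raise ValueError(f"y has {len(rows)} rows but X has {len(X)} SMILES")
    for i, row in enumerate(rows):
        if len(row) != num_task:
            raise ValueError(
                f"row {i} of y has {len(row)} values, expected {num_task}"
            )
    return rows


smiles = ["CCO", "c1ccccc1"]
targets = check_target_shape(smiles, [0.5, 1.2], num_task=1)
print(targets)  # [[0.5], [1.2]]
```

A mismatch in either the number of rows or the number of columns raises ValueError, which mirrors the documented failure mode.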

static validate_smiles(smiles: str, idx: int) → Tuple[bool, str | None, Mol | None][source]

Validate a single SMILES string at a given index.

Parameters:
  • smiles (str) – The SMILES string to validate

  • idx (int) – The index of the SMILES string in the original list

Returns:

A tuple containing:

  • A boolean indicating whether the SMILES string is valid

  • A string describing the error if the SMILES is invalid, or None if valid

  • The RDKit Mol object if valid, or None if invalid

Return type:

Tuple[bool, Optional[str], Optional[Chem.Mol]]
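
The three-element return value lends itself to collecting per-index error messages. The sketch below mirrors that (is_valid, error, mol) contract without requiring RDKit: validate_one and toy_parser are hypothetical names, the error wording is invented, and the parser is passed in as a callable. With RDKit installed, Chem.MolFromSmiles (which returns None for unparsable input) could play the role of the parser.

```python
from typing import Any, Callable, Optional, Tuple


def validate_one(
    smiles: Any,
    idx: int,
    parse: Callable[[str], Optional[object]],
) -> Tuple[bool, Optional[str], Optional[object]]:
    """Hypothetical stand-in mirroring the (is_valid, error, mol) contract."""
    if not isinstance(smiles, str) or not smiles.strip():
        return False, f"Empty or non-string SMILES at index {idx}", None
    mol = parse(smiles)  # with RDKit this could be Chem.MolFromSmiles
    if mol is None:
        return False, f"Invalid SMILES at index {idx}: {smiles}", None
    return True, None, mol


def toy_parser(smiles: str) -> Optional[object]:
    """Toy parser for the demo only: rejects unbalanced parentheses."""
    return smiles if smiles.count("(") == smiles.count(")") else None


results = [validate_one(s, i, toy_parser) for i, s in enumerate(["CCO", "CC(C", ""])]
errors = [msg for ok, msg, _ in results if not ok]
print(errors)
```

Carrying the index into the message lets a caller report every bad entry in a list at once instead of stopping at the first failure.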

class torch_molecule.utils.checkpoint.HuggingFaceCheckpointManager[source]

Bases: object

Handles saving and loading of models to and from the Hugging Face Hub.

static load_model_from_hf(model_instance, repo_id: str, path: str, config_filename: str = 'config.json') → None[source]

Load a model from the Hugging Face Hub, first saving it locally to path.

static push_to_huggingface(model_instance, repo_id: str, task_id: str = 'default', metadata_dict: Dict[str, Any] | None = None, metrics: Dict[str, float] | None = None, commit_message: str = 'Update model', token: str | None = None, private: bool = False, config_filename: str = 'config.json') → None[source]

Push a task-specific model checkpoint to the Hugging Face Hub.

class torch_molecule.utils.checkpoint.LocalCheckpointManager[source]

Bases: object

Handles saving and loading of models to and from local paths.

static load_model_from_local(model_instance, path: str) → None[source]

Load model weights and configuration from a local file.

static save_model_to_local(model_instance, path: str) → None[source]

Save model weights and configuration to a local file.

torch_molecule.utils.format.sanitize_config(config_dict)[source]

Recursively sanitize config dictionary for JSON serialization.

Handles nested structures and special cases.

Parameters:

config_dict (dict) – Configuration dictionary to sanitize

Returns:

Sanitized configuration dictionary that is JSON serializable

Return type:

dict

torch_molecule.utils.format.serialize_config(obj)[source]

Helper function to make config JSON serializable.

Handles special cases like lambda functions, torch modules, and numpy arrays.

Parameters:

obj (Any) – The object to serialize

Returns:

JSON serializable representation of the object

Return type:

Any
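
The division of labor between the two functions, a recursive walk plus a per-object converter, can be sketched with the standard library alone. The names sanitize and to_serializable are hypothetical, and the fallback rules (callables become their name, unknown objects become str(obj)) are assumptions rather than the library's documented behavior. Torch modules and numpy arrays are not handled here, to keep the sketch dependency-free.

```python
import json
from typing import Any


def to_serializable(obj: Any) -> Any:
    """Hypothetical leaf converter: return a JSON-friendly stand-in for obj."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if callable(obj):
        # Lambdas and functions cannot be stored as JSON; keep a readable name.
        return getattr(obj, "__name__", repr(obj))
    return str(obj)


def sanitize(value: Any) -> Any:
    """Hypothetical recursive walk over dicts, lists, tuples, and sets."""
    if isinstance(value, dict):
        return {str(k): sanitize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [sanitize(v) for v in value]
    return to_serializable(value)


config = {"hidden": (64, 32), "act": lambda x: x, "opt": {"lr": 1e-3, "sched": None}}
clean = sanitize(config)
print(json.dumps(clean))
```

After sanitizing, json.dumps succeeds on a dictionary that would otherwise raise TypeError because of the lambda.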

torch_molecule.utils.hf.create_model_card(model_class: str, model_name: str, tasks_config: Dict, model_config: Dict, repo_id: str, existing_readme: str = '') → str[source]

Create a model card for multiple tasks.

Parameters:
  • model_class (str) – Class name of the model

  • model_name (str) – Name of the model

  • tasks_config (Dict) – Configuration for all tasks

  • model_config (Dict) – General model configuration

  • repo_id (str) – Repository ID

  • existing_readme (str) – Existing README content

Returns:

Generated model card content

Return type:

str

torch_molecule.utils.hf.get_existing_repo_data(repo_id: str, token: str | None = None) → Tuple[bool, Dict, str][source]

Get existing repository data from the Hugging Face Hub.

Parameters:
  • repo_id (str) – Repository ID

  • token (Optional[str]) – Hugging Face token

Returns:

Tuple containing (repo_exists, existing_config, existing_readme)

Return type:

Tuple[bool, Dict, str]

torch_molecule.utils.hf.merge_task_configs(task_id: str, existing_config: Dict, new_task_config: Dict, num_params: int) → Dict[source]

Merge task-specific configuration and maintain version history.

Parameters:
  • task_id (str) – Task identifier (e.g., ‘O2’, ‘N2’)

  • existing_config (Dict) – Existing configuration dictionary

  • new_task_config (Dict) – New task configuration to merge

  • num_params (int) – Number of model parameters

Returns:

Updated configuration with task history

Return type:

Dict

class torch_molecule.utils.search.ParameterSpec(param_type: ParameterType, value_range: Tuple[Any, Any] | List[Any])[source]

Bases: NamedTuple

Specification for a hyperparameter including its type and valid range/options.

param_type: ParameterType

Alias for field number 0

value_range: Tuple[Any, Any] | List[Any]

Alias for field number 1

class torch_molecule.utils.search.ParameterType(value)[source]

Bases: Enum

Enum defining types of hyperparameters for optimization.

Each type corresponds to a specific Optuna suggest method and parameter behavior.

CATEGORICAL = 'categorical'
FLOAT = 'float'
INTEGER = 'integer'
LOG_FLOAT = 'log_float'
torch_molecule.utils.search.parse_list_params(params_str)[source]

torch_molecule.utils.search.suggest_parameter(trial: Any, param_name: str, param_spec: ParameterSpec) → Any[source]

Suggest a parameter value using the appropriate Optuna suggest method.

Parameters:
  • trial (optuna.Trial) – The Optuna trial object

  • param_name (str) – Name of the parameter

  • param_spec (ParameterSpec) – Specification of the parameter type and range

Returns:

The suggested parameter value

Return type:

Any

Raises:

ValueError – If the parameter type is not recognized
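
A dispatch from ParameterType to Optuna's suggest methods could look like the sketch below. The enum and named tuple are re-declared locally so the block is self-contained, the function name suggest is hypothetical, and the exact mapping (for example LOG_FLOAT to suggest_float with log=True) is an assumption about how the library uses Optuna. MidpointTrial is a stand-in for optuna.Trial so the example runs without Optuna installed.

```python
from enum import Enum
from typing import Any, List, NamedTuple, Tuple, Union


class ParameterType(Enum):
    CATEGORICAL = "categorical"
    INTEGER = "integer"
    FLOAT = "float"
    LOG_FLOAT = "log_float"


class ParameterSpec(NamedTuple):
    param_type: ParameterType
    value_range: Union[Tuple[Any, Any], List[Any]]


def suggest(trial: Any, name: str, spec: ParameterSpec) -> Any:
    """Sketch of a dispatch from ParameterType to Optuna's suggest_* methods."""
    if spec.param_type is ParameterType.CATEGORICAL:
        return trial.suggest_categorical(name, spec.value_range)
    low, high = spec.value_range
    if spec.param_type is ParameterType.INTEGER:
        return trial.suggest_int(name, low, high)
    if spec.param_type is ParameterType.FLOAT:
        return trial.suggest_float(name, low, high)
    if spec.param_type is ParameterType.LOG_FLOAT:
        return trial.suggest_float(name, low, high, log=True)
    raise ValueError(f"Unknown parameter type: {spec.param_type}")


class MidpointTrial:
    """Stand-in for optuna.Trial: returns deterministic midpoints."""

    def suggest_categorical(self, name, choices):
        return choices[0]

    def suggest_int(self, name, low, high):
        return (low + high) // 2

    def suggest_float(self, name, low, high, log=False):
        return (low * high) ** 0.5 if log else (low + high) / 2


trial = MidpointTrial()
print(suggest(trial, "num_layer", ParameterSpec(ParameterType.INTEGER, (2, 8))))  # 5
print(suggest(trial, "norm", ParameterSpec(ParameterType.CATEGORICAL, ["batch", "layer"])))  # batch
```

Keeping the range and the type together in one ParameterSpec lets a search space be declared as a plain dictionary of name to spec.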

torch_molecule.utils.graph.features.atom_feature_vector_to_dict(atom_feature)[source]
torch_molecule.utils.graph.features.atom_to_feature_vector(atom)[source]

Converts an RDKit atom object to a feature list of indices.

Parameters:

atom – RDKit atom object

Returns:

Feature list of indices

Return type:

list

torch_molecule.utils.graph.features.bond_feature_vector_to_dict(bond_feature)[source]
torch_molecule.utils.graph.features.bond_to_feature_vector(bond)[source]

Converts an RDKit bond object to a feature list of indices.

Parameters:

bond – RDKit bond object

Returns:

Feature list of indices

Return type:

list

torch_molecule.utils.graph.features.get_atom_feature_dims()[source]
torch_molecule.utils.graph.features.get_bond_feature_dims()[source]
torch_molecule.utils.graph.features.getmaccsfingerprint(mol)[source]
torch_molecule.utils.graph.features.getmorganfingerprint(mol)[source]
torch_molecule.utils.graph.features.safe_index(l, e)[source]

Return index of element e in list l. If e is not present, return the last index
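
The described behavior fits in a few lines. This is a minimal sketch written from the one-line description above, not a copy of the library source, and the vocabulary list is a made-up example in which the final entry acts as a catch-all bucket.

```python
def safe_index(items, element):
    """Return the index of element in items, or the last index if it is absent."""
    try:
        return items.index(element)
    except ValueError:
        return len(items) - 1


# Hypothetical vocabulary whose final entry is a catch-all bucket.
allowed_symbols = ["C", "N", "O", "misc"]
print(safe_index(allowed_symbols, "N"))   # 1
print(safe_index(allowed_symbols, "Si"))  # 3
```

Falling back to the last index means unseen values always map to a valid feature index, provided the list reserves its final slot for that purpose.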

torch_molecule.utils.graph.graph_from_smiles.add_fingerprint_feature(mol, feature_type, get_fingerprint_fn)[source]
torch_molecule.utils.graph.graph_from_smiles.get_augmented_property(mol, properties)[source]
torch_molecule.utils.graph.graph_from_smiles.graph_from_smiles(smiles_or_mol, properties, augmented_features=None, augmented_properties=None)[source]

Converts SMILES string or RDKit molecule to graph Data object

Parameters:
  • smiles_or_mol (Union[str, rdkit.Chem.rdchem.Mol]) – SMILES string or RDKit molecule object

  • properties (Any) – Properties to include in the graph

  • augmented_features (list) – List of augmented features to include

  • augmented_properties (list, optional) – List of augmented properties to include

Returns:

Graph object dictionary

Return type:

dict

torch_molecule.utils.graph.graph_to_smiles.build_molecule_with_partial_charges(atom_types, edge_types, atom_decoder, verbose=False)[source]
torch_molecule.utils.graph.graph_to_smiles.check_valency(mol)[source]
torch_molecule.utils.graph.graph_to_smiles.check_valid(smiles)[source]
torch_molecule.utils.graph.graph_to_smiles.connect_fragments(mol)[source]
torch_molecule.utils.graph.graph_to_smiles.correct_mol(mol, connection=False)[source]
torch_molecule.utils.graph.graph_to_smiles.get_mol(smiles_or_mol)[source]

Loads a SMILES string or molecule into an RDKit Mol object.

torch_molecule.utils.graph.graph_to_smiles.graph_to_smiles(molecule_list: List[Tuple], atom_decoder: list) → List[str | None][source]
torch_molecule.utils.graph.graph_to_smiles.mol2smiles(mol)[source]
torch_molecule.utils.graph.graph_to_smiles.select_atom_with_available_valency(frag)[source]
torch_molecule.utils.graph.graph_to_smiles.select_atoms_with_available_valency(frag)[source]
torch_molecule.utils.graph.graph_to_smiles.try_to_connect_fragments(combined_mol, frag, atom1, atom2)[source]
torch_molecule.utils.generic.metrics.accuracy_score(y_true, logits, avergae=None, thresholds=None, task_weights=None, task_types=None)[source]

Calculate accuracy for multiple tasks from prediction logits.

Parameters:
  • y_true (numpy.ndarray) – Ground truth labels with shape (n_samples, n_tasks)

  • logits (numpy.ndarray) – Prediction logits with shape (n_samples, n_tasks)

  • task_types (list or None, optional) – List of task types (‘binary’ or ‘multiclass’) for each task. If None, assumes all tasks are binary

  • thresholds (numpy.ndarray or None, optional) – Classification thresholds for binary tasks with shape (n_tasks,). If None, uses 0.5 for all binary tasks

  • task_weights (numpy.ndarray or None, optional) – Weights for each task with shape (n_tasks,). If None, all tasks are weighted equally

Returns:

A dictionary containing:

  • ‘task_accuracies’: Accuracy for each individual task

  • ‘weighted_accuracy’: Overall weighted accuracy across all tasks

  • ‘macro_accuracy’: Simple average of all task accuracies

  • ‘predictions’: Binary predictions after applying activation and thresholds

Return type:

dict

Raises:

ValueError – If input shapes don’t match or dimensions are incorrect

torch_molecule.utils.generic.metrics.mean_absolute_error(y_true: ndarray | list, y_pred: ndarray | list, average: bool = True, sample_weight: ndarray | None = None) → float | ndarray[source]

Calculate Mean Absolute Error for multi-task regression, handling NaN values.

Parameters:
  • y_true (Union[np.ndarray, list]) – Ground truth values. Shape should be (n_samples, n_tasks)

  • y_pred (Union[np.ndarray, list]) – Predicted values. Shape should be (n_samples, n_tasks)

  • average (bool, default=True) – If True, return the average MAE across all valid tasks. If False, return individual MAE for each task (NaN for invalid tasks).

  • sample_weight (Optional[np.ndarray], default=None) – Sample weights. Shape should be (n_samples,)

Returns:

If average=True, returns mean MAE across all valid tasks. If average=False, returns array of MAE scores with NaN for invalid tasks.

Return type:

Union[float, np.ndarray]
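
The per-task masking described above can be illustrated in plain Python. This sketch assumes that NaN entries in y_true mark missing labels and that a task with no labeled samples counts as invalid. It ignores sample_weight, and the name masked_mae is hypothetical. The library itself works on numpy arrays, so this shows the idea rather than its implementation.

```python
import math
from typing import List, Sequence, Union


def masked_mae(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
    average: bool = True,
) -> Union[float, List[float]]:
    """Sketch: per-task MAE over rows whose target is not NaN (unweighted)."""
    n_tasks = len(y_true[0])
    per_task = []
    for t in range(n_tasks):
        errors = [
            abs(true_row[t] - pred_row[t])
            for true_row, pred_row in zip(y_true, y_pred)
            if not math.isnan(true_row[t])
        ]
        # A task with no labeled samples is invalid and reported as NaN.
        per_task.append(sum(errors) / len(errors) if errors else math.nan)
    if not average:
        return per_task
    valid = [m for m in per_task if not math.isnan(m)]
    return sum(valid) / len(valid) if valid else math.nan


nan = math.nan
y_true = [[1.0, nan, nan], [3.0, 2.0, nan]]
y_pred = [[1.5, 0.0, 0.0], [2.0, 4.0, 0.0]]
print(masked_mae(y_true, y_pred, average=False))  # [0.75, 2.0, nan]
print(masked_mae(y_true, y_pred))                 # 1.375
```

The third task has no labels at all, so it is reported as NaN when average=False and excluded from the mean when average=True.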

torch_molecule.utils.generic.metrics.mean_squared_error(y_true: ndarray | list, y_pred: ndarray | list, average: bool = True, sample_weight: ndarray | None = None, squared: bool = True) → float | ndarray[source]

Calculate Mean Squared Error for multi-task regression, handling NaN values.

Parameters:
  • y_true (Union[np.ndarray, list]) – Ground truth values. Shape should be (n_samples, n_tasks)

  • y_pred (Union[np.ndarray, list]) – Predicted values. Shape should be (n_samples, n_tasks)

  • average (bool, default=True) – If True, return the average MSE across all valid tasks. If False, return individual MSE for each task (NaN for invalid tasks).

  • sample_weight (Optional[np.ndarray], default=None) – Sample weights. Shape should be (n_samples,)

  • squared (bool, default=True) – If True, returns MSE value. If False, returns RMSE value.

Returns:

If average=True, returns mean MSE/RMSE across all valid tasks. If average=False, returns array of MSE/RMSE scores with NaN for invalid tasks.

Return type:

Union[float, np.ndarray]

torch_molecule.utils.generic.metrics.r2_score(y_true: ndarray | list, y_pred: ndarray | list, average: bool = True, sample_weight: ndarray | None = None) → float | ndarray[source]

Calculate R² Score for multi-task regression, handling NaN values.

Parameters:
  • y_true (Union[np.ndarray, list]) – Ground truth values. Shape should be (n_samples, n_tasks)

  • y_pred (Union[np.ndarray, list]) – Predicted values. Shape should be (n_samples, n_tasks)

  • average (bool, default=True) – If True, return the average R² across all valid tasks. If False, return individual R² for each task (NaN for invalid tasks).

  • sample_weight (Optional[np.ndarray], default=None) – Sample weights. Shape should be (n_samples,)

Returns:

If average=True, returns mean R² across all valid tasks. If average=False, returns array of R² scores with NaN for invalid tasks.

Return type:

Union[float, np.ndarray]

torch_molecule.utils.generic.metrics.roc_auc_score(y_true: ndarray | list, y_pred: ndarray | list, average: bool = True, sample_weight: ndarray | None = None) → float | ndarray[source]

Calculate ROC AUC scores for multi-task binary classification, handling NaN values.

For each task dimension, computes AUC score using only the non-NaN samples. Tasks with insufficient valid samples or unique labels are masked in the output.

Parameters:
  • y_true (Union[np.ndarray, list]) – True binary labels. Shape should be (n_samples, n_tasks)

  • y_pred (Union[np.ndarray, list]) – Predicted probabilities. Shape should be (n_samples, n_tasks)

  • average (bool, default=True) – If True, return the average ROC AUC score across all valid tasks. If False, return individual scores for each task (NaN for invalid tasks).

  • sample_weight (Optional[np.ndarray], default=None) – Sample weights for each instance. Shape should be (n_samples,)

Returns:

If average=True, returns mean ROC AUC score across all valid tasks. If average=False, returns array of ROC AUC scores with NaN for invalid tasks.

Return type:

Union[float, np.ndarray]

Raises:
  • ValueError – If input shapes don’t match or no valid tasks are found

  • TypeError – If inputs are not of correct type
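
To make the masking rule concrete, the sketch below computes an unweighted AUC per task in plain Python, using the pairwise-ranking definition of AUC and the same toy arrays as this function's documented example. The helper names are hypothetical, sample_weight is ignored, and treating a task with only one class present as invalid (NaN) is an assumption about what "insufficient unique labels" means.

```python
import math
from typing import List, Sequence


def binary_auc(labels: Sequence[float], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        return math.nan  # AUC is undefined when only one class is present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def masked_auc_per_task(y_true, y_pred) -> List[float]:
    """Sketch: drop NaN labels column by column, then score each task."""
    scores = []
    for t in range(len(y_true[0])):
        pairs = [
            (row[t], pred[t])
            for row, pred in zip(y_true, y_pred)
            if not math.isnan(row[t])
        ]
        scores.append(binary_auc([lab for lab, _ in pairs], [s for _, s in pairs]))
    return scores


nan = math.nan
y_true = [[0, 1, nan], [1, 0, 1], [1, nan, 0], [0, 0, 1]]
y_pred = [[0.1, 0.8, 0.7], [0.9, 0.2, 0.8], [0.8, 0.7, 0.3], [0.2, 0.1, 0.9]]
print(masked_auc_per_task(y_true, y_pred))  # [1.0, 1.0, 1.0]
```

Every positive outranks every negative in each column once the NaN rows are dropped, so all three tasks score 1.0 on this data.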

Examples

>>> y_true = np.array([[0, 1, np.nan], [1, 0, 1], [1, np.nan, 0], [0, 0, 1]])
>>> y_pred = np.array([[0.1, 0.8, 0.7], [0.9, 0.2, 0.8], [0.8, 0.7, 0.3], [0.2, 0.1, 0.9]])
>>> score = roc_auc_score(y_true, y_pred)
>>> print(f"Average ROC AUC across valid tasks: {score:.3f}")
torch_molecule.utils.generic.metrics.root_mean_squared_error(y_true, y_pred, average, sample_weight)[source]
torch_molecule.utils.generic.metrics.sigmoid(x)[source]

Numerically stable sigmoid function.
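
One common way to obtain numerical stability is to pick the algebraic form of the sigmoid according to the sign of the input, so that the exponential is never evaluated on a large positive number. The scalar sketch below illustrates that technique. It is not taken from the library, which presumably operates on arrays, and the name stable_sigmoid is hypothetical.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Evaluate 1 / (1 + exp(-x)) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is at most 1, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


print(stable_sigmoid(0.0))      # 0.5
print(stable_sigmoid(-1000.0))  # 0.0
print(stable_sigmoid(1000.0))   # 1.0
```

The naive expression 1 / (1 + math.exp(-x)) raises OverflowError at x = -1000, because math.exp(1000) exceeds the float range.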

torch_molecule.utils.generic.weights.init_weights(net, init_type='xavier', init_gain=0.02, verbose=False)[source]

Initialize network weights.

Parameters:
  • net (network) – Network to initialize

  • init_type (str) – Name of the initialization method: normal | xavier | kaiming | orthogonal

  • init_gain (float)