scmiracle.utils

scmiracle.utils.BallTreeSubsample(X, target_size, ls=2)

Subsample the rows of X using a ball tree.

Parameters:
  • X (numpy.array) – A matrix of shape (samples × features).

  • target_size (int) – The target number of samples.

  • ls (int) – The leaf size of the ball tree.

Returns:

IDs of the sampled rows.

scmiracle.utils.calculate_loss_scale(replay_cell_counts, current_cell_counts)

Calculates scaling factors to balance the loss contribution between historical (replay) and current data during continual learning.

This ensures that the smaller replay buffer is not overshadowed by the larger new data batch during training.

Parameters:
  • replay_cell_counts (list) – A list of cell counts from previous batches.

  • current_cell_counts (list) – A list of cell counts from the new batch.

Returns:

A list of weights for all batches.

Return type:

list

scmiracle.utils.convert_tensor_to_list(data: Tensor | List[List[Any]]) -> List[List[Any]]

Convert a 2D tensor or list into a 2D list.

Parameters:

data – Union[torch.Tensor, List[List[Any]]] Input data to be converted.

Returns:

Converted 2D list.

Return type:

List[List[Any]]

scmiracle.utils.convert_tensors_to_cuda(x: Dict[str, Any], device: device) -> Dict[str, Any]

Recursively move all tensors in a dictionary to the specified device (typically CUDA).

Parameters:
  • x – Dict[str, Any] Dictionary containing tensors or nested dictionaries.

  • device – torch.device Device to move the tensors to (e.g., CUDA or CPU).

Returns:

A new dictionary with all tensors moved to the specified device.

Return type:

Dict[str, Any]

scmiracle.utils.detach_tensors(x: Dict[str, Any]) -> Dict[str, Any]

Recursively detach all tensors in a dictionary.

Parameters:

x – Dict[str, Any] Dictionary containing tensors or nested dictionaries.

Returns:

A new dictionary with all tensors detached.

Return type:

Dict[str, Any]

scmiracle.utils.exp(x: Tensor, eps: float = 1e-12) -> Tensor

Compute a numerically stable exponential transformation.

Handles negative and positive values to avoid numerical instability.

Parameters:
  • x – torch.Tensor Input tensor.

  • eps – float, optional A small epsilon value to avoid division by zero, by default 1e-12.

Returns:

Transformed tensor with the exponential applied.

Return type:

torch.Tensor

scmiracle.utils.extract_params(config: dict, prefix: str) -> dict

Extract parameters from a configuration dictionary with a specific prefix.

Removes the specified prefix from the keys in the resulting dictionary.

Parameters:
  • config – dict Configuration dictionary containing various parameters.

  • prefix – str Prefix to filter and remove from the keys.

Returns:

A new dictionary containing the filtered parameters with the prefix removed.

Return type:

dict
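The filtering described above can be sketched in a few lines. This is an illustrative reimplementation, not the library's source: the name `extract_params_sketch` is hypothetical, and it assumes "with a specific prefix" means the key starts with that prefix.

```python
from typing import Any, Dict


def extract_params_sketch(config: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Keep entries whose key starts with `prefix`, and strip the prefix."""
    return {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)  # assumption: prefix match, not substring match
    }


# Hypothetical configuration keys, chosen only for illustration.
config = {"enc_hidden": 128, "enc_layers": 2, "dec_hidden": 64}
enc_params = extract_params_sketch(config, "enc_")
print(enc_params)  # {'hidden': 128, 'layers': 2}
```

A pattern like this is handy for forwarding one flat configuration to several components, each of which receives only its own keyword arguments.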

scmiracle.utils.extract_values(x: List[Any] | Tuple[Any] | Dict[Any, Any] | Any) -> List[Any]

Recursively extract all values from a tuple, list, or dictionary.

Parameters:

x – list, tuple, dict, or any type The input structure containing nested values.

Returns:

A flattened list of all values extracted from the input.

Return type:

List[Any]
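A recursive flattening of this kind might look like the sketch below. The function name `extract_values_sketch` is hypothetical, and the traversal order (dictionary insertion order, depth first) is an assumption rather than documented behavior.

```python
from typing import Any, List


def extract_values_sketch(x: Any) -> List[Any]:
    """Flatten nested lists, tuples, and dict values into one list."""
    if isinstance(x, dict):
        children = x.values()  # keys are discarded, only values are kept
    elif isinstance(x, (list, tuple)):
        children = x
    else:
        return [x]  # a leaf value: wrap it so callers can always extend()

    flat: List[Any] = []
    for child in children:
        flat.extend(extract_values_sketch(child))
    return flat


nested = {"a": [1, (2, 3)], "b": {"c": 4}}
flat = extract_values_sketch(nested)
print(flat)  # [1, 2, 3, 4]
```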

scmiracle.utils.filter_keys(d: Dict[str, Any], substring: str) -> Dict[str, Any]

Filter a dictionary to include only keys that contain a specific substring.

Parameters:
  • d – Dict[str, Any] The input dictionary to filter.

  • substring – str The substring to look for in the keys.

Returns:

A new dictionary containing only the keys from the original dictionary that include the specified substring.

Return type:

Dict[str, Any]
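As a rough illustration of the behavior described, the filter reduces to a dictionary comprehension. `filter_keys_sketch` and the example keys are hypothetical and not taken from the library.

```python
from typing import Any, Dict


def filter_keys_sketch(d: Dict[str, Any], substring: str) -> Dict[str, Any]:
    """Return a new dict with only the entries whose key contains `substring`."""
    return {key: value for key, value in d.items() if substring in key}


# Hypothetical keys, chosen only for illustration.
losses = {"loss_recon_rna": 0.5, "loss_kld": 0.1, "loss_recon_adt": 0.3}
recon = filter_keys_sketch(losses, "recon")
print(recon)  # {'loss_recon_rna': 0.5, 'loss_recon_adt': 0.3}
```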

scmiracle.utils.generate_all_combinations(mods: List[str]) -> List[Tuple[Tuple[str, ...], List[str]]]

Generate all possible input-output combinations for a given list of modalities.

For N modalities, generate all combinations of size r (1 <= r < N) as input, and the remaining modalities as output.

Parameters:

mods – List[str] List of modality names.

Returns:

A list of tuples, where each tuple contains:
  • A tuple of input modalities.

  • A list of output modalities.

Return type:

List[Tuple[Tuple[str, …], List[str]]]
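The enumeration can be sketched with `itertools.combinations`. This is a minimal reimplementation under assumptions: the name `generate_all_combinations_sketch` is hypothetical, and the ordering of the result (by input size, then in `itertools` order) may differ from the library's.

```python
from itertools import combinations
from typing import List, Tuple


def generate_all_combinations_sketch(
    mods: List[str],
) -> List[Tuple[Tuple[str, ...], List[str]]]:
    """Pair every proper, non-empty subset of `mods` with its complement."""
    result = []
    for r in range(1, len(mods)):            # 1 <= r < N
        for inputs in combinations(mods, r):
            outputs = [m for m in mods if m not in inputs]
            result.append((inputs, outputs))
    return result


pairs = generate_all_combinations_sketch(["rna", "adt", "atac"])
for inputs, outputs in pairs:
    print(inputs, "->", outputs)
```

For three modalities this yields six pairs: three with a single input modality and three with two.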

scmiracle.utils.get_filenames(directory: str, extension: str) -> List[str]

Get sorted filenames with the given extension in the specified directory.

Parameters:
  • directory – str The directory to search for files.

  • extension – str The file extension to filter by.

Returns:

Sorted list of filenames with the specified extension.

Return type:

List[str]
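A listing like this can be approximated with the standard library. The sketch below is not the library's code: `get_filenames_sketch` is a hypothetical name, and it assumes that bare filenames (not full paths) are returned and that the extension may be given with or without a leading dot.

```python
import os
import tempfile
from typing import List


def get_filenames_sketch(directory: str, extension: str) -> List[str]:
    """Return the sorted names in `directory` that end with the extension."""
    suffix = "." + extension.lstrip(".")  # accept "csv" or ".csv" (assumption)
    return sorted(name for name in os.listdir(directory) if name.endswith(suffix))


# Build a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["0001.csv", "0000.csv", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = get_filenames_sketch(tmp, "csv")

print(found)  # ['0000.csv', '0001.csv']
```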

scmiracle.utils.get_name_fmt(file_num: int) -> str

Generate a format string for filenames based on the total number of files.

Parameters:

file_num – int Total number of files to be named.

Returns:

Format string for filenames, e.g., ‘%03d’ for three-digit naming.

Return type:

str
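To show how such a format string is used, here is a sketch that derives a zero-padded width from the file count. The width rule is an assumption (files indexed from 0, width equal to the number of digits of the largest index); the library may compute it differently, and `get_name_fmt_sketch` is a hypothetical name.

```python
def get_name_fmt_sketch(file_num: int) -> str:
    """Build a printf-style format wide enough for indices 0..file_num-1."""
    # Assumption: the largest index is file_num - 1.
    width = len(str(max(file_num - 1, 0)))
    return "%0" + str(width) + "d"


fmt = get_name_fmt_sketch(1000)
names = [fmt % i for i in (0, 7, 999)]
print(fmt)    # %03d
print(names)  # ['000', '007', '999']
```

Zero-padded names sort lexicographically in the same order as their numeric indices, which keeps sorted directory listings in index order.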

scmiracle.utils.get_pred_dirs(pred_dir: str, combs: List[List[str]], joint_latent: bool, mod_latent: bool, impute: bool, batch_correct: bool, translate: bool, input: bool) -> Dict[int, Dict[str, Dict[str, str]]]

Generate directory paths for predictions based on configurations.

Parameters:
  • pred_dir – str Base directory for predictions.

  • combs – list of list of str Combinations of modalities for each batch.

  • joint_latent – bool Include joint latent variables.

  • mod_latent – bool Include modality-specific latent variables.

  • impute – bool Include imputed data.

  • batch_correct – bool Include batch-corrected data.

  • translate – bool Include translated data.

  • input – bool Include input data.

Returns:

Dictionary of directories for each batch and variable.

Return type:

Dict[int, Dict[str, Dict[str, str]]]

scmiracle.utils.get_s_joint_mods(combs: List[List[str]]) -> Tuple[List[Dict[str, int]], List[str]]

Generate s_joint and mods from a list of modality combinations.

Parameters:

combs – List[List[str]] A list where each element is a list of strings representing combinations of modalities for a specific batch.

Returns:

A tuple containing:
  • s_joint: A list of dictionaries, where each dictionary maps the modalities to their corresponding indices for each batch.

  • mods: A list of all unique modalities across the dataset.

Return type:

Tuple

scmiracle.utils.load_csv(filename: str) -> list

Load a CSV file and return its contents as a list of rows.

Parameters:

filename – str Path to the CSV file.

Returns:

A list of rows, where each row is a list of strings.

Return type:

list
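The "list of rows, each a list of strings" shape is what Python's `csv` module produces. The round trip below is a sketch with a hypothetical `load_csv_sketch` helper. It is not the library's implementation, but it shows that numeric cells come back as strings.

```python
import csv
import os
import tempfile
from typing import List


def load_csv_sketch(filename: str) -> List[List[str]]:
    """Read a CSV file into a list of rows, each row a list of strings."""
    with open(filename, newline="") as handle:
        return [row for row in csv.reader(handle)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cells.csv")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows([["cell", "count"], ["c1", 3]])
    rows = load_csv_sketch(path)

print(rows)  # [['cell', 'count'], ['c1', '3']]
```

Callers that need numbers have to convert the cells themselves, for example with `int(rows[1][1])`.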

scmiracle.utils.load_mtx(filename: str) -> list

Load an MTX file and convert it to a csr_matrix.

Parameters:

filename – str Path to the mtx file.

scmiracle.utils.load_predicted(pred_dir: str, combs: List[List[str]], joint_latent: bool = True, mod_latent: bool = False, impute: bool = False, batch_correct: bool = False, translate: bool = False, input: bool = False, group_by: str = 'modality', mtx: bool = True) -> Dict[int, Dict[str, Any]] | Dict[str, Dict[str, ndarray]]

Load predicted variables from a specified directory.

Parameters:
  • pred_dir – str Path to the prediction directory.

  • combs – list of list of str Combinations of modalities for each batch. Example: [['rna'], ['rna', 'adt']].

  • joint_latent – bool, optional Whether to include joint latent variables, by default True.

  • mod_latent – bool, optional Whether to include modality-specific latent variables, by default False.

  • impute – bool, optional Whether to include imputed data, by default False.

  • batch_correct – bool, optional Whether to include batch-corrected data, by default False.

  • translate – bool, optional Whether to include translated data, by default False.

  • input – bool, optional Whether to include input data, by default False.

  • group_by – str, optional Grouping method for the data, either ‘modality’ or ‘batch’, by default ‘modality’.

Returns:

Loaded predicted data grouped by the specified method.

Return type:

Union[Dict[int, Dict[str, Any]], Dict[str, Dict[str, np.ndarray]]]

scmiracle.utils.log(x: Tensor, eps: float = 1e-12) -> Tensor

Compute a numerically stable logarithm transformation.

Ensures numerical stability by adding a small epsilon.

Parameters:
  • x – torch.Tensor Input tensor.

  • eps – float, optional A small epsilon value to avoid log(0), by default 1e-12.

Returns:

Transformed tensor with the logarithm applied.

Return type:

torch.Tensor

scmiracle.utils.mkdir(directory: str, remove_old: bool = False)

Create a directory, optionally removing the old one.

Parameters:
  • directory – str Path to the directory.

  • remove_old – bool, optional Whether to remove the old directory if it exists, by default False.

scmiracle.utils.mkdirs(directories: str | List[str] | Dict[str, Any], remove_old: bool = False)

Recursively create directories.

Parameters:
  • directories – Union[str, List[str], Dict[str, Any]] Path(s) to directories to create.

  • remove_old – bool Whether to remove old directories if they exist, by default False.
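One way to read "recursively" here is that the function walks a string, list, or dictionary and creates every path it finds. The sketch below takes that reading. `mkdirs_sketch` is a hypothetical name, and treating dictionary keys as labels with the paths held in the values is an assumption.

```python
import os
import shutil
import tempfile
from typing import Any, Dict, List, Union


def mkdirs_sketch(
    directories: Union[str, List[str], Dict[str, Any]],
    remove_old: bool = False,
) -> None:
    """Create every directory path found in a nested str/list/dict structure."""
    if isinstance(directories, str):
        if remove_old and os.path.isdir(directories):
            shutil.rmtree(directories)  # start from an empty directory
        os.makedirs(directories, exist_ok=True)
    elif isinstance(directories, (list, tuple)):
        for item in directories:
            mkdirs_sketch(item, remove_old)
    elif isinstance(directories, dict):
        # Assumption: keys are labels; only the values hold paths.
        for item in directories.values():
            mkdirs_sketch(item, remove_old)


with tempfile.TemporaryDirectory() as tmp:
    layout = {
        "z": {"joint": os.path.join(tmp, "z", "joint")},
        "x": [os.path.join(tmp, "x", "rna"), os.path.join(tmp, "x", "adt")],
    }
    mkdirs_sketch(layout)
    created = sorted(os.listdir(tmp))

print(created)  # ['x', 'z']
```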

scmiracle.utils.ref_sort(x: List[str], ref: List[str]) -> List[str]

Sort the elements of x based on the order defined in ref.

Parameters:
  • x – list of str List of elements to be sorted.

  • ref – list of str Reference list defining the sort order.

Returns:

A sorted list of elements from x that appear in ref, maintaining the order of ref.

Return type:

List[str]
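The ordering rule can be sketched as a filter over `ref`. This is an illustrative version with a hypothetical name, `ref_sort_sketch`. It drops elements of `x` that are absent from `ref`, as the return description states, and it also collapses duplicates in `x`, which is an assumption.

```python
from typing import List


def ref_sort_sketch(x: List[str], ref: List[str]) -> List[str]:
    """Return the elements of `x` that occur in `ref`, in `ref` order."""
    members = set(x)  # set lookup keeps the scan over `ref` linear
    return [item for item in ref if item in members]


ordered = ref_sort_sketch(["adt", "rna", "atac"], ["atac", "rna", "adt", "label"])
print(ordered)  # ['atac', 'rna', 'adt']
```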

scmiracle.utils.reverse_dict(original_dict: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]

Reverse the keys and sub-keys of a nested dictionary.

Parameters:

original_dict – Dict[str, Dict[str, Any]] The original nested dictionary to be reversed.

Returns:

A reconstructed dictionary where the keys and sub-keys are swapped.

Return type:

Dict[str, Dict[str, Any]]
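Swapping keys and sub-keys means that `d[a][b]` becomes `d[b][a]`. A minimal sketch follows, with a hypothetical function name and example data that are not taken from the library.

```python
from typing import Any, Dict


def reverse_dict_sketch(original: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Turn {outer: {inner: value}} into {inner: {outer: value}}."""
    swapped: Dict[str, Dict[str, Any]] = {}
    for outer, inner_dict in original.items():
        for inner, value in inner_dict.items():
            swapped.setdefault(inner, {})[outer] = value
    return swapped


# Hypothetical data grouped by batch, regrouped by modality.
by_batch = {"batch0": {"rna": "r0"}, "batch1": {"rna": "r1", "adt": "a1"}}
by_mod = reverse_dict_sketch(by_batch)
print(by_mod)  # {'rna': {'batch0': 'r0', 'batch1': 'r1'}, 'adt': {'batch1': 'a1'}}
```

Applying the swap twice returns a dictionary equal to the original, provided no inner dictionary is empty.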

scmiracle.utils.reverse_trsf(name: str, data: ndarray, **kwargs) -> ndarray

Apply a reverse transformation to the given data.

Parameters:
  • name – str Name of the transformation to reverse (e.g., ‘log1p’).

  • data – np.ndarray Data to transform.

  • kwargs – dict Additional transformation parameters.

Returns:

Transformed data.

Return type:

np.ndarray

scmiracle.utils.rmdir(directory: str)

Remove a directory if it exists.

Parameters:

directory – str Path to the directory to remove.

scmiracle.utils.safe_append(pred: dict, batch_id: int, key_path: list, value: Any)

Append a value to a nested dictionary structure.

Parameters:
  • pred – dict The nested dictionary structure to append to.

  • batch_id – int The batch ID to use as the key for the nested dictionary.

  • key_path – list of str The path of keys to follow in the nested dictionary.

  • value – Any The value to append to the nested dictionary.
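The documentation does not spell out the exact semantics, so the sketch below is one plausible reading, given under stated assumptions: intermediate dictionaries are created on demand, the last key in `key_path` holds a list, and `value` is appended to that list. The name `safe_append_sketch` is hypothetical.

```python
from typing import Any, Dict, List


def safe_append_sketch(pred: Dict, batch_id: int, key_path: List[str], value: Any) -> None:
    """Append `value` at pred[batch_id][k1]...[kn], creating missing levels."""
    node = pred.setdefault(batch_id, {})
    for key in key_path[:-1]:
        node = node.setdefault(key, {})  # create intermediate dicts as needed
    # Assumption: the leaf is a list that accumulates values.
    node.setdefault(key_path[-1], []).append(value)


pred: Dict = {}
safe_append_sketch(pred, 0, ["z", "joint"], "chunk_a")
safe_append_sketch(pred, 0, ["z", "joint"], "chunk_b")
safe_append_sketch(pred, 1, ["x", "rna"], "chunk_c")
print(pred)
# {0: {'z': {'joint': ['chunk_a', 'chunk_b']}}, 1: {'x': {'rna': ['chunk_c']}}}
```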

scmiracle.utils.save_list_to_csv(data: List[List[Any]], filename: str, delimiter: str = ',')

Save a 2D list to a CSV file.

Parameters:
  • data – List[List[Any]] Input data to be saved.

  • filename – str Path to the CSV file.

  • delimiter – str Delimiter to separate values in the CSV file, by default ‘,’.

scmiracle.utils.save_list_to_mtx(data: Tensor, filename: str)

Save a 2D list or tensor to a Matrix Market (MTX) file.

Parameters:
  • data – torch.Tensor Input data to be saved.

  • filename – str Path to the MTX file.

scmiracle.utils.save_tensor_to_csv(data: Tensor, filename: str, delimiter: str = ',', index: bool = False, header: bool = False)

Save a 2D tensor to a CSV file.

Parameters:
  • data – torch.Tensor Input tensor to be saved.

  • filename – str Path to the CSV file.

  • delimiter – str, optional Delimiter to separate values in the CSV file, by default ‘,’.

scmiracle.utils.save_tensor_to_mtx(data: Tensor, filename: str)

Save a 2D tensor to a Matrix Market (MTX) file.

Parameters:
  • data – torch.Tensor Input tensor to be saved.

  • filename – str Path to the MTX file.