API Reference
illico package
ovo subpackage
ovr subpackage
utils
- class illico.utils.groups.GroupContainer(encoded_groups, counts, indices, indptr, encoded_ref_group)
- counts
Alias for field number 1
- encoded_groups
Alias for field number 0
- encoded_ref_group
Alias for field number 4
- indices
Alias for field number 2
- indptr
Alias for field number 3
- illico.utils.groups.encode_and_count_groups(groups, ref_group)
Build the GroupContainer holding all group-related information.
GroupContainer holds:
- original group information
- reference group (control)
- encoded groups
- unique raw groups
- counts (of cells, per group)
- indices, indptr in a RLE format
- encoded reference group (control)
- Parameters:
groups (np.ndarray) – 1-d array holding group labels, one per cell
ref_group (Any) – Label of the reference (control) group
- Returns:
unique_groups (np.ndarray): Array of unique group labels. GroupContainer: GroupContainer holding all group-related information.
- Return type:
tuple[np.ndarray, GroupContainer]
Author: Rémy Dubois
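The fields listed above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper name sketch_encode_groups is hypothetical, and it assumes that indices and indptr follow a CSR-like layout in which indices[indptr[k]:indptr[k + 1]] lists the cells belonging to encoded group k.

```python
import numpy as np


def sketch_encode_groups(groups, ref_group):
    """Hypothetical helper: encode labels and build per-group cell indices."""
    # Unique labels (sorted), integer code per cell, and number of cells per group.
    unique_groups, encoded_groups, counts = np.unique(
        groups, return_inverse=True, return_counts=True
    )
    # A stable argsort groups the cell positions by encoded label.
    indices = np.argsort(encoded_groups, kind="stable")
    # Assumed layout: indptr[k]:indptr[k + 1] delimits group k inside `indices`.
    indptr = np.concatenate(([0], np.cumsum(counts)))
    # Integer code of the reference (control) label.
    encoded_ref_group = int(np.flatnonzero(unique_groups == ref_group)[0])
    return unique_groups, encoded_groups, counts, indices, indptr, encoded_ref_group


groups = np.array(["ctrl", "g2", "ctrl", "g1", "g2", "ctrl"])
unique_groups, encoded_groups, counts, indices, indptr, encoded_ref = (
    sketch_encode_groups(groups, "ctrl")
)
# Cells of the group encoded as 2 ("g2" in this toy input).
cells_of_g2 = indices[indptr[2]:indptr[3]]
```

With this layout, the cells of any group can be retrieved with a single slice, without rescanning the label array.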
- illico.utils.math.diff(x)
Equivalent of np.diff.
For some reason, np.diff failed to compile properly, hence this reimplementation.
- Parameters:
x (np.ndarray) – Input 1-d array
- Returns:
Resulting diff array, of size x.size - 1.
- Return type:
np.ndarray
Author: Rémy Dubois
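For illustration, a loop-based equivalent of np.diff for 1-d input might look like the sketch below. The name diff_1d is hypothetical and the code is not taken from the library; an explicit loop of this kind is the usual shape for a routine meant to be compiled.

```python
import numpy as np


def diff_1d(x):
    """Hypothetical loop-based equivalent of np.diff for a 1-d array."""
    n_out = max(x.size - 1, 0)
    out = np.empty(n_out, dtype=x.dtype)
    for i in range(n_out):
        # Difference between each element and its predecessor.
        out[i] = x[i + 1] - x[i]
    return out


x = np.array([1, 4, 9, 16])
d = diff_1d(x)
```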
- illico.utils.math.compute_pval(n_ref, n_tgt, n, tie_sum, U, mu, contin_corr=0.0, alternative='two-sided')
Compute p-value.
This small piece of code was isolated here because it was duplicated across all six routines.
- Parameters:
n_ref (int) – Number of reference (control) values (cells)
n_tgt (int) – Number of perturbed (targeted) values (cells)
n (int) – Total number of values (cells)
tie_sum (float) – Tie sum
U (float) – U-statistic
mu (float) – Mean
contin_corr (float, optional) – Continuity correction factor. Defaults to 0.0.
alternative (Literal["two-sided", "less", "greater"]) – Type of alternative hypothesis.
- Returns:
P-value and z-score
- Return type:
tuple[float, float]
Author: Rémy Dubois
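The sketch below shows how a p-value and z-score can be derived from these inputs using the textbook normal approximation of the Mann-Whitney U statistic. It is an illustration under assumptions, not the library's code: the function name is hypothetical, tie_sum is assumed to be the sum of t**3 - t over tie groups, the continuity correction is assumed to shrink the deviation from mu, and n is assumed to be at least 2.

```python
import math


def sketch_pvalue(n_ref, n_tgt, n, tie_sum, U, mu, contin_corr=0.0,
                  alternative="two-sided"):
    """Hypothetical normal-approximation p-value for a U statistic."""
    # Tie-corrected standard deviation of U under the null hypothesis
    # (assumes tie_sum = sum(t**3 - t) over groups of tied values).
    sigma = math.sqrt(n_ref * n_tgt / 12.0 * ((n + 1) - tie_sum / (n * (n - 1))))
    if sigma == 0.0:
        # All values are tied: no evidence against the null hypothesis.
        return 1.0, 0.0
    delta = U - mu
    if alternative == "two-sided":
        z = (abs(delta) - contin_corr) / sigma
        # erfc(z / sqrt(2)) equals twice the upper normal tail at z.
        pval = min(1.0, math.erfc(z / math.sqrt(2.0)))
    elif alternative == "greater":
        z = (delta - contin_corr) / sigma
        pval = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail
    elif alternative == "less":
        z = (delta + contin_corr) / sigma
        pval = 0.5 * math.erfc(-z / math.sqrt(2.0))  # lower tail
    else:
        raise ValueError(f"Unknown alternative: {alternative!r}")
    return pval, z


# Ten reference and ten target cells, no ties, so mu = 10 * 10 / 2 = 50.
p_null, z_null = sketch_pvalue(10, 10, 20, 0.0, U=50.0, mu=50.0)
p_shift, z_shift = sketch_pvalue(10, 10, 20, 0.0, U=80.0, mu=50.0)
```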
- illico.utils.math.sampled_max(data, sample_size=200000)
- Parameters:
data (numpy.ndarray)
sample_size (int)
- Return type:
float
- illico.utils.math.fold_change_from_summed_expr(group_agg_counts, grpc, exp_post_agg)
Compute fold change from summed expression values, per group.
- Parameters:
group_agg_counts (np.ndarray) – Sum of expression values of shape (n_groups, n_genes)
grpc (GroupContainer) – GroupContainer holding group information
exp_post_agg (bool) – Whether to exponentiate the fold change after aggregation. This is relevant if the input data is log1p. See documentation for details.
- Returns:
Fold change values of shape (n_groups, n_genes)
- Return type:
np.ndarray
Author: Rémy Dubois
- illico.utils.math.dense_fold_change(X, grpc, is_log1p, exp_post_agg)
Compute fold change from a dense array of expression counts.
- Parameters:
X (np.ndarray) – Expression counts
grpc (GroupContainer) – GroupContainer holding group information
is_log1p (bool) – User-provided flag indicating whether the data is log1p-transformed.
exp_post_agg (bool) – Whether to exponentiate the fold change after aggregation. This is relevant if the input data is log1p. See documentation for details.
- Returns:
Fold change values of shape (n_groups, n_genes)
- Return type:
np.ndarray
Author: Rémy Dubois
- illico.utils.math.compute_sparsity(X)
Compute sparsity of the data matrix.
- Parameters:
X (np.ndarray | sc_sparse.spmatrix) – Data matrix
- Returns:
Sparsity (fraction of zero elements)
- Return type:
float
Author: Rémy Dubois
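As an illustration of the definition (fraction of zero elements), a minimal sketch follows. The helper name is hypothetical, and the sparse branch relies only on the nnz attribute that SciPy sparse matrices expose; note that nnz counts stored entries, so explicitly stored zeros would be treated as non-zero here.

```python
import numpy as np


def sketch_sparsity(X):
    """Hypothetical helper: fraction of zero elements in a 2-d matrix."""
    n_total = X.shape[0] * X.shape[1]
    if hasattr(X, "nnz"):
        # Sparse input: number of stored entries.
        n_nonzero = X.nnz
    else:
        # Dense input: count the non-zero elements directly.
        n_nonzero = np.count_nonzero(X)
    return 1.0 - n_nonzero / n_total


X = np.array([[0, 1, 0],
              [0, 0, 2]])
s = sketch_sparsity(X)  # 4 zeros out of 6 elements
```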
- illico.utils.math.chunk_and_fortranize(X, chunk_lb, chunk_ub, indices=None)
Vertically chunk the input array and convert it to Fortran-contiguous.
The conversion is done because later operations access the columns of this array, so F order is advantageous. Also, this function performs one memory allocation instead of two, which is what happens when calling np.asfortranarray on top of fancy indexing.
NB: If indices is None, then all rows are taken as is.
- Parameters:
X (np.ndarray) – Input dense array
chunk_lb (int) – Lower bound of the vertical slicing
chunk_ub (int) – Upper bound of the vertical slicing
indices (np.ndarray) – Indices to reorder rows. There can be fewer indices than rows in X.
- Returns:
Chunked Fortran-contiguous array with reordered rows.
- Return type:
np.ndarray
Author: Rémy Dubois
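The single-allocation idea can be sketched as follows. This is an illustration only: the helper name is hypothetical, and it assumes that chunk_lb and chunk_ub bound a column slice while indices selects and reorders rows. The output buffer is allocated once, directly in F order, then filled column by column; a compiled loop could also avoid the small per-column temporaries created here.

```python
import numpy as np


def sketch_chunk_to_fortran(X, chunk_lb, chunk_ub, indices=None):
    """Hypothetical helper: row-reordered column chunk in Fortran order."""
    if indices is None:
        # Take all rows, in their original order.
        indices = np.arange(X.shape[0])
    # One allocation for the result, already Fortran-contiguous.
    out = np.empty((indices.size, chunk_ub - chunk_lb), dtype=X.dtype, order="F")
    for j in range(chunk_lb, chunk_ub):
        # Each destination column is contiguous in memory thanks to F order.
        out[:, j - chunk_lb] = X[indices, j]
    return out


X = np.arange(20).reshape(4, 5)
rows = np.array([3, 0])
chunk = sketch_chunk_to_fortran(X, 1, 4, rows)
# Two-step reference: fancy indexing, then conversion to F order.
reference = np.asfortranarray(X[rows][:, 1:4])
```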
- illico.utils.math.compute_batch_bounds(n_genes, batch_size, n_threads)
Compute ideal batch bounds for processing genes in batches. This function ensures that no worker starves. Starvation could happen with 8 workers and 9 batches to allocate: because each batch takes the same time to process, all but one worker would sit idle while the remaining worker processes the last batch.
- Parameters:
n_genes (int) – Total number of genes
batch_size (Literal["auto"] | int) – Batch size, or “auto” to compute ideal batch size.
n_threads (int) – Number of threads to use.
- Returns:
List of (lower_bound, upper_bound) for each batch. Upper bounds are exclusive, following slicing conventions.
- Return type:
List[Tuple[int, int]]
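One way to avoid the starvation scenario described above is to round the number of batches up to a multiple of the number of threads and then split the genes as evenly as possible. The sketch below illustrates that strategy; it is an assumption about a reasonable approach, not the library's actual algorithm, and the helper name and the handling of "auto" (one batch per thread) are hypothetical.

```python
import math


def sketch_batch_bounds(n_genes, batch_size, n_threads):
    """Hypothetical helper: balanced (lower, upper) bounds, upper exclusive."""
    if batch_size == "auto":
        # Assumption: one batch per thread.
        n_batches = n_threads
    else:
        n_batches = math.ceil(n_genes / batch_size)
        # Round up to a multiple of n_threads so every round keeps all workers busy.
        n_batches = math.ceil(n_batches / n_threads) * n_threads
    # Never create more batches than genes, and always at least one batch.
    n_batches = max(1, min(n_batches, n_genes))
    base, extra = divmod(n_genes, n_batches)
    bounds = []
    lower = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional gene.
        upper = lower + base + (1 if i < extra else 0)
        bounds.append((lower, upper))
        lower = upper
    return bounds


# 90 genes with batch_size=10 would give 9 batches; with 8 threads this becomes 16.
bounds = sketch_batch_bounds(90, 10, 8)
```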
- illico.utils.ranking.rank_sum_and_ties_from_sorted(A, B)
Compute rank sums and tie sums from two 1-d sorted arrays.
This routine is similar to the LeetCode “merge two sorted arrays” problem, except that it never returns the merged sorted array; instead, it accumulates rank sums of the second array and tie sums for the combined arrays.
This routine sits at the core of the one-versus-one (or one-versus-control) asymptotic Wilcoxon rank-sum test, as it allows the controls to be sorted only once.
- Parameters:
A (np.ndarray) – The first sorted array (controls)
B (np.ndarray) – The second sorted array (perturbed)
- Returns:
Rank sums from the second array, and tie sums for the combined arrays.
- Return type:
tuple[np.ndarray]
Author: Rémy Dubois
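The merge-style traversal described above can be sketched in pure Python as follows. This is an illustrative version under assumptions, not the library's routine: the name is hypothetical, tied values receive their average rank, the tie sum is assumed to be the sum of t**3 - t over tie groups, and the inputs are assumed to be sorted and free of NaN.

```python
import numpy as np


def sketch_rank_sum_and_ties(a, b):
    """Hypothetical helper: rank sum of `b` and tie sum over the merged arrays."""
    i = j = 0
    n_a, n_b = len(a), len(b)
    rank_sum_b = 0.0
    tie_sum = 0.0
    consumed = 0  # number of values already ranked
    while i < n_a or j < n_b:
        # Smallest value not yet consumed across both arrays.
        if j >= n_b or (i < n_a and a[i] <= b[j]):
            value = a[i]
        else:
            value = b[j]
        # Count how many times this value occurs in each array.
        count_a = 0
        while i < n_a and a[i] == value:
            i += 1
            count_a += 1
        count_b = 0
        while j < n_b and b[j] == value:
            j += 1
            count_b += 1
        t = count_a + count_b
        # Tied values share the average of the ranks they span.
        average_rank = consumed + (t + 1) / 2.0
        rank_sum_b += count_b * average_rank
        tie_sum += t ** 3 - t
        consumed += t
    return rank_sum_b, tie_sum


controls = np.array([1.0, 2.0, 2.0, 5.0])
perturbed = np.array([2.0, 3.0, 6.0])
rank_sum, ties = sketch_rank_sum_and_ties(controls, perturbed)
```

In this toy input, the merged values are 1, 2, 2, 2, 3, 5, 6; the three tied 2s share rank 3, so the perturbed values receive ranks 3, 5 and 7.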
- illico.utils.ranking.sort_along_axis(X, axis=0)
Sort a dense array along a given axis.
- Parameters:
X (np.ndarray) – Input dense array.
axis (int, optional) – Axis along which to sort. Defaults to 0.
- Returns:
Sorted array.
- Return type:
np.ndarray
Author: Rémy Dubois
- illico.utils.ranking.check_if_sorted(arr)
Check if an array is sorted, in O(n) time.
- Parameters:
arr (np.ndarray) – 1-d array to check
- Returns:
True if the array is sorted, False otherwise.
- Return type:
bool
Author: Rémy Dubois
- illico.utils.ranking.check_indices_sorted_per_parcel(indices, indptr)
Check if indices of a sparse array are sorted.
This is essential if the input data is CSR: chunking makes use of binary search on indices, which requires sorted indices.
- Parameters:
indices (np.ndarray) – Indices
indptr (np.ndarray) – Indptr
- Returns:
True if all index subarrays are sorted, False otherwise.
- Return type:
bool
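A per-parcel sortedness check can be sketched as below. The helper name is hypothetical and the code is an illustration rather than the library's routine; it assumes the usual compressed sparse layout, in which indices[indptr[k]:indptr[k + 1]] holds the indices of parcel k.

```python
import numpy as np


def sketch_indices_sorted_per_parcel(indices, indptr):
    """Hypothetical helper: True if every parcel of `indices` is non-decreasing."""
    for k in range(len(indptr) - 1):
        parcel = indices[indptr[k]:indptr[k + 1]]
        # A parcel is unsorted as soon as one element is smaller than its predecessor.
        if parcel.size > 1 and np.any(parcel[1:] < parcel[:-1]):
            return False
    return True


indptr = np.array([0, 3, 5])
sorted_ok = sketch_indices_sorted_per_parcel(np.array([0, 2, 5, 1, 3]), indptr)
sorted_bad = sketch_indices_sorted_per_parcel(np.array([2, 0, 5, 1, 3]), indptr)
```

Note that indices only need to be sorted within each parcel: in the first call, the drop from 5 to 1 crosses a parcel boundary and is therefore accepted.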