API Reference

illico package

ovo subpackage

ovr subpackage

utils

class illico.utils.groups.GroupContainer(encoded_groups, counts, indices, indptr, encoded_ref_group)
counts

Alias for field number 1

encoded_groups

Alias for field number 0

encoded_ref_group

Alias for field number 4

indices

Alias for field number 2

indptr

Alias for field number 3

illico.utils.groups.encode_and_count_groups(groups, ref_group)[source]

Build the GroupContainer holding all group-related information.

GroupContainer holds:
  • original group information

  • reference group (control)

  • encoded groups

  • unique raw groups

  • counts (of cells, per group)

  • indices, indptr in an RLE format

  • encoded reference group (control)

Parameters:
  • groups (np.ndarray) – 1-d array holding group labels, one per cell

  • ref_group (Any) – Flag

Returns:
  • unique_groups (np.ndarray) – Array of unique group labels

  • GroupContainer – GroupContainer holding all group-related information.

Return type:

tuple[np.ndarray, GroupContainer]

Author: Rémy Dubois
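
The following is a minimal NumPy sketch of how the fields listed above could be built; it is not the library's implementation. The names GroupContainerSketch and encode_groups_sketch are hypothetical, and the layout of indices and indptr (cell positions of group g stored in indices[indptr[g]:indptr[g + 1]]) is an assumption inferred from the field names.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in mirroring the five documented fields, in the same order.
GroupContainerSketch = namedtuple(
    "GroupContainerSketch",
    ["encoded_groups", "counts", "indices", "indptr", "encoded_ref_group"],
)


def encode_groups_sketch(groups, ref_group):
    # Map each raw label to an integer code (codes follow sorted label order).
    unique_groups, encoded = np.unique(groups, return_inverse=True)
    # Number of cells per group.
    counts = np.bincount(encoded, minlength=unique_groups.size)
    # Cell positions grouped by code; a stable sort keeps the original order
    # of cells inside each group.
    indices = np.argsort(encoded, kind="stable")
    # Assumed layout: group g owns indices[indptr[g]:indptr[g + 1]].
    indptr = np.concatenate(([0], np.cumsum(counts)))
    # Integer code of the reference (control) label.
    encoded_ref = int(np.searchsorted(unique_groups, ref_group))
    return unique_groups, GroupContainerSketch(
        encoded, counts, indices, indptr, encoded_ref
    )


groups = np.array(["b", "ctrl", "a", "ctrl", "b", "b"])
unique_groups, grpc = encode_groups_sketch(groups, "ctrl")
```

With this layout, the cells of one group are retrieved with a single slice, grpc.indices[grpc.indptr[g]:grpc.indptr[g + 1]], without scanning the label array again.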

illico.utils.math.diff(x)

Equivalent of np.diff.

For some reason, np.diff failed to compile properly.

Parameters:

x (np.ndarray) – Input 1-d array

Returns:

Resulting diff array, of size x.size - 1.

Return type:

np.ndarray

Author: Rémy Dubois
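
As an illustration only, a hand-written equivalent of np.diff for a 1-d array can be a plain loop like the one below. The name diff_sketch is hypothetical, and the sketch assumes the input holds at least one element.

```python
import numpy as np


def diff_sketch(x):
    # Output has one element fewer than the input.
    out = np.empty(x.size - 1, dtype=x.dtype)
    for i in range(x.size - 1):
        # Difference between consecutive elements.
        out[i] = x[i + 1] - x[i]
    return out


x = np.array([1, 4, 9, 16])
result = diff_sketch(x)
```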

illico.utils.math.compute_pval(n_ref, n_tgt, n, tie_sum, U, mu, contin_corr=0.0, alternative='two-sided')

Compute p-value.

This small piece of code was isolated here because it was duplicated in all six routines.

Parameters:
  • n_ref (int) – Number of reference (control) values (cells)

  • n_tgt (int) – Number of perturbed (targeted) values (cells)

  • n (int) – Total number of values (cells)

  • tie_sum (float) – Tie sum

  • U (float) – U-statistic

  • mu (float) – Mean

  • contin_corr (float, optional) – Continuity correction factor. Defaults to 0.0.

  • alternative (Literal["two-sided", "less", "greater"]) – Type of alternative hypothesis.

Returns:

P-value and z-score

Return type:

tuple[float]

Author: Rémy Dubois
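
For orientation, the sketch below shows the standard normal approximation of the Mann-Whitney U test with a tie-corrected variance, using the same parameter names. It is not the library's code: the name pval_sketch is hypothetical, tie_sum is assumed to be the sum of t**3 - t over groups of tied values, and the way the continuity correction is applied is also an assumption.

```python
import math


def pval_sketch(n_ref, n_tgt, n, tie_sum, U, mu, contin_corr=0.0,
                alternative="two-sided"):
    # Tie-corrected standard deviation of U under the null hypothesis.
    # Assumption: tie_sum = sum(t**3 - t) over groups of tied values.
    sigma = math.sqrt(n_ref * n_tgt / 12.0 * ((n + 1) - tie_sum / (n * (n - 1))))
    delta = U - mu
    if alternative == "greater":
        z = (delta - contin_corr) / sigma
        p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail, P(Z >= z)
    elif alternative == "less":
        z = (delta + contin_corr) / sigma
        p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # lower tail, P(Z <= z)
    else:
        # Two-sided: shrink |delta| by the correction, keep the sign for z.
        z_abs = (abs(delta) - contin_corr) / sigma
        z = math.copysign(z_abs, delta)
        p = min(1.0, math.erfc(z_abs / math.sqrt(2.0)))
    return p, z


# No ties, 5 reference and 5 target values, so mu = 5 * 5 / 2 = 12.5.
p_null, z_null = pval_sketch(5, 5, 10, 0.0, 12.5, 12.5)
p_max, z_max = pval_sketch(5, 5, 10, 0.0, 25.0, 12.5, alternative="greater")
```

When U equals its expected value mu, the two-sided p-value is 1 and the z-score is 0; a U at its maximum gives a small one-sided p-value.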

illico.utils.math.sampled_max(data, sample_size=200000)
Parameters:
  • data (numpy.ndarray)

  • sample_size (int)

Return type:

float

illico.utils.math.fold_change_from_summed_expr(group_agg_counts, grpc, exp_post_agg)

Compute fold change from summed expression values, per group.

Parameters:
  • group_agg_counts (np.ndarray) – Sum of expression values of shape (n_groups, n_genes)

  • grpc (GroupContainer) – GroupContainer holding group information

  • exp_post_agg (bool) – Whether to exponentiate the fold change after aggregation. This is relevant if the input data is log1p. See documentation for details.

Returns:

Fold change values of shape (n_groups, n_genes)

Return type:

np.ndarray

Author: Rémy Dubois
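
A rough sketch of the underlying arithmetic, under stated assumptions: sums are turned into per-group means by dividing by the per-group cell counts, and each group's mean is then divided by the mean of the reference group. The name fold_change_sketch and its arguments are hypothetical. The exp_post_agg handling is not modeled, and zero reference means are not guarded against.

```python
import numpy as np


def fold_change_sketch(group_sums, counts, ref_idx):
    # Per-group mean expression, shape (n_groups, n_genes).
    means = group_sums / counts[:, None]
    # Ratio of each group's mean to the reference group's mean.
    return means / means[ref_idx]


group_sums = np.array([[2.0, 4.0], [6.0, 3.0]])  # (n_groups, n_genes)
counts = np.array([2, 3])                        # cells per group
fc = fold_change_sketch(group_sums, counts, ref_idx=0)
```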

illico.utils.math.dense_fold_change(X, grpc, is_log1p, exp_post_agg)

Compute fold change from a dense array of expression counts.

Parameters:
  • X (np.ndarray) – Expression counts

  • grpc (GroupContainer) – GroupContainer holding group information

  • is_log1p (bool) – User-provided flag indicating whether the data is log1p-transformed.

  • exp_post_agg (bool) – Whether to exponentiate the fold change after aggregation. This is relevant if the input data is log1p. See documentation for details.

Returns:

Fold change values of shape (n_groups, n_genes)

Return type:

np.ndarray

Author: Rémy Dubois

illico.utils.math.compute_sparsity(X)[source]

Compute sparsity of the data matrix.

Parameters:

X (np.ndarray | sc_sparse.spmatrix) – Data matrix

Returns:

Sparsity (fraction of zero elements)

Return type:

float

Author: Rémy Dubois
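
A minimal sketch of this computation, assuming a 2-d input. The name sparsity_sketch is hypothetical. For SciPy sparse matrices, the nnz attribute counts stored entries, so explicitly stored zeros would be counted as non-zero in this sketch.

```python
import numpy as np


def sparsity_sketch(X):
    n_cells = X.shape[0] * X.shape[1]
    if hasattr(X, "nnz"):
        # SciPy sparse matrix: number of stored entries.
        n_nonzero = X.nnz
    else:
        # Dense array: count the non-zero elements.
        n_nonzero = np.count_nonzero(X)
    return 1.0 - n_nonzero / n_cells


X = np.array([[0, 1, 0], [0, 0, 2]])
sparsity = sparsity_sketch(X)
```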

illico.utils.math.chunk_and_fortranize(X, chunk_lb, chunk_ub, indices=None)

Vertically chunk the input array and convert it to Fortran-contiguous.

The conversion is done because later operations access the columns of this array, so Fortran (column-major) order is advantageous. This function also performs one memory allocation instead of the two that occur when calling np.asfortranarray on top of fancy indexing.

NB: If indices is None, then all rows are taken as is.

Parameters:
  • X (np.ndarray) – Input dense array

  • chunk_lb (int) – Lower bound of the vertical slicing

  • chunk_ub (int) – Upper bound of the vertical slicing

  • indices (np.ndarray) – Indices to reorder rows. There can be fewer indices than rows in X.

Returns:

Chunked Fortran-contiguous array with reordered rows.

Return type:

np.ndarray

Author: Rémy Dubois
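
The sketch below illustrates the expected result, not the actual implementation. It assumes that the vertical chunk is the column range [chunk_lb, chunk_ub) and that indices selects and reorders rows; the name chunk_and_fortranize_sketch is hypothetical. The output is allocated once in Fortran order and filled column by column, although each column copy still creates a small temporary here.

```python
import numpy as np


def chunk_and_fortranize_sketch(X, chunk_lb, chunk_ub, indices=None):
    if indices is None:
        # Take all rows as they are.
        indices = np.arange(X.shape[0])
    # Single allocation of the output, directly in column-major order.
    out = np.empty((indices.size, chunk_ub - chunk_lb), dtype=X.dtype, order="F")
    for j in range(chunk_lb, chunk_ub):
        # Each output column is contiguous in memory, so this write is sequential.
        out[:, j - chunk_lb] = X[indices, j]
    return out


X = np.arange(20).reshape(5, 4)
idx = np.array([3, 0, 2])
chunk = chunk_and_fortranize_sketch(X, 1, 3, idx)
```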

illico.utils.math.compute_batch_bounds(n_genes, batch_size, n_threads)[source]

Compute ideal batch bounds for processing genes in batches. This function ensures that no worker is starved of work. Starvation could happen, for example, with 8 workers and 9 batches to allocate: because each batch takes the same time to process, all workers but one would sit idle while a single worker processes the last batch.

Parameters:
  • n_genes (int) – Total number of genes

  • batch_size (Literal["auto"] | int) – Batch size, or “auto” to compute ideal batch size.

  • n_threads (int) – Number of threads to use.

Returns:

List of (lower_bound, upper_bound) for each batch. The upper bound is exclusive, following slicing conventions.

Return type:

List[Tuple[int, int]]
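
One possible way to avoid the 8-workers, 9-batches situation is to round the number of batches up to a multiple of n_threads and then spread the genes evenly. The sketch below does that for an integer batch_size; it is an assumption about the strategy, not the library's algorithm, and it does not cover the "auto" mode. The name batch_bounds_sketch is hypothetical.

```python
import math


def batch_bounds_sketch(n_genes, batch_size, n_threads):
    # Number of batches implied by the requested batch size.
    n_batches = math.ceil(n_genes / batch_size)
    # Round up to a multiple of n_threads so every worker gets the same
    # number of batches.
    n_batches = math.ceil(n_batches / n_threads) * n_threads
    # Never create more batches than genes (this edge case can break the
    # multiple-of-n_threads property).
    n_batches = min(n_batches, n_genes)
    # Spread genes as evenly as possible across batches.
    base, extra = divmod(n_genes, n_batches)
    bounds = []
    lb = 0
    for i in range(n_batches):
        ub = lb + base + (1 if i < extra else 0)
        bounds.append((lb, ub))  # upper bound is exclusive
        lb = ub
    return bounds


bounds = batch_bounds_sketch(n_genes=900, batch_size=100, n_threads=8)
```

With 900 genes, a requested batch size of 100 and 8 threads, the 9 implied batches become 16 smaller ones, so each worker processes exactly two.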

illico.utils.ranking.rank_sum_and_ties_from_sorted(A, B)

Compute rank sums and tie sums from two 1-d sorted arrays.

This routine is similar to the LeetCode “merge two sorted arrays” problem, except that it never returns a sorted array; instead, it accumulates the rank sums of the second array and the tie sums of the combined arrays.

This routine sits at the core of the one-versus-one (or one-versus-control) asymptotic Wilcoxon rank-sum test, as it allows the controls to be sorted only once.

Parameters:
  • A (np.ndarray) – The first sorted array (controls)

  • B (np.ndarray) – The second sorted array (perturbed)

Returns:

Rank sums from the second array, and tie sums for the combined arrays.

Return type:

tuple[np.ndarray]

Author: Rémy Dubois
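
The two-pointer idea can be sketched as follows for a pair of 1-d sorted arrays. This is an illustration rather than the library's routine: the name rank_sum_and_ties_sketch is hypothetical, it returns plain Python floats, it assumes tied values receive their average rank, and it assumes the tie sum is the sum of t**3 - t over groups of t tied values. NaN values are not handled.

```python
import numpy as np


def rank_sum_and_ties_sketch(A, B):
    i = j = 0
    n_a, n_b = len(A), len(B)
    consumed = 0      # number of values already ranked
    rank_sum_b = 0.0  # sum of the ranks of the values of B
    tie_sum = 0.0     # assumed definition: sum of t**3 - t over tie groups
    while i < n_a or j < n_b:
        # Next smallest value across both arrays.
        if j >= n_b or (i < n_a and A[i] <= B[j]):
            v = A[i]
        else:
            v = B[j]
        # Count how many times this value occurs in each array.
        a = 0
        while i < n_a and A[i] == v:
            i += 1
            a += 1
        b = 0
        while j < n_b and B[j] == v:
            j += 1
            b += 1
        t = a + b
        # The t tied values share the average of ranks consumed + 1 .. consumed + t.
        rank_sum_b += b * (consumed + (t + 1) / 2.0)
        tie_sum += t**3 - t
        consumed += t
    return rank_sum_b, tie_sum


A = np.array([1, 2, 2, 5])  # controls, already sorted
B = np.array([2, 3, 7])     # perturbed, already sorted
rank_sum, ties = rank_sum_and_ties_sketch(A, B)
```

In the combined sample 1, 2, 2, 2, 3, 5, 7, the three tied 2s share rank 3, so the values of B receive ranks 3, 5 and 7, and the single tie group of size 3 contributes 3**3 - 3 = 24 to the tie sum.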

illico.utils.ranking.sort_along_axis(X, axis=0)

Sort a dense array along a given axis.

Parameters:
  • X (np.ndarray) – Input dense array.

  • axis (int, optional) – Axis along which to sort. Defaults to 0.

Returns:

Sorted array.

Return type:

np.ndarray

Author: Rémy Dubois

illico.utils.ranking.check_if_sorted(arr)

Check if an array is sorted. O(n)

Parameters:

arr (np.ndarray) – 1-d array to check

Returns:

Whether the array is sorted or not.

Return type:

bool

Author: Rémy Dubois

illico.utils.ranking.check_indices_sorted_per_parcel(indices, indptr)

Check if indices of a sparse array are sorted.

This is essential if the input data is CSR. Indeed, chunking makes use of binary search on the indices, which requires them to be sorted.

Parameters:
  • indices (np.ndarray) – Indices

  • indptr (np.ndarray) – Indptr

Returns:

True if all indices subarrays are sorted. False otherwise.

Return type:

bool
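
A small sketch of such a check, assuming the usual CSR convention in which row k owns indices[indptr[k]:indptr[k + 1]] and assuming "sorted" means non-decreasing. The names is_sorted_sketch and indices_sorted_per_parcel_sketch are hypothetical.

```python
import numpy as np


def is_sorted_sketch(arr):
    # O(n): compare every element with its successor.
    return bool(np.all(arr[:-1] <= arr[1:]))


def indices_sorted_per_parcel_sketch(indices, indptr):
    # Check each slice delimited by indptr independently.
    for k in range(len(indptr) - 1):
        if not is_sorted_sketch(indices[indptr[k]:indptr[k + 1]]):
            return False
    return True


indptr = np.array([0, 3, 5])
sorted_indices = np.array([0, 2, 5, 1, 3])    # both rows sorted
unsorted_indices = np.array([2, 0, 5, 1, 3])  # first row out of order
ok = indices_sorted_per_parcel_sketch(sorted_indices, indptr)
not_ok = indices_sorted_per_parcel_sketch(unsorted_indices, indptr)
```

Note that the indices array as a whole does not need to be sorted; only each per-row slice does.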

class illico.utils.type.TestResults(statistic, pvalue)
pvalue

Alias for field number 1

statistic

Alias for field number 0