Using CellPipeline

A cell type specific analysis and visualization tool for the gene of interest

This notebook is built to run automatically: you can simply “Run All” cells. Beware that this currently requires some patience and substantial computational resources.

First, the data and package are loaded. This may take a minute. Set your gene of interest (GOI) here!

[10]:
import sys
sys.path.append('/lustre/groups/ml01/workspace/samantha.bening/Bachelor/')
from importlib import reload
import genereporter.cell_pipeline as cp
reload(cp)

cp = cp.CellPipeline("/lustre/groups/ml01/workspace/samantha.bening/Bachelor/", "data2/veo_ibd_balanced.h5ad")

# set your gene of interest
GOI = "CASP8"
# set your cell type of interest
cell_type = 'CD4 T'

Below is a list of the possible coarse cell types (read from the celltype_l2 column of the data). Choose one of these as your cell type of interest above (cell_type = ‘[your cell type]’) to run the notebook automatically. You can also rerun individual outputs on different cell types.

[2]:
# print cell type names here; easier to select
# (the loop variable is named ct so it does not overwrite cell_type set above)
print("Coarse cell types: ")
for ct in cp.adata.obs['celltype_l2'].unique():
    print(f"\t{ct}")
Coarse cell types:
        Pericyte
        B
        Endothelial
        CD4 T
        CD8 T
        NK_ILC
        Fibroblast
        Cycling B
        Plasma
        Cycling Myeloid
        Cycling Stroma
        Cycling T
        Epithelial
        Glial
        Myeloid
        Tuft
        Smooth Muscle Cell
        pDC
        Mast
[3]:
# UMAP of coarse cell types
cp.plot_umap(color="celltype_l2")
_images/Cell_Example_5_0.png

Next, we provide a quick summary of the GOI’s expression class and mean expression level across all cell types.

[4]:
expr_sum = cp.explain_expr_celltypes(GOI=GOI)
expr_sum
[4]:
       Cell type           Expression class  Avg. expression over cell type
CASP8  pDC                 low               0.349
CASP8  CD4 T               very low          0.266
CASP8  Cycling T           very low          0.262
CASP8  CD8 T               very low          0.243
CASP8  NK_ILC              very low          0.240
CASP8  Mast                very low          0.202
CASP8  B                   very low          0.140
CASP8  Cycling B           very low          0.122
CASP8  Cycling Myeloid     very low          0.109
CASP8  Plasma              very low          0.109
CASP8  Tuft                very low          0.108
CASP8  Myeloid             very low          0.104
CASP8  Epithelial          very low          0.091
CASP8  Endothelial         very low          0.064
CASP8  Cycling Stroma      very low          0.056
CASP8  Pericyte            very low          0.037
CASP8  Fibroblast          very low          0.037
CASP8  Glial               very low          0.023
CASP8  Smooth Muscle Cell  very low          0.013
[11]:
cp.plot_expressions(GOI, cell_type=cell_type, show_summary=True)
# Set show_summary=False to hide the textual summary of the expression classes (quantile thresholds and gene counts per category)
_images/Cell_Example_8_0.png
Summary for all cells:
Quantile thresholds:
very low: 96.2325, low: 98.8921, middle: 99.4425, high: 99.7479, very high: 99.7500

Number of genes per category:
very_low: 27101
low: 749
middle: 155
high: 86
very_high: 71


Summary for CD4 T cells:
Quantile thresholds:
very low: 96.5912, low: 98.988, middle: 99.4709, high: 99.7479, very high: 99.7500

Number of genes per category:
very_low: 27202
low: 675
middle: 136
high: 78
very_high: 71

Expression vs. Detection visualization

This can contextualize the expression levels we observe in the standard scanpy plots. In single-cell RNA-seq, only a random sample of the RNA present in a cell is sequenced. By pure chance, lowly expressed genes may be missing from the sampled RNA of many cells because of their low prevalence. Here, we can inspect the maximum percentage of cells in which a gene can be expected to be detected at a given expression level, for all genes and specifically for our gene of interest.

[12]:
cp.expression_vs_detection(GOI, cell_type=cell_type)
# Remove "cell_type=cell_type" to plot across all cell types instead of only the cell type of interest
_images/Cell_Example_10_0.png
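
The sampling argument above can be illustrated with a small simulation. This is only a sketch under a simplifying assumption (sampled counts follow a Poisson distribution around a gene's mean); it is not how CellPipeline computes its plot, and the mean values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 5000
# Hypothetical mean counts per cell for five genes, from rare to abundant
mean_counts = np.array([0.01, 0.1, 0.5, 2.0, 10.0])

# Each cell draws a Poisson count around the gene's mean (assumption)
counts = rng.poisson(lam=mean_counts, size=(n_cells, mean_counts.size))

# Fraction of cells in which each gene was detected at least once
observed_detection = (counts > 0).mean(axis=0)

# Under pure Poisson sampling, P(count > 0) = 1 - exp(-mean)
expected_detection = 1.0 - np.exp(-mean_counts)

for m, obs, expd in zip(mean_counts, observed_detection, expected_detection):
    print(f"mean={m:>5.2f}  observed={obs:.3f}  expected={expd:.3f}")
```

Under this assumption, a gene with a low mean is absent from most cells even though every cell expresses it at the same underlying rate, which is why low detection alone does not imply that a gene is switched off.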

Automatically identify lower outliers (clue to look at celltype subset)

[13]:
cp.plot_outliers(GOI, outlier_threshold=0.1, cell_type=cell_type)
# Remove "cell_type=cell_type" to plot across all cell types instead of only the cell type of interest
_images/Cell_Example_12_0.png

This shows how the approximation of the maximum threshold curve is calculated. It is mainly of interest for understanding the method: the curve is approximated using the change points of a spline’s third derivative, followed by a linear approximation of the curve.

[14]:
cp.fit_spline(plot=True, cell_type=cell_type)
_images/Cell_Example_14_0.png
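
To make the idea of third-derivative change points more concrete, here is a minimal sketch on a synthetic curve. It assumes SciPy's `UnivariateSpline` and a shifted tanh as a stand-in for the data; the actual `fit_spline` implementation may differ in spline degree, smoothing and how change points are selected.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic S-shaped curve standing in for the real boundary (assumption)
x = np.linspace(0.0, 5.0, 200)
y = np.tanh(x - 2.5)

# Quintic interpolating spline (k=5, s=0) so the third derivative stays smooth
spline = UnivariateSpline(x, y, k=5, s=0)
third = spline.derivative(n=3)(x)

# Change points: grid positions where the third derivative changes sign
sign_change = np.where(np.diff(np.sign(third)) != 0)[0]
change_points = x[sign_change]

print("Change points of the third derivative:", change_points)
```

Between consecutive change points, the curve could then be replaced by straight line segments, which is one way to read the "linear approximation" mentioned above.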

These are the top 5 outliers, sorted by their distance from the maximum curve. You can show more or fewer by changing the head=n parameter.

[15]:
cp.list_outliers(cell_type=cell_type)

# can show top n number of genes by adding "head=n"
[15]:
        log1p(means)  percent_detected  distance  is_outlier
HSPA1A  1.041783      0.459530          0.394890  True
HSPA1B  0.911569      0.453423          0.339097  True
IGKC    0.581478      0.103288          0.320663  True
KLF2    0.865623      0.474351          0.299710  True
CCL4    0.509195      0.062818          0.299202  True
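
A table like the one above could be produced roughly as follows. This is a sketch, not CellPipeline's code: `max_detection` is a hypothetical stand-in for the fitted maximum curve, the distance is assumed to be the vertical gap below that curve, and the gene names and values are invented.

```python
import numpy as np
import pandas as pd

def max_detection(log_mean):
    """Hypothetical stand-in for the fitted maximum curve (not CellPipeline's)."""
    return 1.0 - np.exp(-2.0 * log_mean)

# Toy per-gene statistics; names and values are invented for illustration
genes = pd.DataFrame(
    {
        "log1p(means)": [1.0, 0.5, 0.2, 0.8],
        "percent_detected": [0.45, 0.60, 0.32, 0.30],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)

outlier_threshold = 0.1
# Assumed definition: how far a gene's detection falls below the curve
genes["distance"] = max_detection(genes["log1p(means)"]) - genes["percent_detected"]
genes["is_outlier"] = genes["distance"] > outlier_threshold

# Largest distances first, analogous to the head=n parameter
top = genes.sort_values("distance", ascending=False).head(5)
print(top)
```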

GOI expression across cell types

Now we show the standard scanpy plots of our GOI’s expression across both coarse cell types and fine cell types. The fine cell type shown automatically is the one you set at the beginning of this notebook. You can rerun the cell with other cell types of interest by setting the cell_type=[‘your cell type’] parameter.

[23]:
# GOI expression across coarse cell types
cp.dotplot(GOI)
_images/Cell_Example_18_0.png
[17]:
# GOI expression in fine cell type
cp.dotplot(GOI, cell_type=cell_type)
_images/Cell_Example_19_0.png
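
As a reading aid for the dot plots above: in a standard scanpy dot plot, dot size encodes the fraction of cells in a group that express the gene and colour encodes the mean expression in that group. The sketch below computes both quantities with pandas on a made-up table; it assumes `cp.dotplot` follows the scanpy defaults, and the expression values are invented.

```python
import pandas as pd

# Toy expression table: one row per cell (values are made up)
cells = pd.DataFrame({
    "cell_type": ["CD4 T", "CD4 T", "CD4 T", "B", "B", "Mast"],
    "CASP8": [0.0, 0.8, 0.4, 0.0, 0.3, 0.0],
})

grouped = cells.groupby("cell_type")["CASP8"]
summary = pd.DataFrame({
    # Colour of the dot: mean expression over all cells of the group
    "mean_expression": grouped.mean(),
    # Size of the dot: fraction of cells with non-zero expression
    "fraction_expressing": grouped.apply(lambda x: (x > 0).mean()),
})
print(summary)
```

A small, dark dot would therefore mean that few cells express the gene but those that do express it strongly enough to raise the group mean.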
[21]:
# GOI expression across coarse cell types
# This is similar to the coarse cell type dotplot previously, just a different visualization
cp.matrixplot(GOI)
_images/Cell_Example_20_0.png
[19]:
# GOI expression across coarse cell types
# Individual vertical "lines" correspond to individual cells
# A more fine grained visual than the mean expression plots shown before
cp.heatmap(GOI)
_images/Cell_Example_21_0.png