Using GRNPipeline

A GRN analysis and visualization tool for the gene of interest

First, the data and package are loaded. This may take a minute. Set your gene of interest (GOI) here!

[1]:
%%time
import sys
sys.path.append('/lustre/groups/ml01/workspace/christopher.lance/genereporter/')
from importlib import reload
import genereporter.grn_pipeline as ggModule
reload(ggModule)

# set GOI
GOI = 'ACTG2'
CPU times: user 2.35 s, sys: 2.61 s, total: 4.96 s
Wall time: 1min 31s
[2]:
%%time
gg = ggModule.GRNPipeline(wdir="/lustre/groups/ml01/workspace/christopher.lance/genereporter/",
                          adata="scenic_output/veo_ibd_balanced_aucell_final.h5ad",
                          f_adj='scenic_output/TFtg_adj.csv.gz',
                          f_reg='scenic_output/regulons_output.csv.gz',
                          dir_gg_adj = 'scenic_output/gg_adj/',
                          gg_adj_files=['gg_adj_B_Cell.csv.gz', 'gg_adj_Epithelium.csv.gz',
                                        'gg_adj_Myeloid.csv.gz', 'gg_adj_Stroma.csv.gz', 'gg_adj_T_Cell.csv.gz'])
Read in adata.
Cleaned regulon.
Loaded reactome, regulon geneset, and geneset data.
Loaded gene-gene adjacency data.
CPU times: user 8.14 s, sys: 7.1 s, total: 15.2 s
Wall time: 23.7 s

A ranked list of the REACTOME pathway and GRN regulon gene sets that include the GOI. The ranking is based on the (absolute value of the) Spearman’s correlation coefficient between the genesets AUCell score and the GOI’s expression across all cells.

[3]:
pathways_goi = gg.get_goi_pathways(GOI)
pathways_goi
[3]:
geneset genesymbol correlation
0 PRDM6_REGULON ACTG2 0.270770
1 REACTOME_SMOOTH_MUSCLE_CONTRACTION ACTG2 0.254986
2 REACTOME_MUSCLE_CONTRACTION ACTG2 0.247783
3 NR2F1_REGULON ACTG2 0.240771

We show the UMAPs of all the cells, first colored by cell type, then by the GOI expression, followed by the top four gene sets (according to the previous ranking). The color of the pathways corresponds to the AUCell score of that gene set in every cell.

[3]:
gg.plot_pathways(pathways_goi, GOI)
_images/GRN_Example_7_0.png

Regulon-level gene set enrichment analysis

Now we look into which pathways are significantly enriched in a specific gene set of interest. You can pick which gene set to look at by setting the regulon parameter: regulon=[‘your gene set’] We recommend looking at regulons, but you can also look at a REACTOME gene set. Hover over the individual pathways plotted to see more information.

[4]:
gprofiler = gg.gGOSt(regulon='PRDM6_REGULON') # set gene set of interest here

TF-Target Gene Network Visualization

Now we can begin to look into specific TF-GOI pairs in the regulon network.

[5]:
# GOI text summary:
goi_grn = gg.make_goi_grn(GOI=GOI)
gg.GOI_network_stats(goi_grn, GOI)
gg.format_gene_summary(goi_grn, GOI)
Summary of ACTG2:

There are 2 regulons that have ACTG2 in their target genes.

Regulons that have ACTG2 in their target genes:

        (TF: GRNBoost2 Importance Score)
        NR2F1: 0.491
        PRDM6: 0.038


There are 95 TFs for ACTG2 that were NOT supported by a regulon (motif analysis),
here are the top 10:

        (TF: GRNBoost2 Importance Score)
        MYLK: 2.776
        RARB: 1.735
        MSRB3: 1.578
        ZNF239: 1.496
        FOXF2: 0.968
        LUZP2: 0.882
        NFATC4: 0.8
        PITX1: 0.779
        ZNF841: 0.741
        SP100: 0.67
        ZNF154: 0.667

ACTG2:

        Actins are highly conserved proteins that are involved in various types of
        cell motility and in the maintenance of the cytoskeleton. Three types of
        actins, alpha, beta and gamma, have been identified in vertebrates. Alpha actins
        are found in muscle tissues and are a major constituent of the
        contractile apparatus. The beta and gamma actins co-exist in most cell types
        as components of the cytoskeleton and as mediators of internal cell motility.
        This gene encodes actin gamma 2; a smooth muscle actin found in
        enteric tissues. Alternative splicing results in multiple transcript variants encoding distinct isoforms.
        Based on similarity to peptide cleavage of related actins, the mature protein
        of this gene is formed by removal of two N-terminal peptides.[provided by
        RefSeq, Dec 2010]

NR2F1:

        The protein encoded by this gene is a nuclear hormone receptor and
        transcriptional regulator. The encoded protein acts as a homodimer and binds to
        5'-AGGTCA-3' repeats. Defects in this gene are a cause of Bosch-Boonstra optic
        atrophy syndrome (BBOAS). [provided by RefSeq, Apr 2014]

PRDM6:

        No summary available for PRDM6.

Gene-gene Co-Expression Network

We can also look into not only TF-target gene regulatory relationships, but simple gene-gene co-expression patterns. Here, we show a gene-gene co-expression matrix for each very coarse cell lineage: B Cells, Epithelium Cells, Myeloid Cells, Stroma Cells, and T Cells. Each network is calculated using only gene expression values within these cell lineages. This means that the results here are cell-lineage specific.

[6]:
# first: look at importance score distribution for each cell lineage
# set log_scale=True for a log scale of the y-axis (Number of genes)
# set xlim=x for a view of the x-axis from 0 to x
gg.genegene_importance_histograms(log_scale=False, xlim=10)
_images/GRN_Example_13_0.png

You can visualize either the regulon (TF-GOI regulating pairs) or gene-gene (GOI-gene coexpressed pairs) network here. Set the parameter to ‘regulon’ or ‘gene_gene’ for the desired network visualization. The output is interactive, so hover over nodes to see their gene set and move nodes around for a better view!

[ ]:
# network visualization
# type = 'gene_gene' for the GOI-gene co-expression network
# type = 'regulon' for the TF-GOI regulon network
# top_n = n: only show top n direct target genes of each TF or gene

gg.show_network(GOI, type='regulon', top_n=5) # only works on jupyter notebook running in browser (not in VS Code for example)

We apply the same principle of gene set enrichment analysis within the genes that are co-expressed with the GOI in each distinct cell lineage. We simply provide a list of the top three (or less) significantly enriched pathways from REACTOME and KEGG for each gene set, i.e. each set of genes co-expressed with the GOI within each cell lineage.

[7]:
gg.gGOSt_listed(GOI)
Top 3 significantly differentially expressed pathways in genes co-expressed with ACTG2, per cell lineage.
This is in REACTOME and KEGG databases.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 1
----> 1 gg.gGOSt_listed(GOI)

File /lustre/groups/ml01/workspace/samantha.bening/Bachelor/genereporter/grn_pipeline.py:759, in GRNPipeline.gGOSt_listed(self, GOI)
    750 new_list = np.unique(new_list).tolist()
    751 r = requests.post(
    752     url='https://biit.cs.ut.ee/gprofiler/api/gost/profile/',
    753     json={
   (...)
    757     }
    758     )
--> 759 result = r.json()['result']
    760 result = pd.DataFrame(result)
    761 # sort by p-value

KeyError: 'result'