In this Notebook we use scMRA to reconstruct a signaling network of the EGFR-pathway. Therefore we use simulated single-cell (phospho)-protein data of the Orton Model. The data contains measurements of phospho- and total protein for 250 cells and eleven proteins.
import scmra
import pandas as pd
import numpy as np
import seaborn as sns; sns.set_theme()
import matplotlib.pyplot as plt
from IPython.display import display, HTML
import networkx as nx
#read in data
data = pd.read_csv("../data/scData_wildtype.tsv", sep='\t')
The input file contains cells in the rows and protein counts in the columns. Modular Response Analysis (MRA) explains interaction strenghts as the partial derivatives of phospho-protein with respect to its upstream phospho-protein and uses the deviation of phospho-protein to a perturbation as input. Similarly, we use the stochastic variability of total protein between cells as 'natural perturbation experiments' and as a result measure the deviation of a cells phospho-protein from the cell-population mean. In MRA the measured responses to a perturbation are termed global response coefficient. Based on that we term the deviation of a cells phospho-protein from the mean 'rglob', and the stoachstic 'natural perturbation' - the deviation of total protein from the population mean - 'rtot'.
#add a SC-ID
data.index = ["ID."+str(i) for i in range(data.shape[0])]
#calculate input: deviatipon of protein from cell population mean
dat_scaled = ((data - data.median())/data.median()).transpose()
#get active protein deviations (frmo those we infer interaction strengths)
rglob = dat_scaled.loc[[n for n in list(dat_scaled.index) if ("Active" in n)]]
rglob.index = [i.replace("Active", "") for i in rglob.index]
#get total protein deviations: 'natural perturbations'
rtot = dat_scaled.loc[[n for n in list(dat_scaled.index) if ("Tot" in n)]]
rtot.index = [i.replace("Tot", "") for i in rtot.index]
display(rglob.head)
<bound method NDFrame.head of ID.0 ID.1 ID.2 ID.3 ID.4 ID.5 ID.6 \ Sos -0.051155 0.085232 0.008321 0.073530 0.037336 0.077462 0.092302 Ras -0.076395 0.054095 -0.007728 0.025145 0.028869 0.058157 0.088900 Raf1 -0.103680 0.069777 0.053183 0.041268 0.066881 0.064294 0.005852 Mek -0.033101 -0.152314 -0.050093 -0.128995 0.069491 -0.148370 0.166537 Erk -0.106257 -0.177075 0.030958 -0.113891 0.051923 -0.161771 0.092711 P90Rsk -0.129504 -0.117604 -0.007665 -0.074405 -0.126830 -0.019424 0.120107 PI3K 0.142959 0.021876 -0.143707 0.023346 0.029151 0.198616 0.008005 Akt 0.116549 -0.098978 -0.143954 -0.084988 0.046098 0.104219 0.110371 C3G 0.007717 -0.082415 -0.006071 -0.138128 0.056060 -0.083842 -0.024886 Rap1 -0.005878 -0.069144 0.016994 -0.137141 0.027891 -0.114642 0.005731 bRaf 0.072555 -0.084724 -0.030218 -0.107014 0.048830 -0.116599 0.097450 ID.7 ID.8 ID.9 ... ID.240 ID.241 ID.242 \ Sos -0.008603 -0.222532 0.126810 ... -0.094700 -0.237459 -0.100487 Ras 0.004750 -0.215548 0.117134 ... -0.117030 -0.264199 -0.082735 Raf1 0.066933 -0.259652 0.135834 ... -0.178845 -0.324005 -0.072003 Mek 0.058568 0.107128 -0.142194 ... 0.144093 0.070434 0.045346 Erk 0.024951 0.221935 -0.127559 ... 0.143330 0.013953 0.156041 P90Rsk 0.015984 0.237644 -0.188018 ... -0.059097 -0.065432 0.160903 PI3K 0.087303 -0.074450 -0.020944 ... -0.029353 -0.002874 0.065850 Akt -0.176313 0.044074 -0.082037 ... 0.116220 0.103712 -0.074635 C3G 0.005292 0.115988 0.113998 ... 0.129428 0.128431 -0.017273 Rap1 0.009414 0.123415 0.086880 ... 0.135517 0.103057 -0.037426 bRaf 0.006757 0.086650 0.085719 ... 0.119990 0.073498 0.017430 ID.243 ID.244 ID.245 ID.246 ID.247 ID.248 ID.249 Sos -0.036475 -0.165083 -0.107879 -0.035377 0.137442 -0.172961 0.087074 Ras -0.020628 -0.175497 -0.123056 -0.007759 0.112728 -0.195095 0.052859 Raf1 0.006394 -0.207331 -0.106979 -0.037144 0.126883 -0.212635 0.137019 Mek -0.002845 0.109314 -0.093608 -0.074491 0.002844 0.118653 0.027718 Erk 0.097090 0.088938 -0.045883 -0.029147 0.009935 0.135879 0.079179 P90Rsk 0.065053 0.084095 0.037934 0.039865 -0.026981 0.207729 -0.072461 PI3K -0.032113 -0.083988 0.228363 -0.054729 0.055028 0.149363 0.021098 Akt 0.039172 -0.015887 -0.040904 0.200536 0.037683 -0.029558 -0.143577 C3G -0.081234 0.001695 0.079890 -0.102714 -0.138296 0.202136 -0.082332 Rap1 -0.080556 -0.001542 0.051694 -0.083326 -0.155609 0.199343 -0.076946 bRaf -0.050971 0.000265 0.014805 -0.022855 -0.099917 0.113333 -0.068884 [11 rows x 250 columns]>
scMRA penalizes the number of edges in the network with the η-parameter. When the number of edges in the network is not know we recommend to pick a number of edges in the network such that modeling additional interactions would lead to only a marginal reduction in the mean squared error.
%%capture
etaRange = numbers = np.linspace(0.0001, 0.75, 20)
dfEdgeEstimation = pd.DataFrame(columns=['Eta', 'MSE', '#Edge'])
for eta in etaRange:
#make MRA simulation
scd = scmra.ScData(rglob=rglob, rtot = rtot)
p = scmra.ScMraProblem(scd, eta=eta)
p.cpx.solve()
result = scmra.ScMraResult(p)
#store number of edges/ Mean Sqared Error/ penalty
n_res = np.size(result.residuals_complete) + np.size(result.residuals_incomplete)
error = np.sum(np.array(np.square(result.residuals_complete)))
error += np.sum(np.array(np.square(result.residuals_incomplete)))
error = error/n_res
edgeNum = result.imap.sum().sum()
row = {'Eta': [eta], 'MSE': [error], '#Edge': [edgeNum]}
dfEdgeEstimation = pd.concat([dfEdgeEstimation, pd.DataFrame(row)])
with plt.style.context("seaborn-whitegrid"):
plt.rcParams["axes.edgecolor"] = "0.15"
plt.rcParams["axes.linewidth"] = 1.25
f, ax = plt.subplots(figsize=(5, 5))
wt = ax.scatter(dfEdgeEstimation["#Edge"], dfEdgeEstimation["MSE"], color = "#0138FE", s = 150)
ax.plot(ls="--", c=".3")
plt.xlabel('#Edges', size = 24)
plt.ylabel('Mean Squared Error', size = 24)
plt.show()
display(dfEdgeEstimation)
Eta | MSE | #Edge | |
---|---|---|---|
0 | 0.000100 | 0.000058 | 25 |
0 | 0.039568 | 0.000079 | 11 |
0 | 0.079037 | 0.000118 | 10 |
0 | 0.118505 | 0.000118 | 10 |
0 | 0.157974 | 0.000379 | 8 |
0 | 0.197442 | 0.000379 | 8 |
0 | 0.236911 | 0.000379 | 8 |
0 | 0.276379 | 0.000379 | 8 |
0 | 0.315847 | 0.000379 | 8 |
0 | 0.355316 | 0.000379 | 8 |
0 | 0.394784 | 0.000379 | 8 |
0 | 0.434253 | 0.000931 | 7 |
0 | 0.473721 | 0.001495 | 6 |
0 | 0.513189 | 0.001495 | 6 |
0 | 0.552658 | 0.001495 | 6 |
0 | 0.592126 | 0.003341 | 4 |
0 | 0.631595 | 0.005650 | 2 |
0 | 0.671063 | 0.005650 | 2 |
0 | 0.710532 | 0.008861 | 0 |
0 | 0.750000 | 0.008861 | 0 |
Based on the above plot we choose an η-parameter of 0.1 to reconstruct a network with 10 edges as identifying more edges in the network seems to only decrease the MSE marginally. The interaction strenghts between proteins can be retrieved from the 'scMRAResult' class. In traditional MRA the inferred interaction strenghts are termed local response coefficients - opposed to the measurable perturbation induced global response coeeficients. Based on that the 'rloc' variable of the 'scMRAResult' holds a weighted adjacency matrix of the inferred network topology. Here we visualize the resulting network topology and interaction strenght with networkX.
%%capture
#make MRA simulation
eta = 0.1
scd = scmra.ScData(rglob=rglob, rtot = rtot)
p = scmra.ScMraProblem(scd, eta=eta)
p.cpx.solve()
result = scmra.ScMraResult(p)
display(result.rloc)
#diagonal rloc entries are per definition -1, replace those with 0 to plot network
A = result.rloc.transpose()
A[A == -1] = 0
G = nx.from_pandas_adjacency(A, create_using=nx.DiGraph)
pos=nx.shell_layout(G)
nx.draw_networkx(G, with_labels=True, node_size = 750, pos = pos, node_color = "yellow")
#create edge labels
edgeLabels = {}
edgeColor = {}
for row_label, row_data in A.iterrows():
for col_label, value in row_data.items():
key = (col_label, row_label)
if(np.abs(value) > 0):
edgeLabels[key] = round(value,1)
nx.draw_networkx_edge_labels(
G, pos,
edge_labels=edgeLabels,
font_color="black"
)
plt.show()
Sos | Ras | Raf1 | Mek | Erk | P90Rsk | PI3K | Akt | C3G | Rap1 | bRaf | |
---|---|---|---|---|---|---|---|---|---|---|---|
Sos | -1.000000 | 0.0000 | 0.0 | 0.000000 | 0.00000 | -0.798104 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Ras | 0.997905 | -1.0000 | 0.0 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Raf1 | 0.000000 | 1.0035 | -1.0 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | -0.401177 | 0.000000 | 0.000000 | 0.000000 |
Mek | 0.000000 | 0.0000 | 0.0 | -1.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.879479 |
Erk | 0.000000 | 0.0000 | 0.0 | 1.002274 | -1.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
P90Rsk | 0.000000 | 0.0000 | 0.0 | 0.000000 | 0.93911 | -1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
PI3K | 0.000000 | 0.0000 | 0.0 | 0.000000 | 0.00000 | 0.000000 | -1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Akt | 0.000000 | 0.0000 | 0.0 | 0.000000 | 0.00000 | 0.000000 | 0.379765 | -1.000000 | 0.000000 | 0.000000 | 0.000000 |
C3G | 0.000000 | 0.0000 | 0.0 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | -1.000000 | 0.000000 | 0.000000 |
Rap1 | 0.000000 | 0.0000 | 0.0 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 1.000372 | -1.000000 | 0.000000 |
bRaf | 0.000000 | 0.0000 | 0.0 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.776643 | -1.000000 |
Additionally we can add a prior network or perturbations to the optimization. #ToDo: add function to include treatment: cell_annot:{'inh' : CellIds}, tx_annot:{'inh': 'protName'}
To add a prior network, we need to set the 'prior_network' variable of the scMRAProblem(or scCNRProblem).
This variable stores edges between nodes and is a list of pairs in the following format (<to_node,
To add additionally perturbations we have to set two variables to match cells to their treatment, and treatments to the corresponding protein that was perturbed:
The names for the cellID, protein, are the same as the ones in 'rglob', 'rtot'.