Tutorial 2: scDREAMER-SUP for semi-supervised integration of Human Immune Dataset
This tutorial demonstrates semi-supervised integration of the Human Immune dataset. The dataset consists of ~34,000 cells sampled from bone marrow and peripheral blood (PBMCs), obtained in 10 batches corresponding to different donors.
In this tutorial, we demonstrate semi-supervised integration on the Human Immune dataset in a setting where 20% of the cells have missing labels, spread across all cell types. The dataset is available here: https://drive.google.com/drive/folders/1alw75wwWRg9KXopUccPhMh6N3b6dOoE9?usp=sharing
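To make the "20% missing labels" setting concrete, the sketch below hides a random fraction of labels in a toy label vector. It is only an illustration: how the authors built the `final_annotation_NA` column is not stated here, and the helper `mask_labels`, the placeholder string `"NA"`, and the toy cell type names are all assumptions.

```python
import numpy as np

def mask_labels(labels, fraction, missing_value="NA", seed=0):
    """Return a copy of `labels` with `fraction` of the entries hidden.

    `missing_value` is an assumed placeholder; the real dataset may
    encode missing labels differently.
    """
    rng = np.random.default_rng(seed)
    masked = np.asarray(labels, dtype=object).copy()
    n_missing = int(round(fraction * masked.size))
    # Pick distinct cells uniformly at random, regardless of cell type.
    hidden = rng.choice(masked.size, size=n_missing, replace=False)
    masked[hidden] = missing_value
    return masked

# Toy labels (not the real annotation): 100 cells, 3 cell types.
labels = ["T cell", "B cell", "Monocyte", "T cell", "NK cell"] * 20
masked = mask_labels(labels, fraction=0.2)
print("hidden labels:", int((masked == "NA").sum()), "of", masked.size)
```

With an AnnData object, the same idea would be applied to a column of `adata.obs` and stored under a new column name.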
Importing Libraries
[3]:
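import warnings
warnings.filterwarnings('ignore')  # keep library warnings out of the notebook output
import os
import scanpy as sc
# Expose only GPU 1 to this process; set this before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import random
import numpy as np
import tensorflow as tf2            # TF2 API, used below for seeding
import tensorflow.compat.v1 as tf   # TF1-style API (tf.Session, tf.ConfigProto) used below
tf.disable_v2_behavior()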
Visualization of unintegrated Human Immune data
[15]:
adata = sc.read_h5ad("/home/ajita/Documents/data_integration/Immune/Immune_ALL_human.h5ad")
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color = ["final_annotation"], frameon = False)
sc.pl.umap(adata, color = ["batch"], frameon = False)
WARNING: You’re trying to run this on 12303 dimensions of `.X`, if you really want this, set `use_rep='X'`.
Falling back to preprocessing with `sc.pp.pca` and default params.
Setting seed for reproducibility
[2]:
np.random.seed(0)
tf.set_random_seed(0)
random.seed(0)
tf2.random.set_seed(0)
tf2.keras.utils.set_random_seed(0)
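The seeding calls above can be collected into one helper so that every run starts from the same state. This is a minimal sketch: `set_all_seeds` is a hypothetical name, and the TensorFlow part is skipped when TensorFlow is not installed. Seeding alone may not make GPU training bit-for-bit identical across runs.

```python
import random
import numpy as np

def set_all_seeds(seed=0):
    """Seed Python, NumPy and, if available, TensorFlow.

    Returns True when TensorFlow was seeded as well.
    """
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf2
    except ImportError:
        return False
    tf2.random.set_seed(seed)             # TF2 global seed
    tf2.compat.v1.set_random_seed(seed)   # TF1 graph-level seed
    return True

# Re-seeding reproduces the same random draws.
set_all_seeds(0)
first = (random.random(), float(np.random.rand()))
set_all_seeds(0)
second = (random.random(), float(np.random.rand()))
print("draws match:", first == second)
```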
Building model
[3]:
name = "Immune_Human"
"""
NOTE:
Run setting as follows:
0: Supervised setting
10: 10 percent missing labels data
20: 20 percent missing labels data
50: 50 percent missing labels data
"""
run_setting = 20
[4]:
path = "/home/ajita/Documents/data_integration/"
data_path = {
    "Immune_Human": {
        0: path + "Immune/Immune_ALL_human.h5ad",
        10: path + "Immune/Immune_Human_NA_0.1.h5ad",
        20: path + "Immune/Immune_Human_NA_0.2.h5ad",
        50: path + "Immune/Immune_Human_NA_0.5.h5ad",
    }
}
batch_key_dict = {
    "Immune_Human": "batch",
}
cell_type_key_dict = {
    "Immune_Human": {
        0: "final_annotation",
        10: "final_annotation_NA",
        20: "final_annotation_NA",
        50: "final_annotation_NA",
    }
}
lr = {"lr_ae": 0.0002, "lr_dis": 0.0007}
lr_big_data = {"lr_ae": 0.0002, "lr_dis": 0.00001}
learning_rate = {
    "Immune_Human": {0: lr, 10: lr, 20: lr, 50: lr},  # Learning rate for small datasets
    "Healthy_Heart": {0: lr_big_data, 20: lr_big_data, 50: lr_big_data},  # Learning rate for large datasets
}
plot_cell_type_dict = {
    "Healthy_Heart": "celltype",
    "Immune_Human": "final_annotation",
}
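Because the settings are spread over several dictionaries keyed by dataset name and run setting, a missing entry only surfaces as a bare `KeyError` when the model is built. The sketch below resolves everything for one run up front and reports which table is incomplete. The helper `resolve_run` is hypothetical, and the dictionaries are shortened copies of the ones above with relative paths.

```python
def resolve_run(name, run_setting, per_dataset_tables, per_setting_tables):
    """Collect the settings for one run, failing early on a missing entry."""
    resolved = {}
    for key, table in per_dataset_tables.items():
        if name not in table:
            raise KeyError(f"{key}: no entry for dataset {name!r}")
        resolved[key] = table[name]
    for key, table in per_setting_tables.items():
        if name not in table:
            raise KeyError(f"{key}: no entry for dataset {name!r}")
        if run_setting not in table[name]:
            raise KeyError(f"{key}: dataset {name!r} has no run_setting {run_setting}")
        resolved[key] = table[name][run_setting]
    return resolved

# Shortened copies of the tutorial's dictionaries.
lr = {"lr_ae": 0.0002, "lr_dis": 0.0007}
per_dataset = {
    "batch": {"Immune_Human": "batch"},
    "plot_cell_type": {"Immune_Human": "final_annotation"},
}
per_setting = {
    "dataset_name": {"Immune_Human": {0: "Immune/Immune_ALL_human.h5ad",
                                      20: "Immune/Immune_Human_NA_0.2.h5ad"}},
    "cell_type": {"Immune_Human": {0: "final_annotation",
                                   20: "final_annotation_NA"}},
    "learning_rate": {"Immune_Human": {0: lr, 20: lr}},
}

config = resolve_run("Immune_Human", 20, per_dataset, per_setting)
print(config["dataset_name"], config["cell_type"], config["learning_rate"])
```

Requesting a combination that is not defined, such as a run setting that has no data file, raises an error that names the incomplete table.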
Running scDREAMER-Sup
[5]:
from scDREAMER import scDREAMER_SUP
run_config = tf.ConfigProto()
run_config.gpu_options.per_process_gpu_memory_fraction = 0.333
run_config.gpu_options.allow_growth = True
with tf.Session(config = run_config) as sess:
    dreamer = scDREAMER_SUP(
        sess,
        epoch = 300,
        dataset_name = data_path[name][run_setting],
        batch = batch_key_dict[name],
        cell_type = cell_type_key_dict[name][run_setting],
        plot_cell_type = plot_cell_type_dict[name],
        name = name,
        lr_ae = learning_rate[name][run_setting]['lr_ae'],
        lr_dis = learning_rate[name][run_setting]['lr_dis']
    )
    dreamer.train_cluster()
2023-09-11 19:34:48.475886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5367 MB memory: -> device: 0, name: Quadro RTX 5000, pci bus id: 0000:d8:00.0, compute capability: 7.5
Loading dataset
Preprocessing...
here [2 4 1 ... 1 2 2]
Shape self.data_train: (33506, 2000)
Shape self.data_test: (33506, 2000)
encoder input shape Tensor("concat:0", shape=(?, 2010), dtype=float32)
WARNING:tensorflow:From /home/ajita/anaconda3/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /home/ajita/anaconda3/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
decoder input shape Tensor("concat_2:0", shape=(?, 20), dtype=float32)
KL gaussian z Tensor("mul_12:0", shape=(?,), dtype=float32)
KL gaussian l Tensor("mul_11:0", shape=(?,), dtype=float32)
WARNING:tensorflow:From /home/ajita/anaconda3/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
scDREAMER-Sup on DataSet /home/ajita/Documents/data_integration/Immune/Immune_Human_NA_0.2.h5ad ...
2023-09-11 19:35:03.551198: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:353] MLIR V1 optimization pass is not enabled
Epoch : [0] , a_loss = 434.9476
Epoch : [10] , a_loss = 395.4304
Epoch : [20] , a_loss = 382.7641
Epoch : [30] , a_loss = 374.0014
Epoch : [40] , a_loss = 368.1162
Epoch : [50] , a_loss = 362.4543
Epoch : [60] , a_loss = 358.2608
Epoch : [70] , a_loss = 355.1032
Epoch : [80] , a_loss = 352.0520
Epoch : [90] , a_loss = 349.6750
Epoch : [100] , a_loss = 347.8295
Epoch : [110] , a_loss = 346.3599
Epoch : [120] , a_loss = 345.2096
Epoch : [130] , a_loss = 344.0256
Epoch : [140] , a_loss = 342.8842
Epoch : [150] , a_loss = 341.8074
Epoch : [160] , a_loss = 340.9780
Epoch : [170] , a_loss = 340.2975
Epoch : [180] , a_loss = 339.5921
Epoch : [190] , a_loss = 338.9905
Epoch : [200] , a_loss = 338.3954
Epoch : [210] , a_loss = 337.9810
Epoch : [220] , a_loss = 337.4382
Epoch : [230] , a_loss = 336.9385
Epoch : [240] , a_loss = 336.4648
Epoch : [250] , a_loss = 336.0906
Epoch : [260] , a_loss = 335.9613
Epoch : [270] , a_loss = 335.5891
Epoch : [280] , a_loss = 335.1971
Epoch : [290] , a_loss = 334.9298
latent_matrix shape (33506, 10)
(33506,)
None
Done !
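The `a_loss` values in the log above fall steadily from 434.9 at epoch 0 to 334.9 at epoch 290. A quick way to check this on a saved log is to parse the epoch lines, as in the sketch below. The regular expression follows the line format printed above; the helper `parse_losses` is hypothetical, and only four of the log lines are copied in to keep the example short.

```python
import re

# Matches lines such as: "Epoch : [10] , a_loss = 395.4304"
LOSS_LINE = re.compile(r"Epoch\s*:\s*\[(\d+)\]\s*,\s*a_loss\s*=\s*([0-9.]+)")

def parse_losses(log_text):
    """Return a list of (epoch, a_loss) pairs found in the training log."""
    return [(int(epoch), float(loss)) for epoch, loss in LOSS_LINE.findall(log_text)]

# A few lines copied from the training output above.
log_text = """
Epoch : [0] , a_loss = 434.9476
Epoch : [10] , a_loss = 395.4304
Epoch : [280] , a_loss = 335.1971
Epoch : [290] , a_loss = 334.9298
"""

losses = parse_losses(log_text)
first_loss, last_loss = losses[0][1], losses[-1][1]
relative_drop = (first_loss - last_loss) / first_loss
is_decreasing = all(later < earlier
                    for (_, earlier), (_, later) in zip(losses, losses[1:]))
print(f"relative drop: {relative_drop:.3f}, monotonically decreasing: {is_decreasing}")
```

A loss that flattens early or starts rising would suggest revisiting the learning rates or the number of epochs.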