Tutorial 2: scDREAMER-SUP for semi-supervised integration of Human Immune Dataset

This tutorial demonstrates semi-supervised integration of the Human Immune dataset. The dataset consists of ~34,000 cells sampled from bone marrow and peripheral blood (PBMCs), obtained in 10 batches corresponding to different donors.

In this tutorial, we demonstrate semi-supervised integration on the human immune dataset, where 20% of the cells have missing labels across all cell types. The dataset is available here: https://drive.google.com/drive/folders/1alw75wwWRg9KXopUccPhMh6N3b6dOoE9?usp=sharing
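
To make the missing-label setting concrete, the following is a minimal sketch of how a fraction of labels could be hidden within each cell type. It is not the procedure used to build the provided files: the helper `mask_labels`, the `"NA"` placeholder, the per-type (stratified) sampling, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def mask_labels(labels, fraction=0.2, missing="NA", seed=0):
    """Return a copy of `labels` with `fraction` of each cell type set to `missing`.

    Hypothetical helper: the placeholder string and the per-type sampling
    are assumptions, not the encoding used in the tutorial's .h5ad files.
    """
    rng = random.Random(seed)

    # Group cell indices by their label so every type is masked separately.
    by_type = defaultdict(list)
    for idx, label in enumerate(labels):
        by_type[label].append(idx)

    masked = list(labels)
    for indices in by_type.values():
        n_mask = round(fraction * len(indices))
        for idx in rng.sample(indices, n_mask):
            masked[idx] = missing
    return masked


# Toy labels, not taken from the dataset.
labels = ["T cell"] * 10 + ["Monocyte"] * 5
masked = mask_labels(labels, fraction=0.2)
print(masked.count("NA"), "of", len(masked), "labels hidden")
```

Sampling within each type keeps rare cell types from losing all of their labels by chance, which is one reasonable reading of "missing labels across all cell types".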

Importing Libraries

[3]:
import warnings
warnings.filterwarnings('ignore')

import os
import scanpy as sc
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import random
import numpy as np
import tensorflow as tf2
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

Visualization of unintegrated Human Immune data

[15]:
adata = sc.read_h5ad("/home/ajita/Documents/data_integration/Immune/Immune_ALL_human.h5ad")
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color = ["final_annotation"], frameon = False)
sc.pl.umap(adata, color = ["batch"], frameon = False)
WARNING: You’re trying to run this on 12303 dimensions of `.X`, if you really want this, set `use_rep='X'`.
         Falling back to preprocessing with `sc.pp.pca` and default params.
_images/scDREAMER_Sup_runs_6_1.png
_images/scDREAMER_Sup_runs_6_2.png

Setting seed for reproducibility

[2]:

np.random.seed(0)
tf.set_random_seed(0)
random.seed(0)
tf2.random.set_seed(0)
tf2.keras.utils.set_random_seed(0)

Building model

[3]:

name = "Immune_Human"

"""
NOTE:
Run setting as follows:
0: Supervised setting
10: 10 percent missing labels data
20: 20 percent missing labels data
50: 50 percent missing labels data
"""

run_setting = 20

[4]:


path = "/home/ajita/Documents/data_integration/"

data_path = {
    "Immune_Human" : {0: path + "Immune/Immune_ALL_human.h5ad",
                      10: path + "Immune/Immune_Human_NA_0.1.h5ad",
                      20: path + "Immune/Immune_Human_NA_0.2.h5ad",
                      50: path + "Immune/Immune_Human_NA_0.5.h5ad"
                     }
}

batch_key_dict = {'Immune_Human' : 'batch',
                 }

cell_type_key_dict = {
    "Immune_Human" : {0: "final_annotation",
                      10: "final_annotation_NA",
                      20: "final_annotation_NA",
                      50: "final_annotation_NA"
                     }
}

lr = {"lr_ae" : 0.0002, "lr_dis": 0.0007}
lr_big_data = {"lr_ae" : 0.0002, "lr_dis": 0.00001}

learning_rate = {
    "Immune_Human" : {0: lr, 10: lr, 20: lr, 50: lr},  # Learning rate for small datasets
    "Healthy_Heart" : {0: lr_big_data, 20: lr_big_data, 50: lr_big_data}  # Learning rate for large datasets
}

plot_cell_type_dict = {
    "Healthy_Heart" : "celltype",
    'Immune_Human' : 'final_annotation',
}
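
Because each setting is looked up by both dataset name and `run_setting`, a missing key only surfaces as a bare `KeyError` when the model is constructed. The sketch below shows one way to resolve all per-run settings up front with a readable error. `resolve_run` is a hypothetical helper, not part of scDREAMER, and the toy dictionaries use placeholder file names rather than the real paths.

```python
def resolve_run(name, run_setting, data_path, cell_type_key, learning_rate):
    """Collect the settings for one run, failing early with a clear message.

    Hypothetical helper for illustration; it only reads nested dicts
    shaped like the ones defined in this tutorial.
    """
    tables = {
        "data_path": data_path,
        "cell_type_key": cell_type_key,
        "learning_rate": learning_rate,
    }
    resolved = {}
    for table_name, table in tables.items():
        if name not in table:
            raise KeyError(f"{table_name} has no entry for dataset {name!r}")
        if run_setting not in table[name]:
            raise KeyError(
                f"{table_name}[{name!r}] has no entry for run_setting {run_setting}"
            )
        resolved[table_name] = table[name][run_setting]
    return resolved


# Toy configuration with placeholder file names (assumptions, not real paths).
toy_lr = {"lr_ae": 0.0002, "lr_dis": 0.0007}
toy_data_path = {"Immune_Human": {0: "all.h5ad", 20: "na_0.2.h5ad"}}
toy_cell_type_key = {
    "Immune_Human": {0: "final_annotation", 20: "final_annotation_NA"}
}
toy_learning_rate = {"Immune_Human": {0: toy_lr, 20: toy_lr}}

settings = resolve_run(
    "Immune_Human", 20, toy_data_path, toy_cell_type_key, toy_learning_rate
)
print(settings["data_path"], settings["cell_type_key"], settings["learning_rate"])
```

With the tutorial's own dictionaries, the same call would take `data_path`, `cell_type_key_dict`, and `learning_rate` as arguments.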

Running scDREAMER-Sup

[5]:
from scDREAMER import scDREAMER_SUP


run_config = tf.ConfigProto()

run_config.gpu_options.per_process_gpu_memory_fraction = 0.333
run_config.gpu_options.allow_growth = True

with tf.Session(config = run_config) as sess:

    dreamer = scDREAMER_SUP(
        sess,
        epoch = 300,
        dataset_name = data_path[name][run_setting],
        batch = batch_key_dict[name],
        cell_type = cell_type_key_dict[name][run_setting],
        plot_cell_type = plot_cell_type_dict[name],
        name = name,
        lr_ae = learning_rate[name][run_setting]['lr_ae'],
        lr_dis = learning_rate[name][run_setting]['lr_dis']
        )

    dreamer.train_cluster()
2023-09-11 19:34:48.475886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5367 MB memory:  -> device: 0, name: Quadro RTX 5000, pci bus id: 0000:d8:00.0, compute capability: 7.5
Loading dataset
Preprocessing...
here [2 4 1 ... 1 2 2]
Shape self.data_train: (33506, 2000)
Shape self.data_test: (33506, 2000)
encoder input shape  Tensor("concat:0", shape=(?, 2010), dtype=float32)
WARNING:tensorflow:From /home/ajita/anaconda3/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /home/ajita/anaconda3/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
decoder input shape  Tensor("concat_2:0", shape=(?, 20), dtype=float32)
KL gaussian z Tensor("mul_12:0", shape=(?,), dtype=float32)
KL gaussian l Tensor("mul_11:0", shape=(?,), dtype=float32)
WARNING:tensorflow:From /home/ajita/anaconda3/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py:1176: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

scDREAMER-Sup on DataSet /home/ajita/Documents/data_integration/Immune/Immune_Human_NA_0.2.h5ad ...
2023-09-11 19:35:03.551198: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:353] MLIR V1 optimization pass is not enabled
Epoch : [0] ,  a_loss = 434.9476
Epoch : [10] ,  a_loss = 395.4304
Epoch : [20] ,  a_loss = 382.7641
Epoch : [30] ,  a_loss = 374.0014
Epoch : [40] ,  a_loss = 368.1162
Epoch : [50] ,  a_loss = 362.4543
Epoch : [60] ,  a_loss = 358.2608
Epoch : [70] ,  a_loss = 355.1032
Epoch : [80] ,  a_loss = 352.0520
Epoch : [90] ,  a_loss = 349.6750
Epoch : [100] ,  a_loss = 347.8295
Epoch : [110] ,  a_loss = 346.3599
Epoch : [120] ,  a_loss = 345.2096
Epoch : [130] ,  a_loss = 344.0256
Epoch : [140] ,  a_loss = 342.8842
Epoch : [150] ,  a_loss = 341.8074
Epoch : [160] ,  a_loss = 340.9780
Epoch : [170] ,  a_loss = 340.2975
Epoch : [180] ,  a_loss = 339.5921
Epoch : [190] ,  a_loss = 338.9905
Epoch : [200] ,  a_loss = 338.3954
Epoch : [210] ,  a_loss = 337.9810
Epoch : [220] ,  a_loss = 337.4382
Epoch : [230] ,  a_loss = 336.9385
Epoch : [240] ,  a_loss = 336.4648
Epoch : [250] ,  a_loss = 336.0906
Epoch : [260] ,  a_loss = 335.9613
Epoch : [270] ,  a_loss = 335.5891
Epoch : [280] ,  a_loss = 335.1971
Epoch : [290] ,  a_loss = 334.9298
latent_matrix shape (33506, 10)
(33506,)
_images/scDREAMER_Sup_runs_13_4.png
None
_images/scDREAMER_Sup_runs_13_6.png
Done !
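
The `a_loss` values printed every 10 epochs can be used for a quick check of whether training has flattened out. The following is a minimal sketch that parses lines in the format shown above and reports the relative decrease between the last two logged epochs. The helper names are hypothetical, and the sample text copies only a few lines from the log above.

```python
import re

# Matches lines such as "Epoch : [290] ,  a_loss = 334.9298".
LOSS_LINE = re.compile(r"Epoch\s*:\s*\[(\d+)\]\s*,\s*a_loss\s*=\s*([-+0-9.eE]+)")


def parse_losses(log_text):
    """Return (epoch, a_loss) pairs in the order they appear in the log."""
    return [(int(m.group(1)), float(m.group(2))) for m in LOSS_LINE.finditer(log_text)]


def relative_change(losses):
    """Relative decrease in a_loss between the last two logged epochs."""
    (_, previous), (_, latest) = losses[-2], losses[-1]
    return (previous - latest) / previous


# A few lines copied from the training output above.
log_text = """
Epoch : [0] ,  a_loss = 434.9476
Epoch : [270] ,  a_loss = 335.5891
Epoch : [280] ,  a_loss = 335.1971
Epoch : [290] ,  a_loss = 334.9298
"""

losses = parse_losses(log_text)
change = relative_change(losses)
print(f"last logged epoch: {losses[-1][0]}, relative change: {change:.4%}")
```

What counts as "flat enough" depends on the dataset and learning rates, so any threshold applied to this number is a judgment call rather than a property of the method.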