Tutorial 1: scDREAMER unsupervised integration of the Human Immune dataset

Here we present a Colab tutorial on the unsupervised integration of the Human Immune dataset. The dataset is available in the Google Drive folder https://drive.google.com/drive/folders/1alw75wwWRg9KXopUccPhMh6N3b6dOoE9?usp=sharing.

Cloning the scDREAMER GitHub repository

[ ]:
# Restart the runtime after every run.
# Clone the repository, change into it, and list its contents.
!git clone https://github.com/Zafar-Lab/scDREAMER.git
%cd scDREAMER/
!ls
fatal: destination path 'scDREAMER' already exists and is not an empty directory.
/content/scDREAMER
architecture.png  LICENSE    scDREAMER_runs.ipynb
docs              README.md  scDREAMER_SUP
Environments      scDREAMER  scDREAMER_Sup_runs.ipynb
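The `fatal: destination path 'scDREAMER' already exists` message above appears when the cell is re-run after the repository has already been cloned; it is harmless. If you prefer to avoid it, a minimal sketch is to clone only when the target directory is missing or empty. `clone_if_missing` is a hypothetical helper written for this tutorial, not part of scDREAMER.

```python
import os
import subprocess


def clone_if_missing(repo_url, dest, run=subprocess.run):
    """Clone repo_url into dest unless dest already exists and is non-empty.

    Hypothetical helper; `run` is injectable so it can be tested offline.
    Returns True if a clone was attempted, False if it was skipped.
    """
    if os.path.isdir(dest) and os.listdir(dest):
        # Re-running the cell: the repository is already there.
        return False
    # check=True raises CalledProcessError if git fails (e.g. no network).
    run(["git", "clone", repo_url, dest], check=True)
    return True
```

In the notebook you would call `clone_if_missing("https://github.com/Zafar-Lab/scDREAMER.git", "scDREAMER")` in place of `!git clone ...`, then `%cd scDREAMER/` as before.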

Mounting Google Drive to access the input data

[ ]:
from google.colab import drive

drive.mount("/content/drive")
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Installing required libraries for running scDREAMER

[ ]:
# Restart the runtime after installing the libraries below.
!pip install -U scipy==1.10.1
!pip install scanpy==1.9.3
Requirement already satisfied: scipy==1.10.1 in /usr/local/lib/python3.10/dist-packages (1.10.1)
Requirement already satisfied: numpy<1.27.0,>=1.19.5 in /usr/local/lib/python3.10/dist-packages (from scipy==1.10.1) (1.23.5)
Requirement already satisfied: scanpy==1.9.3 in /usr/local/lib/python3.10/dist-packages (1.9.3)
Requirement already satisfied: anndata>=0.7.4 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (0.9.2)
Requirement already satisfied: numpy>=1.17.0 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (1.23.5)
Requirement already satisfied: matplotlib>=3.4 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (3.7.1)
Requirement already satisfied: pandas>=1.0 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (1.5.3)
Requirement already satisfied: scipy>=1.4 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (1.10.1)
Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (0.12.2)
Requirement already satisfied: h5py>=3 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (3.9.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (4.66.1)
Requirement already satisfied: scikit-learn>=0.22 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (1.2.2)
Requirement already satisfied: statsmodels>=0.10.0rc2 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (0.14.0)
Requirement already satisfied: patsy in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (0.5.3)
Requirement already satisfied: networkx>=2.3 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (3.1)
Requirement already satisfied: natsort in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (8.4.0)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (1.3.2)
Requirement already satisfied: numba>=0.41.0 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (0.56.4)
Requirement already satisfied: umap-learn>=0.3.10 in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (0.5.3)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (23.1)
Requirement already satisfied: session-info in /usr/local/lib/python3.10/dist-packages (from scanpy==1.9.3) (1.0.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.4->scanpy==1.9.3) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.4->scanpy==1.9.3) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.4->scanpy==1.9.3) (4.42.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.4->scanpy==1.9.3) (1.4.5)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.4->scanpy==1.9.3) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.4->scanpy==1.9.3) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.4->scanpy==1.9.3) (2.8.2)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba>=0.41.0->scanpy==1.9.3) (0.39.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from numba>=0.41.0->scanpy==1.9.3) (67.7.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.0->scanpy==1.9.3) (2023.3.post1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.22->scanpy==1.9.3) (3.2.0)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from patsy->scanpy==1.9.3) (1.16.0)
Requirement already satisfied: pynndescent>=0.5 in /usr/local/lib/python3.10/dist-packages (from umap-learn>=0.3.10->scanpy==1.9.3) (0.5.10)
Requirement already satisfied: stdlib-list in /usr/local/lib/python3.10/dist-packages (from session-info->scanpy==1.9.3) (0.9.0)
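Because this cell pins `scipy==1.10.1` and `scanpy==1.9.3`, and packages that are already imported are not reloaded until the runtime restarts, it can help to confirm which versions are in effect after restarting. A minimal sketch using the standard-library `importlib.metadata`; `check_pins` and `PINS` are hypothetical names for this tutorial.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip cell above.
PINS = {"scipy": "1.10.1", "scanpy": "1.9.3"}


def check_pins(pins, get_version=version):
    """Return {package: installed_version_or_None} for every pin that does not match.

    Hypothetical helper; `get_version` is injectable so it can be tested
    without the packages installed.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

After restarting, `print(check_pins(PINS) or "all pins satisfied")` reports any package whose installed version differs from the pin.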

Specify the path to the input file

[ ]:
# Path to the Human Immune .h5ad file in your mounted Google Drive; edit it to match your location.
data_path = "/content/drive/MyDrive/Colab Notebooks/Project/scDREAMER/Immune/Immune_Human/Immune_ALL_human.h5ad"
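The path above reflects one particular Drive layout; after copying `Immune_ALL_human.h5ad` from the shared folder, point `data_path` at wherever you placed it. A minimal sketch of a fail-fast check that catches a wrong path before the much longer training run; `require_h5ad` is a hypothetical helper, not part of scDREAMER.

```python
import os


def require_h5ad(path):
    """Raise a descriptive error unless `path` is an existing .h5ad file.

    Hypothetical helper; returns the path unchanged so it can wrap an assignment.
    """
    if not path.endswith(".h5ad"):
        raise ValueError(f"expected an .h5ad file, got: {path}")
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; is Google Drive mounted and the path correct?"
        )
    return path
```

Usage: `data_path = require_h5ad(data_path)` right after setting the path.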

Importing Libraries

[ ]:
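import warnings

warnings.filterwarnings("ignore")  # suppress Python warnings in the notebook output
import os
import random

import numpy as np
import tensorflow as tf2  # TensorFlow 2 API, used in this cell for seeding
import tensorflow.compat.v1 as tf  # TF1-compatible API (Session, ConfigProto)

# Run in TF1 graph mode; scDREAMER is driven through a tf.Session below.
tf.disable_v2_behavior()

# Fix the random seeds of Python, NumPy and TensorFlow for reproducibility.
np.random.seed(0)
tf.set_random_seed(0)
random.seed(0)
tf2.random.set_seed(0)
tf2.keras.utils.set_random_seed(0)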
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
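Seeding every generator is what makes reruns of the notebook comparable. A minimal check of the idea with Python's `random` and NumPy, two of the generators seeded above: reseeding with the same value reproduces the same draws. `draws` is a hypothetical helper. Note that `tf2.keras.utils.set_random_seed` already seeds Python, NumPy and TensorFlow in one call, and that some GPU operations can remain nondeterministic even with fixed seeds.

```python
import random

import numpy as np


def draws(seed, n=3):
    """Seed Python's and NumPy's global generators, then sample n values from each."""
    random.seed(seed)
    np.random.seed(seed)
    return [random.random() for _ in range(n)], np.random.rand(n).tolist()


# Same seed -> identical samples; a different seed -> different samples.
print(draws(0) == draws(0))  # True
print(draws(0) == draws(1))  # False
```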
[ ]:
# Name of the dataset to run
name = "Immune_Human"

data_path = {
    "Immune_Human": data_path,
}

batch_key_dict = {
    "Immune_Human": "batch",
}
cell_type_key_dict = {
    "Immune_Human": "final_annotation",
}

# Learning rates for the autoencoder (lr_ae) and discriminator (lr_dis);
# large inputs use smaller rates.
learning_rate = {
    "Immune_Human": {"lr_ae": 0.0002, "lr_dis": 0.0007},  # small datasets
    "Human_Mouse": {"lr_ae": 0.0001, "lr_dis": 0.00001},  # big datasets (>= 0.5 million cells)
}
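The two presets differ by dataset size: `Immune_Human` is marked as a small dataset, and the `Human_Mouse` rates are reserved for inputs of 0.5 million cells or more. A minimal sketch that picks a preset from the cell count (for an AnnData object, `adata.n_obs`); the helper `rates_for` and the constant names are hypothetical, with the threshold and values taken from the cell above.

```python
# Learning-rate presets copied from the cell above.
SMALL_RATES = {"lr_ae": 0.0002, "lr_dis": 0.0007}
LARGE_RATES = {"lr_ae": 0.0001, "lr_dis": 0.00001}
LARGE_DATASET_CELLS = 500_000  # ">= 0.5 million cells" per the comment above


def rates_for(n_cells):
    """Return a copy of the learning-rate preset for a dataset with n_cells cells."""
    preset = LARGE_RATES if n_cells >= LARGE_DATASET_CELLS else SMALL_RATES
    return dict(preset)  # copy so callers can tweak it without changing the preset


# The Immune_Human run reports 33,506 cells, so it gets the small-data rates.
print(rates_for(33506))  # {'lr_ae': 0.0002, 'lr_dis': 0.0007}
```

To run another dataset, add the same key to `data_path`, `batch_key_dict`, `cell_type_key_dict` and `learning_rate`, then set `name` to it.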

Running scDREAMER

[ ]:
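from scDREAMER import scDREAMER

run_config = tf.ConfigProto()

# Cap TensorFlow at about one third of GPU memory and allocate it on demand.
run_config.gpu_options.per_process_gpu_memory_fraction = 0.333
run_config.gpu_options.allow_growth = True

with tf.Session(config=run_config) as sess:

    dreamer = scDREAMER(
        sess,
        epoch=250,  # number of training epochs
        dataset_name=data_path[name],  # path to the .h5ad input
        batch=batch_key_dict[name],  # .obs column holding batch labels
        cell_type=cell_type_key_dict[name],  # .obs column holding cell-type labels
        name=name,
        lr_ae=learning_rate[name]["lr_ae"],
        lr_dis=learning_rate[name]["lr_dis"],
    )

    # Train the model; the loss log, latent matrix shape and plots follow below.
    dreamer.train_cluster()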
Reading data
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/util/dispatch.py:1176: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
encoder input shape  Tensor("concat:0", shape=(?, 2010), dtype=float32)
decoder input shape  Tensor("concat_2:0", shape=(?, 20), dtype=float32)
WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/util/dispatch.py:1176: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

KL gaussian z Tensor("mul_10:0", shape=(?,), dtype=float32)
KL gaussian l Tensor("mul_9:0", shape=(?,), dtype=float32)
scDREAMER on DataSet /content/drive/MyDrive/Colab Notebooks/Project/scDREAMER/Immune/Immune_Human/Immune_ALL_human.h5ad ...
Epoch : [0] ,  a_loss = 517.4711
Epoch : [10] ,  a_loss = 404.9067
Epoch : [20] ,  a_loss = 387.9474
Epoch : [30] ,  a_loss = 377.6891
Epoch : [40] ,  a_loss = 370.3075
Epoch : [50] ,  a_loss = 364.6601
Epoch : [60] ,  a_loss = 360.2006
Epoch : [70] ,  a_loss = 356.6030
Epoch : [80] ,  a_loss = 353.6540
Epoch : [90] ,  a_loss = 351.2103
Epoch : [100] ,  a_loss = 349.1573
Epoch : [110] ,  a_loss = 347.4158
Epoch : [120] ,  a_loss = 345.9245
Epoch : [130] ,  a_loss = 344.6351
Epoch : [140] ,  a_loss = 343.5103
Epoch : [150] ,  a_loss = 342.5197
Epoch : [160] ,  a_loss = 341.6437
Epoch : [170] ,  a_loss = 340.8624
Epoch : [180] ,  a_loss = 340.1621
Epoch : [190] ,  a_loss = 339.5314
Epoch : [200] ,  a_loss = 338.9593
Epoch : [210] ,  a_loss = 338.4374
Epoch : [220] ,  a_loss = 337.9597
Epoch : [230] ,  a_loss = 337.5199
Epoch : [240] ,  a_loss = 337.1125
latent_matrix shape (33506, 10)
(33506,)
[Figure]
None
[Figure]
Done !
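The training log reports the loss `a_loss` every 10 epochs. To judge convergence, a minimal sketch parses those lines and reports the relative drop between consecutive logged epochs; the regular expression assumes exactly the `Epoch : [N] ,  a_loss = X` format printed above, the sample lines are copied from that log, and `parse_losses` / `relative_drops` are hypothetical helpers.

```python
import re

# Matches lines such as "Epoch : [240] ,  a_loss = 337.1125" from the log above.
LINE = re.compile(r"Epoch : \[(\d+)\] ,\s+a_loss = ([\d.]+)")


def parse_losses(log_text):
    """Return [(epoch, a_loss), ...] for every matching log line."""
    return [(int(e), float(l)) for e, l in LINE.findall(log_text)]


def relative_drops(losses):
    """Fractional decrease in loss between consecutive logged epochs."""
    return [(prev - cur) / prev for (_, prev), (_, cur) in zip(losses, losses[1:])]


log = """Epoch : [0] ,  a_loss = 517.4711
Epoch : [230] ,  a_loss = 337.5199
Epoch : [240] ,  a_loss = 337.1125"""
losses = parse_losses(log)
print(losses[-1])  # (240, 337.1125)
print(f"{relative_drops(losses)[-1]:.2%}")  # 0.12%
```

In this run the last logged interval improves the loss by only about 0.12%, which suggests the loss curve is flattening by epoch 250.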