
Transcriptomics

Fundamentals of Transcriptomics — from RNA sequencing basics to advanced expression analysis.

Single-Cell Transcriptomics

Imagine zooming into a city from space → you see lights (bulk RNA-Seq). Now zoom to street level → you see who’s awake, who’s working, who’s sick. That’s single-cell RNA-Seq (scRNA-Seq).

1. Introduction to Single-Cell Transcriptomics

1.1 Why Study Gene Expression at Single-Cell Resolution?

Bulk RNA-Seq = average of 10,000 cells

→ Like asking: “What’s the average mood in New York?”

scRNA-Seq = each cell’s voice

→ “This neuron is stressed, this immune cell is fighting, this cancer cell is hiding.”

Purpose: Resolve the cell types, cell states, and rare populations that a bulk average hides.

Real Example:

Brain Tumor

  • Bulk RNA-Seq: “Tumor is aggressive”

  • scRNA-Seq: Found 1% of cells were cancer stem cells hiding from chemo

  • → A new drug targeting only that 1% → tumor shrank 80% in mice

1.2 Bulk vs. Single-Cell RNA-Seq – The Key Differences

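| Feature | Bulk RNA-Seq | scRNA-Seq |
|---|---|---|
| Unit measured | Pooled population (10,000-1M cells) | Individual cells (each profiled separately) |
| Output | Average expression | Cell-specific profile |
| Resolution | Tissue level | Cell type & state |
| Rare cells | Hidden | Detected |
| Cost (per sample, approx.) | USD 200-500 | USD 1,000-5,000 |
| Data size | ~5 GB | 50-500 GB |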
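The "Rare cells: Hidden vs. Detected" row follows from simple averaging. The toy calculation below uses made-up numbers (hypothetical: 1,000 cells, a marker at 100 units in 1% of them and 0 elsewhere) to show how a strong signal in a few cells turns into a weak-looking average in bulk data.

```python
# Hypothetical numbers for illustration: 1,000 cells, 1% express a marker strongly.
n_cells = 1000
rare_fraction = 0.01
marker_level = 100.0  # expression in each rare cell (arbitrary units)

n_rare = round(n_cells * rare_fraction)
per_cell = [marker_level] * n_rare + [0.0] * (n_cells - n_rare)

# Bulk RNA-Seq effectively reports the population average...
bulk_value = sum(per_cell) / n_cells
# ...while single-cell data keeps every value, so the rare cells can be counted.
cells_expressing = sum(1 for x in per_cell if x > 0)

print(f"bulk average: {bulk_value}")                   # bulk average: 1.0
print(f"cells expressing marker: {cells_expressing}")  # cells expressing marker: 10
```

In the bulk number, 10 cells at 100 units are indistinguishable from 1,000 cells at 1 unit each.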

Real Example:

Lung Fibrosis

  • Bulk: “Fibroblasts upregulated”

  • scRNA-Seq: Only one fibroblast subtype (not all fibroblasts) was upregulated → therapy targeting it avoided side effects

2. Workflow of scRNA-Seq

2.1 Cell Isolation and Barcoding

Goal: Put one cell in one droplet with a unique barcode

Methods:

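| Method | How it works | Throughput / notes |
|---|---|---|
| 10x Chromium | Oil droplets trap cells + barcoded beads | 10,000 cells/run |
| Drop-seq | DIY (home-built) droplet version | Cheaper |
| Smart-Seq2 | Plate-based, one cell per well | Full-length RNA, fewer cells |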

Step-by-step (10x):

  1. Dissociate tissue → enzymes (37°C, 20 min)
  2. Filter → 40 µm (remove clumps)
  3. Count → aim for 1,000 cells/µL
  4. Load into 10x chip → cell + bead + barcode in droplet
  5. Lyse cell → poly(A) mRNA is captured by the bead's barcoded oligo(dT) primers
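Why does the loading concentration in step 3 matter? Cell loading into droplets is commonly modeled as a Poisson process: if λ is the mean number of cells per droplet, a droplet holds k cells with probability λ^k e^(−λ) / k!. The sketch below uses illustrative λ values (assumptions, not 10x specifications) to show the trade-off: loading more cells fills more droplets, but also raises the share of occupied droplets that hold two or more cells (doublets).

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet contains exactly k cells when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def loading_stats(lam: float) -> dict:
    """Summarize droplet occupancy for a mean loading of lam cells per droplet."""
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    p_multi = 1.0 - p0 - p1  # two or more cells in the same droplet
    occupied = 1.0 - p0
    return {
        "empty": p0,
        "singlet": p1,
        "multiplet": p_multi,
        # Of the droplets that contain any cell, what fraction are multiplets?
        "multiplet_share_of_occupied": p_multi / occupied,
    }

for lam in (0.05, 0.1, 0.3):
    s = loading_stats(lam)
    print(f"lambda={lam}: occupied={1 - s['empty']:.3f}, "
          f"multiplets among occupied={s['multiplet_share_of_occupied']:.3f}")
```

For small λ the multiplet share is roughly λ/2, so doubling the loading roughly doubles the doublet rate; this is one reason QC later screens for doublets.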

Real Example (2025):

Human Pancreas Atlas

  • Used 10x Chromium on 20 donors
  • Captured 1.2 million cells
  • → Found new beta-cell subtype that dies first in diabetes

2.2 Library Preparation and Sequencing

After barcoding:

  1. Reverse transcription → RNA → cDNA (with barcode + UMI)
  2. Amplify → PCR (12–16 cycles)
  3. Fragment → add sequencing adapters
  4. Sequence → Illumina NovaSeq (e.g., ~100M reads per sample)
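Two quick calculations put steps 2 and 4 in context. The inputs are illustrative assumptions (a 10,000-cell run, 100M reads, 14 PCR cycles), and perfect PCR efficiency is assumed, which real reactions do not reach.

```python
def reads_per_cell(total_reads: int, n_cells: int) -> float:
    """Average sequencing depth per captured cell."""
    return total_reads / n_cells

def max_pcr_copies(cycles: int, efficiency: float = 1.0) -> float:
    """Upper bound on copies of one starting molecule after PCR.

    efficiency=1.0 means perfect doubling every cycle (an idealization).
    """
    return (1.0 + efficiency) ** cycles

print(reads_per_cell(100_000_000, 10_000))  # 10000.0
print(max_pcr_copies(14))                   # 16384.0
```

A single captured transcript can therefore give rise to thousands of identical reads, which is why raw read counts overstate expression and UMIs are used to count molecules instead.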

UMI (Unique Molecular Identifier):

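→ 10-12 nucleotide random tag attached to each RNA molecule before PCR → reads that share the same barcode + UMI + gene are counted once, so counts reflect real RNA molecules, not PCR copies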
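A minimal sketch of UMI collapsing, assuming each read has already been assigned a cell barcode, UMI and gene (the toy reads and the short 8-letter barcode are hypothetical). Reads sharing the same (barcode, UMI, gene) are treated as PCR copies of one molecule. Real pipelines such as Cell Ranger also correct sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads into molecule counts.

    reads: iterable of (cell_barcode, umi, gene) tuples, one per sequencing read.
    Returns {cell_barcode: {gene: number_of_distinct_umis}}.
    """
    molecules = defaultdict(set)  # (cell, gene) -> set of distinct UMIs
    for barcode, umi, gene in reads:
        molecules[(barcode, gene)].add(umi)

    counts = defaultdict(dict)
    for (barcode, gene), umis in molecules.items():
        counts[barcode][gene] = len(umis)
    return counts

# Toy data: 5 reads, but only 3 distinct molecules.
reads = [
    ("AAACCTGA", "GATTACAGAT", "CD3D"),
    ("AAACCTGA", "GATTACAGAT", "CD3D"),   # PCR duplicate of the read above
    ("AAACCTGA", "TTGCAAGCTA", "CD3D"),   # a second CD3D molecule in the same cell
    ("AAACCTGA", "CCGTAGGATC", "GAPDH"),
    ("AAACCTGA", "CCGTAGGATC", "GAPDH"),  # PCR duplicate
]
umi_counts = count_umis(reads)
print(umi_counts["AAACCTGA"])  # {'CD3D': 2, 'GAPDH': 1}
```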

Real Example:

Cancer Immunotherapy

  • Patient’s tumor biopsy → 10x → sequenced

  • UMI counts showed T-cells had 1,000 PD-1 RNAs each

  • → Anti-PD-1 drug worked → tumor gone in 6 months

2.3 Data Preprocessing and Quality Control

Raw data: BCL files from the sequencer, demultiplexed into .fastq files carrying cell barcodes + UMIs (Read 1) and the cDNA sequence (Read 2)

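# 1. Demultiplex: convert the sequencer's BCL output to FASTQ (shell; paths are placeholders)
cellranger mkfastq --id=fastqs --run=/path/to/bcl_run_folder --csv=samplesheet.csv

# 2. Align + count UMIs per gene per cell (shell)
#    Cell Ranger 8+ also requires --create-bam=true or --create-bam=false
cellranger count --id=sample --transcriptome=/path/to/refdata \
    --fastqs=fastqs/outs/fastq_path --sample=sample

# 3. QC in R (Seurat)
library(Seurat)
counts <- Read10X(data.dir = "sample/outs/filtered_feature_bc_matrix")
pbmc <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)
# % of UMIs from mitochondrial genes ("^MT-" matches human gene names)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 7000 & percent.mt < 20)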

QC Filters:

| Metric | Good | Bad |
|---|---|---|
| Genes per cell | 500-5,000 | <200 (empty droplet or dead cell), >7,000 (likely doublet) |
| Mitochondrial % | <10% | >20% (dying cell) |
| UMIs per cell | >1,000 | <500 |
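The QC thresholds above can be expressed as a simple per-cell filter. This is a sketch that uses the table's "Bad" cut-offs as defaults; the `CellQC` record and its field names are hypothetical, and cells falling between the "Good" and "Bad" ranges are kept here, although in practice cut-offs are tuned per tissue and protocol.

```python
from dataclasses import dataclass

@dataclass
class CellQC:
    barcode: str
    n_genes: int     # genes detected in the cell
    n_umis: int      # total UMIs in the cell
    pct_mito: float  # % of UMIs from mitochondrial genes

def qc_fail_reason(cell, min_genes=200, max_genes=7000, max_mito=20.0, min_umis=500):
    """Return why a cell fails QC, or None if it passes."""
    if cell.n_genes < min_genes:
        return "too few genes (empty droplet or dead cell)"
    if cell.n_genes > max_genes:
        return "too many genes (possible doublet)"
    if cell.pct_mito > max_mito:
        return "high mitochondrial % (dying cell)"
    if cell.n_umis < min_umis:
        return "too few UMIs"
    return None

cells = [
    CellQC("cell_A", n_genes=2500, n_umis=8000, pct_mito=4.0),
    CellQC("cell_B", n_genes=150, n_umis=300, pct_mito=3.0),
    CellQC("cell_C", n_genes=3000, n_umis=9000, pct_mito=35.0),
    CellQC("cell_D", n_genes=8200, n_umis=40000, pct_mito=5.0),
]
kept = [c.barcode for c in cells if qc_fail_reason(c) is None]
print(kept)  # ['cell_A']
```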

Real Example:

Alzheimer’s Brain

  • 100,000 cells → after QC → 70,000 high-quality

  • Removed high-mito cells → found microglial activation, not cell death

3. Analysis of Single-Cell Data

3.1 Cell Clustering and Identification

Goal: Group cells with similar gene expression

Tools: Seurat (R), Scanpy (Python)

Steps:

  1. Normalize → log(UMI count / cell's total UMIs × 10,000 + 1)
  2. Find variable genes → top 2,000
  3. PCA → reduce to 50 dimensions
  4. Cluster → Louvain (or Leiden) community detection on a nearest-neighbor graph
  5. UMAP → 2D map for visualization
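Steps 1 and 2 are simple enough to sketch directly. Below, a toy count matrix (hypothetical) is log-normalized the way Seurat's default LogNormalize and Scanpy's `normalize_total` + `log1p` work: divide each cell's counts by its total, multiply by 10,000, then take log(x + 1). Genes are then ranked by plain variance of the normalized values, a simplification of the mean-variance-aware methods (e.g. Seurat's "vst") the real tools use.

```python
import math
import statistics

def log_normalize(counts, scale_factor=10_000):
    """counts: list of cells, each a list of raw UMI counts per gene.

    Returns log1p(count / cell_total * scale_factor) for every entry.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized

def top_variable_genes(normalized, gene_names, n_top=2):
    """Rank genes by variance across cells (simplified stand-in for HVG selection)."""
    ranked = []
    for g, name in enumerate(gene_names):
        values = [cell[g] for cell in normalized]
        ranked.append((statistics.pvariance(values), name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked[:n_top]]

genes = ["GAPDH", "CD3D", "INS"]
counts = [
    [50, 40, 0],   # T-cell-like cell
    [60, 0, 90],   # beta-cell-like cell
    [55, 35, 0],   # T-cell-like cell
]
norm = log_normalize(counts)
print(top_variable_genes(norm, genes))  # CD3D and INS vary most; GAPDH stays stable
```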

Marker Genes → label clusters:

| Cluster | Marker | Cell type |
|---|---|---|
| 1 | INS | Beta cells |
| 2 | GFAP | Astrocytes |
| 3 | CD3D | T-cells |
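Labeling clusters from marker genes can be sketched as a lookup: for each cluster, pick the cell type whose marker has the highest average expression. The marker dictionary mirrors the table above; the cluster averages and the `min_expr` cut-off are hypothetical, and real annotation usually weighs several markers per cell type rather than one.

```python
MARKERS = {"INS": "Beta cells", "GFAP": "Astrocytes", "CD3D": "T-cells"}

def annotate_clusters(cluster_means, markers=MARKERS, min_expr=1.0):
    """cluster_means: {cluster_id: {gene: mean normalized expression}}.

    Assign each cluster the cell type of its highest-expressed marker,
    or "Unknown" if no marker reaches min_expr.
    """
    labels = {}
    for cluster, means in cluster_means.items():
        best_gene = max(markers, key=lambda g: means.get(g, 0.0))
        if means.get(best_gene, 0.0) >= min_expr:
            labels[cluster] = markers[best_gene]
        else:
            labels[cluster] = "Unknown"
    return labels

cluster_means = {
    1: {"INS": 6.2, "GFAP": 0.1, "CD3D": 0.0},
    2: {"INS": 0.0, "GFAP": 4.8, "CD3D": 0.2},
    3: {"INS": 0.1, "GFAP": 0.0, "CD3D": 3.9},
    4: {"INS": 0.2, "GFAP": 0.3, "CD3D": 0.1},
}
print(annotate_clusters(cluster_means))
# {1: 'Beta cells', 2: 'Astrocytes', 3: 'T-cells', 4: 'Unknown'}
```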

Real Example

Human Heart Atlas

  • 500,000 heart cells → 14 clusters

  • Cluster 7: MYH7 high → cardiomyocytes

  • Cluster 12: PECAM1 → endothelial

→ Found fibroblasts turn into myofibroblasts in heart failure

3.2 Trajectory Inference and Pseudotime

Goal: Order cells along a developmental process (like frames in a movie)

Tools: Monocle3, Slingshot

Pseudotime = each cell's position along the inferred trajectory (biological progress), not clock time
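One way to see what "biological time" means: connect cells to their nearest neighbours in expression (or PCA) space, pick a root cell (e.g. a stem cell), and define each cell's pseudotime as its shortest-path distance from the root. Monocle3 measures pseudotime as distance along a learned principal graph from user-chosen root nodes; the sketch below is a simplified stand-in using a k-nearest-neighbour graph and Dijkstra's algorithm, with hypothetical 2D coordinates.

```python
import heapq
import math

def knn_graph(points, k=2):
    """Connect each cell to its k nearest neighbours (Euclidean), undirected."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph symmetric
    return graph

def pseudotime(graph, root):
    """Shortest-path (geodesic) distance from the root cell to every cell."""
    dist = {node: math.inf for node in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist[nbr]:
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

# Hypothetical cells along a curved path (e.g. stem cell -> red blood cell).
cells = [(0, 0), (1, 0.2), (2, 0.8), (2.5, 1.8), (2.6, 3.0)]
pt = pseudotime(knn_graph(cells, k=2), root=0)
order = sorted(pt, key=pt.get)
print(order)  # [0, 1, 2, 3, 4]
```

Following graph edges rather than straight-line distance keeps the ordering faithful to curved trajectories.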

Real Example

Blood Development

  • Stem cell → red blood cell

  • Monocle3 → pseudotime trajectory

Early: GATA2 high

Late: HBB (hemoglobin) high

→ Found drug blocks at day 5 → new anemia treatment

3.3 Tools for scRNA-Seq Analysis

| Tool | Language | Best for |
|---|---|---|
| Seurat | R | Clustering, integration |
| Scanpy | Python | Speed, scalability |
| CellPhoneDB | Python | Cell-cell communication |
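CellPhoneDB's statistical method scores a ligand-receptor pair between two clusters as the mean of the ligand's average expression in the sender cluster and the receptor's average expression in the receiver cluster, then judges significance by randomly permuting cluster labels. The sketch below follows that idea in simplified form (single-subunit pair, no minimum-fraction-expressing filter, hypothetical toy values for CCL2 and its receptor CCR2); it is not CellPhoneDB's own code.

```python
import random
import statistics

def interaction_score(ligand, receptor, labels, sender, receiver):
    """Mean of (ligand average in sender cluster, receptor average in receiver cluster)."""
    lig = statistics.mean(x for x, lab in zip(ligand, labels) if lab == sender)
    rec = statistics.mean(x for x, lab in zip(receptor, labels) if lab == receiver)
    return (lig + rec) / 2

def permutation_pvalue(ligand, receptor, labels, sender, receiver, n_perm=1000, seed=0):
    """Fraction of label shuffles giving a score at least as high as the observed one."""
    rng = random.Random(seed)
    observed = interaction_score(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # cluster sizes are preserved by shuffling
        if interaction_score(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm

labels = ["cancer"] * 4 + ["macrophage"] * 4
ccl2 = [5.1, 4.8, 5.5, 4.9, 0.2, 0.0, 0.1, 0.3]  # ligand (hypothetical values)
ccr2 = [0.1, 0.0, 0.2, 0.1, 3.9, 4.2, 4.0, 3.7]  # receptor (hypothetical values)
score, p = permutation_pvalue(ccl2, ccr2, labels, "cancer", "macrophage")
print(f"CCL2 -> CCR2 score={score:.2f}, p={p:.3f}")
```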

Real Example:

Tumor-Immune Talk

  • Scanpy → clusters

  • CellPhoneDB → cancer cells signal macrophages via CCL2

  • Anti-CCL2 drug → immune attack restored

4. Applications and Challenges

4.1 Mapping Cellular Diversity in Tissues

Purpose: Build cell atlases

Real Example:

Human Cell Atlas Project

  • 100+ organs → 50 million cells

  • Found new kidney cell type that filters toxins

  • → Drug toxicity tests now use this cell type

4.2 Studying Rare Cell Populations

Purpose: Find needles in a haystack

Real Example:

Pancreatic Cancer

  • 0.5% of cells = cancer stem cells

  • scRNA-Seq → identified them as CD133+ ALDH1+

  • → Targeted therapy killed only these → tumor regrowth stopped
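How many cells must be sequenced to catch a rare population? If a fraction f of cells belongs to it and n cells are captured at random, the number of rare cells captured is approximately binomial(n, f). The sketch computes the chance of capturing at least m of them; the 0.5% fraction comes from the example above, while the target of 50 rare cells and the 95% goal are arbitrary illustrations.

```python
import math

def prob_at_least(m, n, f):
    """P(X >= m) for X ~ Binomial(n, f): chance of capturing at least m rare cells."""
    p_below = sum(math.comb(n, k) * f**k * (1 - f) ** (n - k) for k in range(m))
    return 1.0 - p_below

def cells_needed(m, f, target=0.95, step=100):
    """Smallest n (in increments of `step`) with P(at least m rare cells) >= target."""
    n = step
    while prob_at_least(m, n, f) < target:
        n += step
    return n

f = 0.005  # 0.5% of cells are cancer stem cells
for n in (1_000, 5_000, 10_000):
    print(n, round(prob_at_least(50, n, f), 3))
print(cells_needed(50, f))
```

With 10,000 cells the expected count is only 50, so reaching 50 rare cells is roughly a coin flip; rare-population studies therefore sequence more cells or enrich for the population first.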

4.3 Technical Noise and Batch Effects

Challenges:

| Problem | Fix |
|---|---|
| Dropout (expressed gene not detected) | Imputation (MAGIC, scVI) |
| Batch effect | Harmony, Seurat integration |
| Doublets (two cells in one droplet) | DoubletFinder |
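To make the batch-effect row concrete, the sketch below simulates the same two cell types measured in two labs, where lab 2 adds a constant technical offset of +3 (hypothetical values). Subtracting each batch's mean removes the offset. This per-batch centering is a deliberately crude illustration, not Harmony: Harmony iteratively clusters cells in PCA space and applies cluster-specific corrections, which lets it cope with batches that differ in cell-type composition.

```python
import statistics

def center_by_batch(values, batches):
    """Subtract each batch's mean from its cells (one gene or PC at a time)."""
    means = {
        b: statistics.mean(v for v, bb in zip(values, batches) if bb == b)
        for b in set(batches)
    }
    return [v - means[b] for v, b in zip(values, batches)]

# Hypothetical PC1 scores: T-cells near 1, B-cells near 5; lab2 is shifted by +3.
values = [1.0, 5.0, 4.0, 8.0]
batches = ["lab1", "lab1", "lab2", "lab2"]
cell_types = ["T", "B", "T", "B"]

corrected = center_by_batch(values, batches)
print(list(zip(cell_types, batches, corrected)))
# [('T', 'lab1', -2.0), ('B', 'lab1', 2.0), ('T', 'lab2', -2.0), ('B', 'lab2', 2.0)]
```

If lab 2 had contributed only T-cells, centering would wrongly shift them; that unbalanced case is what methods like Harmony are built to handle.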

Real Example:

Multi-Lab Brain Study

  • 5 labs → different protocols

  • Harmony removed batch → found true Alzheimer’s microglia state

  • Without fix → false cluster

Expected UMAP (schematic): [Healthy] → [Infected: T-cells activated] → [Recovered]

Summary Table

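| Step | Tool / approach | Example |
|---|---|---|
| Isolation | 10x Chromium | Pancreas beta-cell subtypes |
| QC | Seurat | Alzheimer’s microglia |
| Clustering | Seurat/Scanpy (Louvain) + UMAP | Heart cell atlas |
| Trajectory | Monocle3 | Blood development |
| Application | Rare-cell analysis | Cancer stem cells |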