Single-Cell Transcriptomics
Imagine zooming into a city from space → you see lights (bulk RNA-Seq). Now zoom to street level → you see who’s awake, who’s working, who’s sick. That’s single-cell RNA-Seq (scRNA-Seq).
1. Introduction to Single-Cell Transcriptomics
1.1 Why Study Gene Expression at Single-Cell Resolution?
Bulk RNA-Seq = one averaged profile from thousands to millions of cells
→ Like asking: “What’s the average mood in New York?”
scRNA-Seq = each cell’s voice
→ “This neuron is stressed, this immune cell is fighting, this cancer cell is hiding.”
Purpose:
- Find rare cells (0.1% of tissue)
- See cell states (resting vs. activated)
- Track development (stem cell → neuron)
- Understand disease heterogeneity
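A minimal sketch of why averaging hides rare cells. The numbers are hypothetical toy values (not from any dataset), and the `threshold` cutoff is an assumption chosen only for illustration.

```python
from statistics import mean

# Hypothetical toy data: expression of one marker gene in 1,000 cells.
# 999 cells express it at a low level; 1 rare cell (0.1%) expresses it highly.
common_cells = [1.0] * 999
rare_cells = [50.0] * 1
all_cells = common_cells + rare_cells

# A bulk-style measurement reports a single average over every cell.
bulk_average = mean(all_cells)

# A single-cell measurement keeps one value per cell, so the outlier stays visible.
threshold = 10.0  # assumed cutoff for "high expression", illustration only
rare_indices = [i for i, value in enumerate(all_cells) if value > threshold]

print(f"bulk average: {bulk_average:.3f}")
print(f"cells above threshold: {len(rare_indices)} of {len(all_cells)}")
```

The average barely moves away from the common value, while the per-cell view still shows the one rare cell.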
Real Example:
Brain Tumor
Bulk RNA-Seq: “Tumor is aggressive”
scRNA-Seq: Found 1% of cells were cancer stem cells hiding from chemo
→ New drug targets only those 1% → tumor shrank 80% in mice
1.2 Bulk vs. Single-Cell RNA-Seq – The Key Differences
| Feature | Bulk RNA-Seq | scRNA-Seq |
|---|---|---|
| Input | 10,000-1M cells | 1 cell |
| Output | Average expression | Cell-specific profile |
| Resolution | Tissue level | Cell-type & state |
| Rare cells | Hidden | Detected |
| Cost (per sample) | USD 200-500 | USD 1,000-5,000 |
| Data size | ~5 GB | 50-500 GB |
Real Example:
Lung Fibrosis
Bulk: “Fibroblasts upregulated”
scRNA-Seq: Only 1 subtype (not all) → targeted therapy avoided side effects
2. Workflow of scRNA-Seq
2.1 Cell Isolation and Barcoding
Goal: Put one cell in one droplet with a unique barcode
Methods:
| Method | How it works | Throughput / strength |
|---|---|---|
| 10x Chromium | Oil droplets trap cells + barcoded beads | ~10,000 cells/run |
| Drop-seq | Droplet-based, DIY version | Cheaper per cell |
| Smart-Seq2 | Plate-based | Full-length RNA coverage |
Step-by-step (10x):
- Dissociate tissue → enzymes (37°C, 20 min)
- Filter → 40 µm (remove clumps)
- Count → aim for 1,000 cells/µL
- Load into 10x chip → cell + bead + barcode in droplet
- Lyse cell → RNA sticks to bead
Real Example (2025):
Human Pancreas Atlas
- Used 10x Chromium on 20 donors
- Captured 1.2 million cells
- → Found new beta-cell subtype that dies first in diabetes
2.2 Library Preparation and Sequencing
After barcoding:
- Reverse transcription → RNA → cDNA (with barcode + UMI)
- Amplify → PCR (12–16 cycles)
- Fragment → add sequencing adapters
- Sequence → Illumina NovaSeq (100M reads)
UMI (Unique Molecular Identifier):
→ Short random tag (about 10-12 nucleotides) → Counts real RNA molecules, not PCR copies
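The UMI idea can be sketched as counting distinct tags instead of reads. This is a simplified illustration: the function name `count_molecules` and the toy reads are hypothetical, and real pipelines also correct sequencing errors in barcodes and UMIs, which this sketch does not.

```python
from collections import defaultdict

def count_molecules(reads):
    """Collapse PCR duplicates: count distinct UMIs per (cell barcode, gene)."""
    umis_seen = defaultdict(set)
    for barcode, umi, gene in reads:
        umis_seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}

# Hypothetical reads as (cell barcode, UMI, gene); barcodes shortened for readability.
reads = [
    ("AAAC", "GGTTCCAAGT", "PDCD1"),
    ("AAAC", "GGTTCCAAGT", "PDCD1"),  # same UMI: a PCR copy of the same molecule
    ("AAAC", "TTGACCATGA", "PDCD1"),  # different UMI: a second molecule
    ("CCGT", "GGTTCCAAGT", "CD3D"),   # different cell, counted separately
]

counts = count_molecules(reads)
for (barcode, gene), n in sorted(counts.items()):
    print(barcode, gene, n)
```

Three reads for PDCD1 in cell AAAC collapse to two molecules, because two of the reads share a UMI.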
Real Example:
Cancer Immunotherapy
Patient’s tumor biopsy → 10x → sequenced
UMI showed T-cells had 1,000 PD-1 RNAs each
→ Anti-PD-1 drug worked → tumor gone in 6 months
2.3 Data Preprocessing and Quality Control
Raw data: BCL files from the sequencer → demultiplexed into .fastq files with barcodes + UMIs
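# 1. Demultiplex BCL → FASTQ (paths are placeholders)
cellranger mkfastq --run=/path/to/bcl_run --csv=samplesheet.csv
# 2. Align + Count (newer Cell Ranger versions also require --create-bam)
cellranger count --id=sample --transcriptome=/path/to/refdata --fastqs=/path/to/fastqs
# 3. QC in R (Seurat)
library(Seurat)
counts <- Read10X(data.dir = "sample/outs/filtered_feature_bc_matrix")
pbmc <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)
# Percent of counts from mitochondrial genes (human symbols start with "MT-")
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
# Drop likely dead cells, doublets, and dying cells
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 7000 & percent.mt < 20)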
QC Filters:
| Metric | Good | Bad |
|---|---|---|
| Genes per cell | 500-5000 | <200 (dead), >7,000 (doublet) |
| Mitochondrial % | <10% | >20% (dying) |
| UMIs | >1,000 | <500 |
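The table's "Good" column can be written as a per-cell filter. A minimal sketch: `qc_metrics` and `passes_qc` are hypothetical helper names, the toy cells are invented, and the `MT-` prefix assumes human gene symbols. Cutoffs should be tuned per tissue rather than copied.

```python
def qc_metrics(cell_counts):
    """Summarize one cell. cell_counts maps gene symbol -> UMI count."""
    total_umis = sum(cell_counts.values())
    genes_detected = sum(1 for count in cell_counts.values() if count > 0)
    # Assumption: human mitochondrial genes carry the "MT-" symbol prefix.
    mito_umis = sum(c for gene, c in cell_counts.items() if gene.startswith("MT-"))
    percent_mito = 100.0 * mito_umis / total_umis if total_umis else 0.0
    return {"genes": genes_detected, "umis": total_umis, "percent_mito": percent_mito}

def passes_qc(metrics, min_genes=500, max_genes=5000, max_mito=10.0, min_umis=1000):
    """Apply the 'Good' ranges from the QC table (defaults are those values)."""
    return (
        min_genes <= metrics["genes"] <= max_genes
        and metrics["percent_mito"] < max_mito
        and metrics["umis"] > min_umis
    )

# Hypothetical toy cells: one healthy, one with a high mitochondrial fraction.
healthy_cell = {f"GENE{i}": 2 for i in range(800)}
healthy_cell["MT-CO1"] = 50
dying_cell = {f"GENE{i}": 1 for i in range(600)}
dying_cell["MT-CO1"] = 400

for name, cell in [("healthy", healthy_cell), ("dying", dying_cell)]:
    metrics = qc_metrics(cell)
    print(name, metrics, "keep" if passes_qc(metrics) else "drop")
```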
Real Example:
Alzheimer’s Brain
100,000 cells → after QC → 70,000 high-quality
Removed high-mito cells → found microglia activation not cell death
3. Analysis of Single-Cell Data
3.1 Cell Clustering and Identification
Goal: Group cells with similar gene expression
Tools: Seurat (R), Scanpy (Python)
Steps:
- Normalize → log(counts / total counts per cell × 10,000 + 1)
- Find variable genes → top 2,000
- PCA → reduce to 50 dimensions
- Cluster → Louvain algorithm
- UMAP → 2D map
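The first two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not what Seurat or Scanpy run internally: `log_normalize` and `top_variable_genes` are hypothetical helpers, the counts are toy values, and genes are ranked by plain variance, whereas the real tools use more careful variance-stabilizing methods.

```python
import math
from statistics import pvariance

def log_normalize(cell_counts, scale_factor=10_000):
    """Scale each cell to the same total, then apply log(x + 1).

    Assumes the cell has at least one count (total > 0).
    """
    total = sum(cell_counts.values())
    return {gene: math.log1p(count / total * scale_factor)
            for gene, count in cell_counts.items()}

def top_variable_genes(normalized_cells, n_top):
    """Rank genes by variance across cells (simplified stand-in for HVG selection)."""
    genes = sorted({gene for cell in normalized_cells for gene in cell})
    variance = {
        gene: pvariance([cell.get(gene, 0.0) for cell in normalized_cells])
        for gene in genes
    }
    return sorted(genes, key=lambda gene: (-variance[gene], gene))[:n_top]

# Hypothetical raw UMI counts for two cells.
raw_cells = [{"INS": 90, "ACTB": 10}, {"INS": 0, "ACTB": 10}]
normalized = [log_normalize(cell) for cell in raw_cells]

print(normalized)
print(top_variable_genes(normalized, n_top=1))
```

INS is switched on in one cell and off in the other, so it ranks above the housekeeping-like ACTB.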
Marker Genes → label clusters:
| Cluster | Marker | Cell Type |
|---|---|---|
| 1 | INS | Beta Cells |
| 2 | GFAP | Astrocytes |
| 3 | CD3D | T-cells |
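Labeling from the marker table can be sketched as a lookup on each cluster's mean expression. A simplified illustration: `label_cluster` is a hypothetical helper, the expression values are invented, and real annotation usually checks several markers per cell type rather than one.

```python
# Marker gene -> cell type, taken from the table above.
MARKERS = {"INS": "Beta cells", "GFAP": "Astrocytes", "CD3D": "T-cells"}

def label_cluster(mean_expression, markers=MARKERS):
    """Pick the cell type whose marker has the highest mean expression."""
    best_gene = max(markers, key=lambda gene: mean_expression.get(gene, 0.0))
    if mean_expression.get(best_gene, 0.0) <= 0.0:
        return "Unknown"  # no marker expressed at all
    return markers[best_gene]

# Hypothetical mean normalized expression per cluster.
clusters = {
    1: {"INS": 5.2, "GFAP": 0.1, "CD3D": 0.0},
    2: {"INS": 0.0, "GFAP": 4.8, "CD3D": 0.2},
    3: {"INS": 0.1, "GFAP": 0.0, "CD3D": 3.9},
}

for cluster_id, expression in clusters.items():
    print(cluster_id, label_cluster(expression))
```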
Real Example:
Human Heart Atlas
500,000 heart cells → 14 clusters
Cluster 7: MYH7 high → cardiomyocytes
Cluster 12: PECAM1 → endothelial
→ Found fibroblasts turn into myofibroblasts in heart failure
3.2 Trajectory Inference and Pseudotime
Goal: Order cells in time (like a movie)
Tools: Monocle3, Slingshot
Pseudotime = biological time, not clock time
Real Example:
Blood Development
Stem cell → red blood cell
Monocle3 → pseudotime trajectory
Early: GATA2 high
Late: HBB (hemoglobin) high
→ Found drug blocks at day 5 → new anemia treatment
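A toy illustration of what "pseudotime" means, using the early and late markers from the example above. This is not the Monocle3 or Slingshot algorithm (those fit trajectories through a reduced-dimension space). It only orders cells by a late-minus-early marker score and rescales it to 0-1. `toy_pseudotime` and the expression values are hypothetical.

```python
def toy_pseudotime(cells, early_gene, late_gene):
    """Order cells by (late marker - early marker), rescaled to the range 0..1."""
    scores = {
        name: expression.get(late_gene, 0.0) - expression.get(early_gene, 0.0)
        for name, expression in cells.items()
    }
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {name: (score - low) / span if span else 0.0
            for name, score in scores.items()}

# Hypothetical normalized expression for three cells along red blood cell development.
cells = {
    "progenitor": {"GATA2": 4.0, "HBB": 3.0},
    "erythrocyte": {"GATA2": 0.0, "HBB": 9.0},
    "stem": {"GATA2": 8.0, "HBB": 0.0},
}

pseudotime = toy_pseudotime(cells, early_gene="GATA2", late_gene="HBB")
ordered = sorted(pseudotime, key=pseudotime.get)
print(ordered)
```

The resulting values describe position along the differentiation path, not hours or days.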
3.3 Tools for scRNA-Seq Analysis
| Tool | Language | Best for |
|---|---|---|
| Seurat | R | Clustering, integration |
| Scanpy | Python | Speed, scalability |
| CellPhoneDB | Python | Cell-cell communication |
Real Example:
Tumor-Immune Talk
Scanpy → clusters
CellPhoneDB → cancer cells signal macrophages via CCL2
→ Anti-CCL2 drug → immune attack restored
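The cell-cell communication idea can be sketched as a ligand-receptor score between two clusters. This is a simplified illustration and not the CellPhoneDB API or its statistical test: `interaction_score` is a hypothetical helper, the expression values are invented, and pairing CCL2 with the receptor CCR2 is an assumption added for the example.

```python
from statistics import mean

def interaction_score(sender_cells, receiver_cells, ligand, receptor):
    """Average of mean ligand (sender) and mean receptor (receiver) expression.

    Returns 0.0 if either side does not express its gene at all.
    """
    ligand_mean = mean(cell.get(ligand, 0.0) for cell in sender_cells)
    receptor_mean = mean(cell.get(receptor, 0.0) for cell in receiver_cells)
    if ligand_mean == 0.0 or receptor_mean == 0.0:
        return 0.0
    return (ligand_mean + receptor_mean) / 2

# Hypothetical normalized expression per cell in three clusters.
cancer_cells = [{"CCL2": 6.0}, {"CCL2": 4.0}]
macrophages = [{"CCR2": 3.0}, {"CCR2": 1.0}]
t_cells = [{"CD3D": 5.0}]

score_macrophage = interaction_score(cancer_cells, macrophages, "CCL2", "CCR2")
score_t_cell = interaction_score(cancer_cells, t_cells, "CCL2", "CCR2")
print(score_macrophage, score_t_cell)
```

A real analysis would also test whether a score is higher than expected by chance, for example by shuffling cluster labels.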
4. Applications and Challenges
4.1 Mapping Cellular Diversity in Tissues
Purpose: Build cell atlases
Real Example:
Human Cell Atlas Project
- 100+ organs → 50 million cells
- Found new kidney cell type that filters toxins
- → Drug toxicity test now uses this cell
4.2 Studying Rare Cell Populations
Purpose: Find needles in a haystack
Real Example:
Pancreatic Cancer
0.5% of cells = cancer stem cells
scRNA-Seq → marker profile: CD133+ ALDH1+
→ Targeted therapy killed only these → tumor regrowth stopped
4.3 Technical Noise and Batch Effects
Challenges:
| Problem | Fix |
|---|---|
| Dropout (gene not detected) | Imputation (MAGIC, scVI) |
| Batch effect | Harmony, Seurat Integration |
| Doublets | DoubletFinder |
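To show what a batch effect looks like numerically, here is a deliberately simple correction: subtract each batch's mean for one gene. This is not how Harmony or Seurat Integration work (they align cells in a reduced-dimension embedding), and centering assumes every batch contains a similar mix of cell types. `center_by_batch` and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def center_by_batch(values, batches):
    """Subtract the per-batch mean from each value (one gene, many cells)."""
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [value - batch_means[batch] for value, batch in zip(values, batches)]

# Hypothetical expression of one gene: lab2 reads systematically higher than lab1.
values = [1.0, 3.0, 11.0, 13.0]
batches = ["lab1", "lab1", "lab2", "lab2"]

corrected = center_by_batch(values, batches)
print(corrected)
```

Before centering, the cells split by lab. Afterwards the low and high cells from both labs line up.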
Real Example:
Multi-Lab Brain Study
5 labs → different protocols
Harmony removed batch → found true Alzheimer’s microglia state
Without fix → false cluster
Expected UMAP
[Healthy] → [Infected: T-cells activated] → [Recovered]
Summary Table
| Step | Tool | Example |
|---|---|---|
| Isolation | 10x Chromium | Pancreas beta subtypes |
| QC | Seurat | Alzheimer’s microglia |
| Clustering | Seurat / Scanpy (Louvain + UMAP) | Heart cell atlas |
| Trajectory | Monocle3 | Blood development |
| Rare-cell detection | scRNA-Seq marker profiling | Cancer stem cells |