
Transcriptomics

Fundamentals of Transcriptomics — from RNA sequencing basics to advanced expression analysis.

Single-Cell Transcriptomics

Imagine zooming into a city from space → you see lights (bulk RNA-Seq). Now zoom to street level → you see who’s awake, who’s working, who’s sick. That’s single-cell RNA-Seq (scRNA-Seq).

1. Introduction to Single-Cell Transcriptomics

1.1 Why Study Gene Expression at Single-Cell Resolution?

Bulk RNA-Seq = one average over every cell in the sample (10,000 to 1M cells)

→ Like asking: “What’s the average mood in New York?”

scRNA-Seq = each cell’s voice

→ “This neuron is stressed, this immune cell is fighting, this cancer cell is hiding.”

Purpose:

Find rare cells (0.1% of tissue)
See cell states (resting vs. activated)
Track development (stem cell → neuron)
Understand disease heterogeneity
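A toy sketch of why averaging hides rare cells. The numbers are invented for illustration (one marker gene, 1,000 cells, one rare expressing cell) and are not taken from any dataset:

```python
from statistics import mean

# Hypothetical toy data: expression of one marker gene in 1,000 cells.
# 999 cells do not express it; 1 rare cell (0.1% of the tissue) expresses it strongly.
cells = [0.0] * 999 + [50.0]

# Bulk RNA-Seq reports roughly one averaged signal for the whole sample.
bulk_signal = mean(cells)

# scRNA-Seq keeps one value per cell, so the rare cell stays visible.
rare_cells = [i for i, value in enumerate(cells) if value > 10.0]

print(f"bulk average: {bulk_signal:.3f}")
print(f"cells above threshold: {len(rare_cells)}")
```

The bulk value sits close to zero even though one cell is strongly positive, which is the "average mood" problem in miniature.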

Real Example:

Brain Tumor

Bulk RNA-Seq: “Tumor is aggressive”

scRNA-Seq: Found 1% of cells were cancer stem cells hiding from chemo

→ New drug targets only those 1% → tumor shrank 80% in mice

1.2 Bulk vs. Single-Cell RNA-Seq – The Key Differences

Feature	Bulk RNA-Seq	scRNA-Seq
Input	10,000-1M cells (pooled)	Single cells, each barcoded separately
Output	Average expression	Cell-specific profile
Resolution	Tissue level	Cell-type & state
Rare cells	Hidden	Detected
Cost per sample	USD 200-500	USD 1,000-5,000
Data size	~5 GB	50-500 GB

Real Example:

Lung Fibrosis

Bulk: “Fibroblasts upregulated”

scRNA-Seq: Only 1 subtype (not all) → targeted therapy avoided side effects

2. Workflow of scRNA-Seq

2.1 Cell Isolation and Barcoding

Goal: Put one cell in one droplet with a unique barcode

Methods:

Method	How it works	Throughput / strength
10x Chromium	Oil droplets trap cells + barcoded beads	~10,000 cells/run
Drop-seq	Open-source droplet setup (DIY)	Cheaper per cell
Smart-Seq2	Plate-based, one cell per well	Full-length RNA

Step-by-step (10x):

Dissociate tissue → enzymes (37°C, 20 min)
Filter → 40 µm (remove clumps)
Count → aim for 1,000 cells/µL
Load into 10x chip → cell + bead + barcode in droplet
Lyse cell → RNA sticks to bead
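Why load cells at a controlled concentration? Droplet loading is commonly modeled as a Poisson process: too many cells per droplet and you get doublets. The sketch below assumes that model; the average occupancy `lam` (cells per droplet) is a hypothetical parameter, not a figure from the 10x documentation:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k cells when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def droplet_occupancy(lam: float) -> dict:
    """Split droplets into empty, single-cell, and multi-cell fractions."""
    p_empty = poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    return {"empty": p_empty, "single": p_single, "multiple": 1.0 - p_empty - p_single}

def multiplet_fraction(lam: float) -> float:
    """Fraction of cell-containing droplets that hold more than one cell."""
    occ = droplet_occupancy(lam)
    return occ["multiple"] / (occ["single"] + occ["multiple"])

# Assumed loading densities, for illustration only.
for lam in (0.05, 0.1, 0.5):
    print(f"lam={lam}: multiplet fraction = {multiplet_fraction(lam):.3f}")
```

Under this model most droplets are empty at low `lam`, which is the price paid for keeping doublets rare.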

Real Example (2025):

Human Pancreas Atlas

Used 10x Chromium on 20 donors

Captured 1.2 million cells

→ Found new beta-cell subtype that dies first in diabetes

2.2 Library Preparation and Sequencing

After barcoding:

Reverse transcription → RNA → cDNA (with barcode + UMI)
Amplify → PCR (12–16 cycles)
Fragment → add sequencing adapters
Sequence → Illumina NovaSeq (100M reads)

UMI (Unique Molecular Identifier):

→ Short random tag (10 nt in 10x v2 chemistry, 12 nt in v3) → Counts real RNA molecules, not PCR copies
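A minimal sketch of UMI counting: reads that share the same cell barcode, gene, and UMI are treated as PCR copies of one molecule. The barcodes, UMIs, and the `count_umis` helper are made up for illustration, and this version uses exact matching only (real pipelines also correct sequencing errors in UMIs):

```python
from collections import defaultdict

def count_umis(reads):
    """Count unique UMIs per (cell barcode, gene).

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    """
    seen = defaultdict(set)
    for barcode, gene, umi in reads:
        seen[(barcode, gene)].add(umi)  # a set drops exact duplicates
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical reads: 5 reads, but only 4 distinct molecules.
reads = [
    ("AAAC", "PDCD1", "ACGTACGTAC"),
    ("AAAC", "PDCD1", "ACGTACGTAC"),  # PCR duplicate of the read above
    ("AAAC", "PDCD1", "TTGCATGCAA"),
    ("AAAC", "CD3D", "GGGTTTAAAC"),
    ("CCCG", "PDCD1", "ACGTACGTAC"),  # same UMI, different cell: separate molecule
]
umi_counts = count_umis(reads)
print(umi_counts)
```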

Real Example:

Cancer Immunotherapy

Patient’s tumor biopsy → 10x → sequenced

UMI showed T-cells had 1,000 PD-1 RNAs each

→ Anti-PD-1 drug worked → tumor gone in 6 months

2.3 Data Preprocessing and Quality Control

Raw data: .fastq files with barcodes + UMIs

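# 1. Demultiplex BCL files into FASTQ (shell; paths are placeholders)
cellranger mkfastq --id=run1 --run=/path/to/bcl_folder --csv=samplesheet.csv

# 2. Align + Count (shell; newer Cell Ranger releases may also require --create-bam)
cellranger count --id=sample --transcriptome=/path/to/reference --fastqs=/path/to/fastqs --sample=sample

# 3. QC in R (Seurat)
library(Seurat)
counts <- Read10X(data.dir = "sample/outs/filtered_feature_bc_matrix")
pbmc <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")  # mitochondrial %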

QC Filters:

Metric	Good (typical, tissue-dependent)	Bad
Genes per cell	500-5,000	<200 (empty droplet or dead cell), >7,000 (possible doublet)
Mitochondrial %	<10%	>20% (dying or damaged)
UMIs per cell	>1,000	<500
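The "Good" column above can be turned into a simple pass/fail filter. This is a sketch only: the `CellQC` record and `passes_qc` helper are hypothetical names, the example cells are invented, and the cutoffs should be tuned per tissue rather than copied:

```python
from dataclasses import dataclass

@dataclass
class CellQC:
    barcode: str
    n_genes: int     # genes detected in the cell
    n_umis: int      # total UMIs in the cell
    pct_mito: float  # percent of UMIs from mitochondrial genes

def passes_qc(cell, min_genes=500, max_genes=5000, min_umis=1000, max_pct_mito=10.0):
    """Keep a cell only if every metric falls in the 'Good' range of the table."""
    return (
        min_genes <= cell.n_genes <= max_genes
        and cell.n_umis > min_umis
        and cell.pct_mito < max_pct_mito
    )

# Hypothetical cells covering each failure mode.
cells = [
    CellQC("A", n_genes=2500, n_umis=8000, pct_mito=3.0),   # healthy
    CellQC("B", n_genes=150, n_umis=300, pct_mito=2.0),     # too few genes and UMIs
    CellQC("C", n_genes=7500, n_umis=40000, pct_mito=4.0),  # possible doublet
    CellQC("D", n_genes=1200, n_umis=3000, pct_mito=25.0),  # high mitochondrial %
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```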

Real Example:

Alzheimer’s Brain

100,000 cells → after QC → 70,000 high-quality

Removed high-mito cells → found microglia activation not cell death

3. Analysis of Single-Cell Data

3.1 Cell Clustering and Identification

Goal: Group cells with similar gene expression

Tools: Seurat (R), Scanpy (Python)

Steps:

Normalize → log(UMI count / total UMIs in the cell × 10,000 + 1)
Find variable genes → top 2,000
PCA → reduce to 50 dimensions
Cluster → Louvain algorithm on a k-nearest-neighbor graph
UMAP → 2D map
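The first two steps can be sketched in plain Python. Assumptions: the count matrix is a small list of lists (cells × genes) with invented values, and variable genes are ranked by raw variance, which is a simplification of what Seurat and Scanpy actually do (they use variance-stabilized or dispersion-based rankings):

```python
import math
from statistics import pvariance

def log_normalize(counts, scale=10_000):
    """Scale one cell's UMI counts to `scale` total, then apply log(x + 1)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c / total * scale) for c in counts]

def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes by variance across cells (simplified) and keep the top n_top."""
    variances = [pvariance([row[j] for row in matrix]) for j in range(len(gene_names))]
    ranked = sorted(range(len(gene_names)), key=lambda j: variances[j], reverse=True)
    return [gene_names[j] for j in ranked[:n_top]]

# Hypothetical raw counts: ACTB is flat, INS and GFAP differ between cells.
genes = ["ACTB", "INS", "GFAP"]
raw = [
    [100, 90, 0],
    [100, 0, 80],
    [100, 85, 0],
    [100, 0, 95],
]
normalized = [log_normalize(cell) for cell in raw]
variable = top_variable_genes(normalized, genes, n_top=2)
print(variable)
```

The housekeeping-like gene drops out because it barely changes between cells, so only the informative genes go on to PCA and clustering.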

Marker Genes → label clusters:

Cluster	Marker	Cell Type
1	INS	Beta Cells
2	GFAP	Astrocytes
3	CD3D	T-cells
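Labeling clusters from the marker table can be sketched as "pick the marker with the highest average expression in each cluster". The `label_clusters` helper, the cell IDs, and the expression values are all hypothetical; real annotation usually checks several markers per cell type:

```python
from statistics import mean

# Marker gene -> cell type, taken from the table above.
MARKERS = {"INS": "Beta cells", "GFAP": "Astrocytes", "CD3D": "T-cells"}

def label_clusters(expression, clusters, markers=MARKERS):
    """Assign each cluster the cell type of its most highly expressed marker.

    expression: {cell_id: {gene: normalized value}}
    clusters:   {cell_id: cluster_id}
    """
    members = {}
    for cell, cluster in clusters.items():
        members.setdefault(cluster, []).append(cell)
    labels = {}
    for cluster, cells in members.items():
        avg = {g: mean(expression[c].get(g, 0.0) for c in cells) for g in markers}
        best = max(avg, key=avg.get)
        # A cluster expressing none of the markers stays unlabeled.
        labels[cluster] = markers[best] if avg[best] > 0 else "Unknown"
    return labels

# Invented example cells.
expression = {
    "c1": {"INS": 9.0}, "c2": {"INS": 7.5, "GFAP": 0.2},
    "c3": {"GFAP": 8.0},
    "c4": {"CD3D": 6.0}, "c5": {"CD3D": 5.0, "INS": 0.1},
    "c6": {"ACTB": 5.0},
}
clusters = {"c1": 1, "c2": 1, "c3": 2, "c4": 3, "c5": 3, "c6": 4}
labels = label_clusters(expression, clusters)
print(labels)
```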

Real Example:

Human Heart Atlas

500,000 heart cells → 14 clusters

Cluster 7: MYH7 high → cardiomyocytes

Cluster 12: PECAM1 → endothelial

→ Found fibroblasts turn into myofibroblasts in heart failure

3.2 Trajectory Inference and Pseudotime

Goal: Order cells in time (like a movie)

Tools: Monocle3, Slingshot

Pseudotime = biological time, not clock time
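One way to picture pseudotime: connect each cell to its nearest neighbors in expression space, pick a root cell, and measure how far every other cell is along the graph. The sketch below does exactly that with Dijkstra's shortest paths. It is not the Monocle3 or Slingshot algorithm; the two-gene coordinates, the choice of `k`, and the root cell are assumptions for illustration:

```python
import heapq
import math

def knn_graph(points, k):
    """Connect each cell to its k nearest neighbors (undirected, distance-weighted)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def pseudotime(points, root, k=2):
    """Graph distance from the root cell, rescaled so the farthest cell is 1.0."""
    graph = knn_graph(points, k)
    dist = {i: math.inf for i in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    longest = max(x for x in dist.values() if math.isfinite(x)) or 1.0
    return [dist[i] / longest for i in range(len(points))]

# Hypothetical cells as (GATA2, HBB) expression, listed in shuffled order.
cells = [(5, 5), (9, 0), (0, 9), (7, 2), (2, 7)]
pt = pseudotime(cells, root=1)  # root = the GATA2-high cell
order = sorted(range(len(cells)), key=lambda i: pt[i])
print(order)
```

Sorting by pseudotime recovers the GATA2-high to HBB-high progression even though the cells were given in random order. Cells that cannot be reached from the root keep an infinite value in this sketch.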

Real Example:

Blood Development

Stem cell → red blood cell

Monocle3 → pseudotime trajectory

Early: GATA2 high

Late: HBB (hemoglobin) high

→ Found drug blocks at day 5 → new anemia treatment

3.3 Tools for scRNA-Seq Analysis

Tool	Language	Best for
Seurat	R	Clustering, integration
Scanpy	Python	Speed, scalability
CellPhoneDB	Python	Cell-cell communication
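The idea behind cell-cell communication scoring can be sketched as: a sender cluster must express the ligand and a receiver cluster must express the receptor. The scoring rule below (average of the two cluster means, zero if either is absent) is a simplified stand-in, not CellPhoneDB's implementation, which also runs permutation tests. The CCL2/CCR2 pair, the helper names, and the expression values are assumptions for illustration:

```python
from statistics import mean

def cluster_means(expression, clusters, gene):
    """Mean expression of one gene in each cluster."""
    by_cluster = {}
    for cell, cluster in clusters.items():
        by_cluster.setdefault(cluster, []).append(expression[cell].get(gene, 0.0))
    return {cluster: mean(values) for cluster, values in by_cluster.items()}

def interaction_scores(expression, clusters, ligand, receptor):
    """Score every (sender, receiver) cluster pair for one ligand-receptor pair."""
    lig = cluster_means(expression, clusters, ligand)
    rec = cluster_means(expression, clusters, receptor)
    scores = {}
    for sender, l in lig.items():
        for receiver, r in rec.items():
            # No signal unless both sides of the pair are expressed.
            scores[(sender, receiver)] = 0.0 if l == 0 or r == 0 else (l + r) / 2
    return scores

# Invented data: tumor cells make the ligand, macrophages carry the receptor.
expression = {
    "t1": {"CCL2": 9.0}, "t2": {"CCL2": 7.0},
    "m1": {"CCR2": 5.0}, "m2": {"CCR2": 7.0},
}
clusters = {"t1": "tumor", "t2": "tumor", "m1": "macrophage", "m2": "macrophage"}
scores = interaction_scores(expression, clusters, ligand="CCL2", receptor="CCR2")
print(scores[("tumor", "macrophage")])
```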

Real Example:

Tumor-Immune Talk

Scanpy → clusters

CellPhoneDB → cancer cells signal macrophages via CCL2

→ Anti-CCL2 drug → immune attack restored

4. Applications and Challenges

4.1 Mapping Cellular Diversity in Tissues

Purpose: Build cell atlases

Real Example:

Human Cell Atlas Project

100+ organs → 50 million cells

Found new kidney cell type that filters toxins

→ Drug toxicity test now uses this cell

4.2 Studying Rare Cell Populations

Purpose: Find needles in a haystack

Real Example:

Pancreatic Cancer

0.5% of cells = cancer stem cells

scRNA-Seq → CD133+ALDH1+

→ Targeted therapy killed only these → tumor regrowth stopped

4.3 Technical Noise and Batch Effects

Challenges:

Problem	Fix
Dropout (gene not detected)	Imputation (MAGIC, scVI)
Batch effect	Harmony, Seurat Integration
Doublets	DoubletFinder
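To see what a batch effect looks like in numbers, here is the simplest possible correction: shift each batch so its mean matches the global mean. This is plain per-batch centering, not Harmony or Seurat Integration, and it only works if every batch contains the same mix of cell types (the assumption those tools are designed to relax). The values and lab labels are invented:

```python
from statistics import mean

def center_by_batch(values, batches):
    """Subtract each batch's mean from its cells, then add back the global mean."""
    global_mean = mean(values)
    batch_means = {
        b: mean(v for v, label in zip(values, batches) if label == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + global_mean for v, b in zip(values, batches)]

# Hypothetical expression of one gene: lab B reads about 3 units higher than lab A.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
batches = ["labA", "labA", "labA", "labB", "labB", "labB"]
corrected = center_by_batch(values, batches)
print(corrected)
```

Before correction the two labs would form two separate groups; after centering, matching cells from each lab line up.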

Real Example:

Multi-Lab Brain Study

5 labs → different protocols

Harmony removed batch → found true Alzheimer’s microglia state

Without fix → false cluster

Expected UMAP

[Healthy] → [Infected: T-cells activated] → [Recovered]

Summary Table

Step	Tool	Example
Isolation	10x Chromium	Pancreas beta subtypes
QC	Seurat	Alzheimer’s microglia
Clustering	Seurat / Scanpy (Louvain, UMAP)	Heart cell atlas
Trajectory	Monocle3	Blood development
Application	Rare cells	Cancer stem cells