
Transcriptomics

Fundamentals of Transcriptomics — from RNA sequencing basics to advanced expression analysis.

Experimental Design in Transcriptomics

This chapter is the blueprint for turning curiosity into discovery. Whether you’re a novice or refining your skills, experimental design is where science meets strategy. A well-designed study ensures your transcriptomic data is reproducible, interpretable, and biologically meaningful.

Key Considerations

1. Defining Biological Questions and Hypotheses

Why it matters: Your question drives everything—sample choice, method, analysis. A vague question yields vague results.

How to do it (step-by-step):

Tip: Use the PICO framework (Population, Intervention, Comparison, Outcome) to structure the question.

Example: A 2025 study asked: “Does Alzheimer’s disease alter microglial activation in the hippocampus?”

→ Hypothesis: TREM2 and CX3CR1 are upregulated in AD patient microglia.

→ This directed scRNA-Seq on hippocampal tissue, leading to targeted biomarker discovery.

2. Selection of Appropriate Samples

Why it matters: The transcriptome is tissue-, cell-, and condition-specific. Wrong samples = wrong answers.

Step-by-step guide

| Step | Action | Example |
|------|--------|---------|
| 1 | Define the biological context | Liver for metabolic disease, not blood |
| 2 | Choose sample type | Fresh-frozen tissue, FFPE, single cells |
| 3 | Match conditions | Age, sex, BMI, medication |
| 4 | Avoid confounders | Smoking, inflammation |

Trap: Using whole blood for brain-specific questions → diluted signal.

2025 Example: A lung cancer study compared tumor core vs. tumor edge vs. healthy lung tissue using spatial transcriptomics. They found PD-L1 highly expressed only at the invasive edge—critical for immunotherapy targeting.

3. Replication and Sample Size

Why it matters: Biology is variable. Replication distinguishes signal from noise.

Step-by-step:

Rule of thumb:

Example: A 2025 diabetes study used n = 5 obese and n = 5 lean patients. Power analysis showed 80% power to detect 2-fold changes at FDR < 0.05.

Tip: Under-replication → low power, missed true effects, and hits that fail to replicate. Over-replication → wasted resources.
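The power reasoning above can be sketched with a two-sample normal approximation on the log2 scale. This is a rough planning aid only: the function name, the assumed per-gene standard deviation `sd_log2`, and the use of a single per-test `alpha` are assumptions for illustration. Controlling FDR across thousands of genes implies a stricter per-test threshold, so real RNA-Seq power tools will usually ask for more replicates than this.

```python
import math
from statistics import NormalDist


def replicates_per_group(fold_change, sd_log2, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-sided two-sample z-test.

    fold_change: smallest fold change worth detecting (e.g. 2.0)
    sd_log2:     assumed standard deviation of log2 expression within a group
    alpha:       per-test significance level (NOT an FDR threshold)
    """
    delta = abs(math.log2(fold_change))
    if delta == 0:
        raise ValueError("fold_change must differ from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Classic formula: n = 2 * ((z_alpha + z_beta) * sd / delta)^2
    n = 2 * ((z_alpha + z_beta) * sd_log2 / delta) ** 2
    return math.ceil(n)


# Hypothetical inputs: a 2-fold change with low vs. high biological variability.
for sd in (0.5, 1.0):
    print(sd, replicates_per_group(2.0, sd_log2=sd))
```

Doubling the assumed variability roughly quadruples the required sample size, which is why pilot estimates of variance matter more than any fixed rule.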

Sample Preparation

1. RNA Extraction and Quality Control

Goal: Isolate intact, pure RNA.

Step-by-step protocol (for tissue):

  1. Homogenize tissue in TRIzol, or in the lysis buffer of a column kit such as Qiagen RNeasy
  2. Phase separation (chloroform, TRIzol workflow) → aqueous phase = RNA
  3. Column purification → DNase treatment (remove DNA)
  4. Quantify: NanoDrop (concentration and purity ratios), Qubit (more accurate fluorometric concentration), Bioanalyzer (integrity)

Quality metrics:

| Metric | Good | Poor |
|--------|------|------|
| RIN (RNA Integrity Number) | ≥ 7.0 | < 6.0 |
| 260/280 ratio | 1.8–2.1 | < 1.7 (protein) |
| 260/230 ratio | > 2.0 | < 1.8 (salts) |
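The thresholds in the table above can be applied programmatically when screening many samples. The sketch below is one possible encoding. The function name and the "borderline" label for values that fall between the "Good" and "Poor" columns are assumptions, since the table does not say how to treat that gap.

```python
def classify_rna_sample(rin, a260_280, a260_230):
    """Classify one RNA sample against the QC table thresholds.

    Returns (verdict, issues), where verdict is "good", "borderline" or "poor".
    "borderline" is a hypothetical label for values between the two columns.
    """
    issues = []
    if rin < 6.0:
        issues.append("RIN < 6.0 (degraded RNA)")
    if a260_280 < 1.7:
        issues.append("260/280 < 1.7 (possible protein contamination)")
    if a260_230 < 1.8:
        issues.append("260/230 < 1.8 (possible salt contamination)")
    if issues:
        return "poor", issues

    is_good = rin >= 7.0 and 1.8 <= a260_280 <= 2.1 and a260_230 > 2.0
    return ("good" if is_good else "borderline"), issues


# Hypothetical measurements for three samples.
samples = {
    "S1": (8.5, 2.0, 2.1),
    "S2": (6.5, 2.0, 2.1),
    "S3": (5.0, 1.6, 2.1),
}
for name, metrics in samples.items():
    print(name, classify_rna_sample(*metrics))
```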

Tip: Degraded RNA → 3’ bias in sequencing. Always check the electropherogram.

2025 Example: A brain bank study rejected 30% of samples with RIN < 7, ensuring reliable Alzheimer’s transcriptomes.

2. Handling Low-Input or Degraded RNA Samples

Challenge: Biopsies, laser-capture microdissection, FFPE samples.

Solutions:

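| Sample Type | Method | Key Step |
|-------------|--------|----------|
| Low input (<10 ng) | Smart-Seq2, CEL-Seq2 | cDNA amplification from minimal input (full-length for Smart-Seq2; 3’-tagged for CEL-Seq2) |
| FFPE | TruSeq RNA Exome, Ovation FFPE | Deparaffinization + rRNA depletion or exome capture |
| Degraded | RNAtag-Seq, BRB-Seq | 3’ end capture |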

Example: A 2025 pancreatic cancer study used FFPE blocks from 2015 with a new FFPE-optimized RNA-Seq kit, recovering 80% of transcripts despite degradation.

3. Considerations for Single-Cell Studies

Unique challenges:

Step-by-step:

  1. Tissue dissociation: Enzymatic (collagenase), gentle
  2. Cell counting: Trypan blue or automated (Countess)
  3. Loading: 10x Chromium → target 5,000–10,000 cells
  4. Library QC: Check cDNA size (TapeStation)
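For the cell-counting step, viability and concentration follow from simple arithmetic on trypan blue counts. The sketch below assumes a manual hemocytometer, where each large square holds 0.1 µL, so cells/mL = mean count per square × dilution factor × 10⁴. The function name and the example counts are hypothetical, and an automated counter such as the Countess reports these values directly.

```python
def viability_and_concentration(live_counts, dead_counts, dilution_factor=2.0):
    """Compute viability and live-cell concentration from hemocytometer counts.

    live_counts / dead_counts: per-square counts of unstained (live) and
    trypan-blue-stained (dead) cells, one entry per large square counted.
    dilution_factor: 2.0 for a 1:1 mix of cell suspension and trypan blue.
    """
    if len(live_counts) != len(dead_counts) or not live_counts:
        raise ValueError("provide the same, non-zero number of squares")
    total_live = sum(live_counts)
    total_dead = sum(dead_counts)
    if total_live + total_dead == 0:
        raise ValueError("no cells counted")

    viability = total_live / (total_live + total_dead)
    mean_live_per_square = total_live / len(live_counts)
    # Each large square is 1 mm x 1 mm x 0.1 mm = 0.1 uL, hence the 1e4 factor.
    live_cells_per_ml = mean_live_per_square * dilution_factor * 1e4
    return viability, live_cells_per_ml


# Hypothetical counts from four large squares.
v, conc = viability_and_concentration([90, 110, 100, 100], [10, 10, 10, 10])
print(f"viability = {v:.1%}, live cells/mL = {conc:.2e}")
```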

Tip: Dead cells release RNA → ambient RNA ends up in every droplet and creates false signal. Use CellBender to estimate and remove it.

2025 Example: A kidney scRNA-Seq study used fresh biopsies within 30 minutes to achieve 92% viability, revealing a novel podocyte subtype in lupus nephritis.

Controls and Normalization

1. Use of Housekeeping Genes and Spike-In Controls

Housekeeping genes (e.g., GAPDH, ACTB): Assumed stable → not always true!

Better: ERCC spike-ins (synthetic RNAs of known concentration)

Step-by-step:

  1. Add ERCC mix before library prep
  2. Sequence → count ERCC reads
  3. Use for absolute quantification and technical normalization
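The technical-normalization step can be sketched as follows, assuming the same amount of ERCC mix was added to every sample, so that differences in total ERCC reads reflect technical variation only. The function names and the toy counts are hypothetical, and dividing by a geometric-mean-centered size factor is one simple choice among several.

```python
import math


def spike_in_size_factors(ercc_totals):
    """Size factor per sample = its ERCC read total / geometric mean of all totals.

    ercc_totals: dict mapping sample name -> total reads assigned to ERCC spike-ins.
    """
    logs = [math.log(total) for total in ercc_totals.values()]
    geo_mean = math.exp(sum(logs) / len(logs))
    return {sample: total / geo_mean for sample, total in ercc_totals.items()}


def normalize_counts(gene_counts, size_factors):
    """Divide each sample's endogenous gene counts by that sample's size factor.

    gene_counts: dict mapping sample name -> dict of gene -> raw count.
    """
    return {
        sample: {gene: count / size_factors[sample] for gene, count in genes.items()}
        for sample, genes in gene_counts.items()
    }


# Toy data: sample B captured four times as many ERCC reads as sample A.
ercc_totals = {"A": 1000, "B": 4000}
gene_counts = {"A": {"GAPDH": 100}, "B": {"GAPDH": 400}}

factors = spike_in_size_factors(ercc_totals)
print(factors)
print(normalize_counts(gene_counts, factors))
```

In this toy case the fourfold difference in GAPDH counts disappears after scaling, which is the behavior expected when the difference is purely technical.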

Example: A 2025 study found GAPDH upregulated in hypoxia → misleading normalization. ERCC spike-ins corrected this, revealing true hypoxia response genes.

2. Normalization Methods for Transcriptomic Data

Goal: Remove technical variation (sequencing depth, RNA input)

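| Method | When to use | Formula |
|--------|-------------|---------|
| TPM (Transcripts Per Million) | Compare genes within sample | (reads / gene length) × 10⁶ / Σ (reads / gene length) over all genes |
| RPKM/FPKM | Older bulk RNA-Seq | (reads × 10⁹) / (gene length in bp × total mapped reads) |
| TMM (Trimmed Mean of M-values) | Between-sample comparison in edgeR (DESeq2 uses the analogous median-of-ratios method) | Scales by effective library size |
| Spike-in normalization | Low-input, scRNA-Seq | ERCC-based scaling |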
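To make the length-based methods concrete, here is a minimal sketch of TPM next to RPKM. TPM divides reads by gene length first and then rescales so that values sum to one million within the sample, while RPKM divides by library size and length directly. The function names and the toy counts are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalize first, then rescale to 1e6."""
    # Reads per kilobase for each gene.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("all counts are zero")
    return {gene: rate / total_rate * 1e6 for gene, rate in rates.items()}


def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads."""
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("all counts are zero")
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_reads)
        for gene in counts
    }


# Toy sample: gene B has twice the reads of gene A but is four times longer.
counts = {"A": 100, "B": 200}
lengths_bp = {"A": 1000, "B": 4000}

print(tpm(counts, lengths_bp))
print(rpkm(counts, lengths_bp))
```

TPM values always sum to 10⁶ within a sample, whereas the RPKM total depends on the sample's length distribution. Neither corrects for composition differences between samples, which is what TMM-style scaling addresses.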

Tip: Never compare unnormalized raw counts across samples directly. DESeq2 and edgeR take raw counts as input and normalize them internally.

2025 Example: A multi-lab RNA-Seq study used TMM + batch correction (ComBat-seq) to harmonize data from 12 centers, identifying robust COVID-19 biomarkers.

Study Types:

1. Differential Expression Studies

Goal: Find genes changing between conditions (e.g., disease vs. healthy)

Design:

* Case-control: n ≥ 3 per group
* Paired design: Before/after treatment (stronger)
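Why a paired design is stronger can be seen with a toy comparison of two test statistics on the same numbers. The sketch below uses plain t statistics on hypothetical log2 expression values for one gene. It is an illustration of the principle only, not the negative binomial models that DESeq2 or edgeR fit.

```python
import math
from statistics import mean, stdev


def welch_t(group_a, group_b):
    """Unpaired (Welch) t statistic: ignores which samples share a patient."""
    var_a = stdev(group_a) ** 2 / len(group_a)
    var_b = stdev(group_b) ** 2 / len(group_b)
    return (mean(group_b) - mean(group_a)) / math.sqrt(var_a + var_b)


def paired_t(before, after):
    """Paired t statistic: tests within-patient differences only.

    Assumes the differences are not all identical (non-zero variance).
    """
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical log2 expression of one gene in four patients.
# Baselines differ a lot between patients; the treatment effect is ~1 in each.
before = [5.0, 7.0, 9.0, 11.0]
after = [6.0, 8.1, 9.9, 12.0]

print("unpaired t:", round(welch_t(before, after), 2))
print("paired t:  ", round(paired_t(before, after), 2))
```

Between-patient variability swamps the effect in the unpaired statistic, while pairing cancels each patient's baseline and leaves only the consistent shift.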

Analysis pipeline:

  1. QC → Trim adapters
  2. Align (STAR/HISAT2)
  3. Count (featureCounts)
  4. Normalize (DESeq2/edgeR)
  5. Test (Wald, LRT)
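The first three pipeline steps can be sketched as command lists assembled in Python. Several choices here are assumptions not stated above: fastp as the trimming tool, paired-end gzipped FASTQ input, all file names, and the `--countReadPairs` flag, which applies to recent Subread releases (2.0.2 or later) and should be dropped for older ones. Steps 4 and 5 run in R with DESeq2 or edgeR and are not covered.

```python
import shlex


def build_pipeline_commands(sample, r1, r2, star_index, gtf, threads=8):
    """Return trim, align and count commands for one paired-end sample.

    Nothing is executed here; each command is a list suitable for
    subprocess.run(cmd, check=True).
    """
    trimmed_r1 = f"{sample}.trimmed.R1.fastq.gz"
    trimmed_r2 = f"{sample}.trimmed.R2.fastq.gz"
    # STAR appends this suffix to the prefix given via --outFileNamePrefix.
    bam = f"{sample}_Aligned.sortedByCoord.out.bam"

    trim = ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2]
    align = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", star_index,
        "--readFilesIn", trimmed_r1, trimmed_r2,
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", f"{sample}_",
    ]
    count = [
        "featureCounts",
        "-T", str(threads),
        "-p", "--countReadPairs",  # paired-end input, count fragments
        "-a", gtf,
        "-o", f"{sample}.counts.txt",
        bam,
    ]
    return [trim, align, count]


# Hypothetical sample and reference paths.
for cmd in build_pipeline_commands(
    "tumor01", "tumor01_R1.fastq.gz", "tumor01_R2.fastq.gz",
    "star_index", "genes.gtf",
):
    print(shlex.join(cmd))
```

Building the commands as lists keeps file names with spaces safe and makes it easy to log exactly what was run for each sample.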

Example: A 2025 colorectal cancer study (n=50 tumors vs. 50 normals) found APC downregulated and MYC upregulated, validated by qPCR.

2. Time-Course Experiments

Goal: Capture dynamic changes (e.g., drug response over 0–48h)

Design:

Example: A 2025 immunotherapy study tracked PD-1 blockade response in melanoma:

3. Comparative Transcriptomics Across Species

Goal: Identify conserved or divergent responses (e.g., stress in human vs. mouse)

Design:

Example: A 2025 heat stress study compared human, mouse, and zebrafish: