Experimental Design in Transcriptomics

This chapter is the blueprint for turning curiosity into discovery. Whether you’re a novice or refining your skills, experimental design is where science meets strategy. A well-designed study ensures your transcriptomic data is reproducible, interpretable, and biologically meaningful.

Key Considerations

1. Defining Biological Questions and Hypotheses

Why it matters: Your question drives everything—sample choice, method, analysis. A vague question yields vague results.

How to do it (step-by-step):

1. Start with a biological problem: “How does obesity affect liver gene expression?”
2. Formulate a testable hypothesis: “Obesity upregulates lipid synthesis genes in hepatocytes.”
3. Make it specific and measurable: “We hypothesize that FASN and SCD1 are >2-fold upregulated in obese vs. lean human liver biopsies.”
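A hypothesis is measurable when the effect-size threshold is fixed before any data arrive. A ">2-fold" change is the same as a log2 fold change greater than 1. The sketch below assumes you already have normalized expression values per sample. The pseudocount of 1.0 is an assumption, and the FASN values are invented for illustration, not taken from any study.

```python
import math
from statistics import mean


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of mean normalized expression, case over control.

    The pseudocount (assumed here to be 1.0) avoids taking log of zero.
    """
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


def meets_hypothesis(case, control, min_fold=2.0):
    """True if case is upregulated by more than min_fold relative to control."""
    return log2_fold_change(case, control) > math.log2(min_fold)


# Hypothetical normalized FASN values (obese vs. lean), for illustration only
obese = [310.0, 280.0, 295.0]
lean = [120.0, 135.0, 110.0]
print(round(log2_fold_change(obese, lean), 2), meets_hypothesis(obese, lean))
```

This checks effect size only. Whether the change is statistically significant still has to come from a proper test, such as the Wald test in DESeq2.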

Tip: Use the PICO framework:

Population: Obese vs. lean adults
Intervention/Exposure: Obesity
Comparison: Lean controls
Outcome: Differential gene expression

Example: A 2025 study asked: “Does Alzheimer’s disease alter microglial activation in the hippocampus?”

→ Hypothesis: TREM2 and CX3CR1 are upregulated in microglia from AD patients.
→ Design: The hypothesis directed scRNA-Seq of hippocampal tissue, leading to targeted biomarker discovery.

2. Selection of Appropriate Samples

Why it matters: The transcriptome is tissue-, cell-, and condition-specific. Wrong samples = wrong answers.

Step-by-step guide

Step	Action	Example
1	Define the biological context	Liver for metabolic disease, not blood
2	Choose sample type	Fresh-frozen tissue, FFPE, single cells
3	Match conditions	Age, sex, BMI, medication
4	Avoid confounders	Smoking, inflammation
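One way to check step 3 ("Match conditions") before sequencing is to compare each covariate between groups using the standardized mean difference (SMD). The |SMD| > 0.1 cutoff is a common rule of thumb, not something this chapter prescribes. Covariate names and values here are hypothetical.

```python
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0 if mean(group_a) == mean(group_b) else float("inf")
    return (mean(group_a) - mean(group_b)) / pooled_sd


def check_balance(cases, controls, covariates, threshold=0.1):
    """Map each covariate to (rounded SMD, flagged), flagged meaning |SMD| > threshold."""
    report = {}
    for cov in covariates:
        smd = standardized_mean_difference(
            [s[cov] for s in cases], [s[cov] for s in controls]
        )
        report[cov] = (round(smd, 2), abs(smd) > threshold)
    return report


# Hypothetical metadata: age is well matched, smoking (pack-years) is not
cases = [{"age": 52, "pack_years": 20}, {"age": 47, "pack_years": 25}, {"age": 58, "pack_years": 30}]
controls = [{"age": 50, "pack_years": 0}, {"age": 49, "pack_years": 5}, {"age": 57, "pack_years": 2}]
print(check_balance(cases, controls, ["age", "pack_years"]))
```

A flagged covariate should be rebalanced at recruitment if possible, or else included in the statistical model.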

Trap: Using whole blood for brain-specific questions → diluted signal.

2025 Example: A lung cancer study compared tumor core vs. tumor edge vs. healthy lung tissue using spatial transcriptomics. They found PD-L1 highly expressed only at the invasive edge—critical for immunotherapy targeting.

3. Replication and Sample Size

Why it matters: Biology is variable. Replication distinguishes signal from noise.

Step-by-step:

1. Biological replicates: Different individuals (n ≥ 3 per group)
2. Technical replicates: Same sample, different runs (rarely needed with NGS)
3. Power calculation: Use tools like RNASeqPower or Scotty
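As we understand it, RNASeqPower is built on a normal approximation in which the variance of a gene's log count is a Poisson term (1/depth) plus a biological term (CV²). The sketch below reimplements that formula for a two-group comparison. It is a stand-alone illustration, not the package's code, and the depth and CV values are assumptions you would replace with pilot-data estimates.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, effect, alpha=0.05, power=0.8):
    """Approximate biological replicates per group for a two-group comparison.

    depth  -- expected read count (coverage) for the gene
    cv     -- biological coefficient of variation within a group
    effect -- fold change to detect, e.g. 2.0
    alpha  -- two-sided per-gene significance level
    power  -- desired probability of detecting the effect
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Approximate variance of the log count: Poisson noise plus biological noise
    log_variance = 1 / depth + cv ** 2
    n = 2 * (z_alpha + z_beta) ** 2 * log_variance / math.log(effect) ** 2
    return math.ceil(n)


print(samples_per_group(depth=20, cv=0.4, effect=2.0))
```

Because thousands of genes are tested at once, a per-gene alpha well below 0.05 is usually more realistic, and it raises the required n.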

Rule of thumb:

Bulk RNA-Seq: n = 3–6 per group
scRNA-Seq: n = 3 patients (biological replicates) per group, >5,000 cells per sample

Example: A 2025 diabetes study used n = 5 obese and n = 5 lean patients. Power analysis showed 80% power to detect 2-fold changes at FDR < 0.05.

Tip: Under-replication → missed true changes (low power), unstable variance estimates, and irreproducible hit lists. Over-replication → wasted resources.

Sample Preparation

1. RNA Extraction and Quality Control

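Goal: Isolate intact, pure RNA.

Step-by-step protocol (for tissue):

Homogenize tissue in TRIzol (or in the lysis buffer of a column kit such as Qiagen RNeasy)
Phase separation with chloroform (TRIzol route) → RNA partitions into the aqueous phase
Column purification → DNase treatment (removes contaminating genomic DNA)
Quantify: NanoDrop (concentration and purity ratios), Qubit (more accurate fluorometric concentration), Bioanalyzer (integrity, reported as RIN)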

Quality metrics:

Metric	Good	Poor
RIN (RNA Integrity Number)	≥ 7.0	< 6.0
260/280 ratio	1.8–2.1	< 1.7 (protein contamination)
260/230 ratio	> 2.0	< 1.8 (guanidine salt or phenol carryover)
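The QC table translates directly into a screening function. The table leaves a gap between its "Good" and "Poor" cutoffs (for example, RIN 6.0–7.0). Labelling those values "borderline" is our assumption. The sample values in the usage example are hypothetical.

```python
SEVERITY = {"pass": 0, "borderline": 1, "fail": 2}


def classify_rna_sample(rin, a260_280, a260_230):
    """Return (verdict, reasons) using the Good/Poor cutoffs from the QC table.

    Values between a 'Good' and a 'Poor' cutoff are reported as borderline (assumption).
    """
    findings = []  # list of (level, reason)

    if rin < 6.0:
        findings.append(("fail", f"RIN {rin} < 6.0"))
    elif rin < 7.0:
        findings.append(("borderline", f"RIN {rin} between 6.0 and 7.0"))

    if a260_280 < 1.7:
        findings.append(("fail", f"260/280 {a260_280} < 1.7 (possible protein)"))
    elif not 1.8 <= a260_280 <= 2.1:
        findings.append(("borderline", f"260/280 {a260_280} outside 1.8-2.1"))

    if a260_230 < 1.8:
        findings.append(("fail", f"260/230 {a260_230} < 1.8 (possible salt carryover)"))
    elif a260_230 <= 2.0:
        findings.append(("borderline", f"260/230 {a260_230} not above 2.0"))

    # The worst finding decides the verdict; no findings means the sample passes
    verdict = max((level for level, _ in findings), key=SEVERITY.get, default="pass")
    return verdict, [reason for _, reason in findings]


# Hypothetical samples: (RIN, 260/280, 260/230)
for name, values in {"S1": (8.2, 2.0, 2.1), "S2": (6.5, 1.9, 2.05), "S3": (5.1, 1.6, 1.5)}.items():
    print(name, classify_rna_sample(*values))
```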

Tip: Degraded RNA → 3’ coverage bias in sequencing, especially with poly(A)-selected libraries. Always check the electropherogram, not just the RIN value.

2025 Example: A brain bank study rejected 30% of samples with RIN < 7, ensuring reliable Alzheimer’s transcriptomes.

2. Handling Low-Input or Degraded RNA Samples

Challenge: Small biopsies, laser-capture microdissected tissue, and FFPE samples yield little or degraded RNA.

Solutions:

Sample Type	Method	Key Step
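Low input (<10 ng)	Smart-Seq2 (full-length), CEL-Seq2 (3’ tag)	cDNA pre-amplification (Smart-Seq2) or linear in vitro transcription (CEL-Seq2)
FFPE	TruSeq RNA Exome, Ovation FFPE	Deparaffinization, then capture of coding transcripts or rRNA depletion instead of poly(A) selection
Degraded	RNAtag-Seq, BRB-Seq	Barcoding of short fragments (RNAtag-Seq) or 3’ end capture (BRB-Seq)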

Example: A 2025 pancreatic cancer study used FFPE blocks from 2015 with a new FFPE-optimized RNA-Seq kit, recovering 80% of transcripts despite degradation.

3. Considerations for Single-Cell Studies

Unique challenges:

Maintaining cell viability (target >80%)
Doublet removal
Ambient RNA contamination

Step-by-step:

1. Tissue dissociation: Enzymatic (e.g., collagenase) but gentle
2. Cell counting: Trypan blue or automated (Countess)
3. Loading: 10x Chromium → target 5,000–10,000 cells
4. Library QC: Check cDNA size (TapeStation)
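Counting and loading involve two small calculations: viability from trypan blue counts, and the volume of suspension to load for a target number of recovered cells. The recovery fraction (the share of loaded cells that end up barcoded) is an assumption that should be taken from your platform's user guide. The 0.5 used below is a placeholder, not a 10x specification.

```python
def viability(live, dead):
    """Fraction of live cells from trypan blue counts."""
    total = live + dead
    if total == 0:
        raise ValueError("no cells counted")
    return live / total


def passes_viability(live, dead, minimum=0.8):
    """Apply the >80% viability target from the text."""
    return viability(live, dead) >= minimum


def loading_volume_ul(target_recovered, live_per_ul, recovery_fraction):
    """Microlitres of suspension to load to recover about target_recovered cells.

    recovery_fraction is an ASSUMPTION: check the value for your kit and chip.
    """
    cells_to_load = target_recovered / recovery_fraction
    return cells_to_load / live_per_ul


# Hypothetical counts: 460 live, 40 dead; 1,000 live cells/uL; assumed 50% recovery
print(viability(460, 40), passes_viability(460, 40))
print(loading_volume_ul(target_recovered=8000, live_per_ul=1000, recovery_fraction=0.5))
```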

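Tip: Dead and lysed cells release RNA into the suspension. This ambient RNA is captured alongside real cells and creates false signal. Tools such as CellBender can estimate and remove it.

2025 Example: A kidney scRNA-Seq study processed fresh biopsies within 30 minutes of collection to achieve 92% viability, revealing a novel podocyte subtype in lupus nephritis.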

Controls and Normalization

1. Use of Housekeeping Genes and Spike-In Controls

Housekeeping genes (e.g., GAPDH, ACTB): assumed to be stably expressed, but this is not always true!
Better: ERCC spike-ins (synthetic RNAs from the External RNA Controls Consortium, added at known concentrations)

Step-by-step:

1. Add a fixed, known amount of ERCC mix to each sample before library prep
2. Sequence → count reads mapping to the ERCC sequences
3. Use these counts for absolute quantification and technical normalization
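Step 3 can be sketched with one simple scheme, assumed here: each sample's size factor is its total spike-in read count divided by the mean total across samples. Because every sample received the same spike-in amount, differences in spike-in reads reflect technical factors only. This covers technical normalization only. Absolute quantification would additionally require fitting read counts against the known ERCC concentrations, which is not shown. All counts are invented.

```python
def spike_in_size_factors(spike_counts):
    """Size factor per sample = total spike-in reads / mean total across samples.

    spike_counts: dict mapping sample -> list of ERCC read counts.
    Assumes the same amount of spike-in mix was added to every sample.
    """
    totals = {sample: sum(counts) for sample, counts in spike_counts.items()}
    average = sum(totals.values()) / len(totals)
    return {sample: total / average for sample, total in totals.items()}


def normalize(gene_counts, size_factors):
    """Divide each sample's endogenous gene counts by its spike-in size factor."""
    return {
        sample: [count / size_factors[sample] for count in counts]
        for sample, counts in gene_counts.items()
    }


# Hypothetical data: sample B was sequenced twice as deeply as sample A
spikes = {"A": [40, 60], "B": [80, 120]}
genes = {"A": [50, 10], "B": [100, 20]}
factors = spike_in_size_factors(spikes)
print(factors)
print(normalize(genes, factors))
```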

Example: A 2025 study found GAPDH upregulated in hypoxia → misleading normalization. ERCC spike-ins corrected this, revealing true hypoxia response genes.

2. Normalization Methods for Transcriptomic Data

Goal: Remove technical variation (sequencing depth, RNA input)

Method	When to use	Formula / principle
TPM (Transcripts Per Million)	Compare genes within a sample	(reads_i / length_i) / Σ_j (reads_j / length_j) × 10⁶
RPKM/FPKM	Older bulk RNA-Seq	(reads × 10⁹) / (gene length in bp × total mapped reads)
TMM (Trimmed Mean of M-values)	Between-sample DE analysis in edgeR (DESeq2 uses the related median-of-ratios method)	Scales by effective library size
Spike-in normalization	Low-input, scRNA-Seq	ERCC-based scaling
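The practical difference between TPM and RPKM is the order of operations. TPM divides each count by gene length first and then rescales so that every sample sums to one million. RPKM divides by length (in kb) and by total mapped reads (in millions). In this sketch, "total mapped reads" is approximated by the sum over the genes supplied (an assumption). The lengths and counts are invented.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


counts = [10, 50, 40]
lengths = [500, 2500, 1000]
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because TPM values always sum to the same total, TPM proportions are comparable between samples in a way that RPKM values are not. However, neither TPM nor RPKM replaces between-sample normalization for differential expression.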

Tip: Never compare raw counts directly across samples. DESeq2 and edgeR take raw counts as input, but they apply between-sample scaling internally.

2025 Example: A multi-lab RNA-Seq study used TMM + batch correction (ComBat-seq) to harmonize data from 12 centers, identifying robust COVID-19 biomarkers.
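Between-sample scaling can be illustrated with a simplified sketch of DESeq2's median-of-ratios size factors, a method related to but distinct from edgeR's TMM. Each gene's count is divided by that gene's geometric mean across samples. A sample's size factor is the median of those ratios. Genes with a zero in any sample are skipped because their geometric mean is zero. This is an illustration of the idea, not DESeq2's own code.

```python
import math
from statistics import median


def median_of_ratios_size_factors(count_matrix):
    """Size factors by the median-of-ratios idea (simplified sketch).

    count_matrix: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(count_matrix[0])
    ratios = [[] for _ in range(n_samples)]
    for gene in count_matrix:
        if any(c == 0 for c in gene):
            continue  # geometric mean would be zero
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    if not ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [median(r) for r in ratios]


# Hypothetical counts: sample 2 is sample 1 sequenced twice as deeply
matrix = [[10, 20], [20, 40], [30, 60], [0, 5]]
factors = median_of_ratios_size_factors(matrix)
print(factors)
print([[c / f for c, f in zip(gene, factors)] for gene in matrix])
```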

Study Types

1. Differential Expression Studies

Goal: Find genes changing between conditions (e.g., disease vs. healthy)

Design:

Case-control: n ≥ 3 per group
Paired design: Before/after treatment in the same individuals (more powerful, because each subject serves as its own control)

Analysis pipeline:

QC → Trim adapters
Align (STAR/HISAT2)
Count (featureCounts)
Normalize (DESeq2/edgeR)
Test (Wald, LRT)
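The upstream steps of this pipeline can be written as a dry run that builds, but does not execute, the shell commands for one paired-end sample. fastp is used for QC and trimming as an assumption, since the text names no tool. File names and paths are placeholders. The flags shown are the commonly documented ones, but you should check them against your installed versions (`--countReadPairs`, for example, requires a recent Subread release).

```python
import shlex


def build_pipeline(sample, r1, r2, genome_dir, gtf, threads=8):
    """Return the commands for QC/trimming, alignment and counting (nothing is run)."""
    trimmed_r1 = f"{sample}_R1.trimmed.fq.gz"
    trimmed_r2 = f"{sample}_R2.trimmed.fq.gz"
    prefix = f"{sample}_"
    # STAR names its coordinate-sorted BAM <prefix>Aligned.sortedByCoord.out.bam
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    return [
        # QC report and adapter trimming for both mates
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2],
        # Alignment; zcat decompresses the gzipped FASTQ on the fly
        ["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
         "--readFilesIn", trimmed_r1, trimmed_r2, "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate", "--outFileNamePrefix", prefix],
        # Gene-level counting; -p marks paired-end input, --countReadPairs counts fragments
        ["featureCounts", "-T", str(threads), "-p", "--countReadPairs",
         "-a", gtf, "-o", f"{sample}.counts.txt", bam],
    ]


for cmd in build_pipeline("S1", "S1_R1.fq.gz", "S1_R2.fq.gz", "star_index/", "genes.gtf"):
    print(shlex.join(cmd))
```

The per-sample count files are then combined into one matrix, and normalization and testing (Wald or LRT) are done in R with DESeq2 or edgeR.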

Example: A 2025 colorectal cancer study (n=50 tumors vs. 50 normals) found APC downregulated and MYC upregulated, validated by qPCR.

2. Time-Course Experiments

Goal: Capture dynamic changes (e.g., drug response over 0–48h)

Design:

Time points: 0, 1, 6, 24, 48h (n=3 each)
Model: Use ImpulseDE or maSigPro
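As we understand it, ImpulseDE models each gene's trajectory as an "impulse": expression moves from an initial level to a peak around an onset time and then settles to a final level around an offset time. The sketch below only evaluates that functional form at the design's time points with made-up parameters. Fitting it to counts, which is what the tool actually does, would also need an optimizer and a count model and is not shown.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def impulse(t, h0, h1, h2, t1, t2, beta):
    """Impulse model of expression over time.

    h0, h1, h2 -- initial, peak and final levels (h1 must be non-zero)
    t1, t2     -- onset and offset times
    beta       -- steepness of both transitions
    """
    rise = h0 + (h1 - h0) * sigmoid(beta * (t - t1))
    fall = h2 + (h1 - h2) * sigmoid(-beta * (t - t2))
    return rise * fall / h1


# Hypothetical parameters: early induction that partly resolves by 48 h
params = dict(h0=1.0, h1=8.0, h2=2.0, t1=3.0, t2=30.0, beta=5.0)
for hour in [0, 1, 6, 24, 48]:
    print(hour, round(impulse(hour, **params), 3))
```

With only five time points, the design constrains where t1 and t2 can be placed. This is one reason to put extra time points where the transitions are expected.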

Example: A 2025 immunotherapy study tracked PD-1 blockade response in melanoma:

0h: Baseline
6h: Immune activation (IFNG ↑)
24h: T-cell exhaustion (LAG3 ↑)

→ These dynamics guided the timing of combination therapy.

3. Comparative Transcriptomics Across Species

Goal: Identify conserved or divergent responses (e.g., stress in human vs. mouse)

Design:

Ortholog mapping (Ensembl BioMart), typically restricted to one-to-one orthologs
Normalize each species separately, then integrate on the shared orthologs
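The "normalize separately, then integrate" step can be sketched as follows. Each species is scaled to counts per million (CPM) on its own. The two tables are then joined through an ortholog list of the kind exported from Ensembl BioMart, keeping only one-to-one pairs. CPM stands in here for whatever within-species normalization you actually use. The gene counts are invented.

```python
from collections import Counter


def cpm(counts):
    """Counts per million within one sample (library-size scaling)."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def integrate(human_counts, mouse_counts, orthologs):
    """Join per-species CPM values on one-to-one orthologs.

    orthologs: list of (human_gene, mouse_gene) pairs.
    Genes appearing in more than one pair are dropped (not one-to-one).
    """
    human_freq = Counter(h for h, _ in orthologs)
    mouse_freq = Counter(m for _, m in orthologs)
    human_cpm = cpm(human_counts)  # normalize each species separately...
    mouse_cpm = cpm(mouse_counts)
    merged = {}
    for h, m in orthologs:  # ...then integrate on shared one-to-one orthologs
        if human_freq[h] == 1 and mouse_freq[m] == 1 and h in human_cpm and m in mouse_cpm:
            merged[h] = (human_cpm[h], mouse_cpm[m])
    return merged


human = {"HSPA1A": 500, "ACTB": 1500}
mouse = {"Hspa1a": 300, "Actb": 700}
pairs = [("HSPA1A", "Hspa1a"), ("ACTB", "Actb"), ("GENEX", "Genex")]
print(integrate(human, mouse, pairs))
```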

Example: A 2025 heat stress study compared human, mouse, and zebrafish:

Conserved: HSP70 upregulation
Divergent: Fish-specific gill genes

→ Separating conserved from species-specific responses improved the relevance of the mouse model for human trials.