Experimental Design in Transcriptomics
This chapter is the blueprint for turning curiosity into discovery. Whether you’re a novice or refining your skills, experimental design is where science meets strategy. A well-designed study ensures your transcriptomic data is reproducible, interpretable, and biologically meaningful.
Key Considerations
1. Defining Biological Questions and Hypotheses
Why it matters: Your question drives everything—sample choice, method, analysis. A vague question yields vague results.
How to do it (step-by-step):
1. Start with a biological problem: “How does obesity affect liver gene expression?”
2. Formulate a testable hypothesis: “Obesity upregulates lipid synthesis genes in hepatocytes.”
3. Make it specific and measurable: “We hypothesize that FASN and SCD1 are >2-fold upregulated in obese vs. lean human liver biopsies.”
Tip: Use the PICO framework:
- Population: Obese vs. lean adults
- Intervention/Exposure: Obesity
- Comparison: Lean controls
- Outcome: Differential gene expression
Example: A 2025 study asked: “Does Alzheimer’s disease alter microglial activation in the hippocampus?”
→ Hypothesis: TREM2 and CX3CR1 are upregulated in AD patient microglia.
→ This directed scRNA-Seq on hippocampal tissue, leading to targeted biomarker discovery.
2. Selection of Appropriate Samples
Why it matters: The transcriptome is tissue-, cell-, and condition-specific. Wrong samples = wrong answers.
Step-by-step guide
| Step | Action | Example |
|---|---|---|
| 1 | Define the biological context | Liver for metabolic disease, not blood |
| 2 | Choose sample type | Fresh-frozen tissue, FFPE, single cells |
| 3 | Match conditions | Age, sex, BMI, medication |
| 4 | Avoid confounders | Smoking, inflammation |
Trap: Using whole blood for brain-specific questions → diluted signal.
2025 Example: A lung cancer study compared tumor core vs. tumor edge vs. healthy lung tissue using spatial transcriptomics. They found PD-L1 highly expressed only at the invasive edge—critical for immunotherapy targeting.
3. Replication and Sample Size
Why it matters: Biology is variable. Replication distinguishes signal from noise.
Step-by-step:
- Biological replicates: Different individuals (n ≥ 3 per group)
- Technical replicates: Same sample, different runs (rarely needed with NGS)
- Power calculation: Use tools like RNASeqPower or Scotty
Rule of thumb:
- Bulk RNA-Seq: n = 3–6 per group
- scRNA-Seq: n = 3 patients, >5,000 cells per sample
Example: A 2025 diabetes study used n = 5 obese and n = 5 lean patients. Power analysis showed 80% power to detect 2-fold changes at FDR < 0.05.
Tip: Under-replication → low power, missed true changes, and unstable fold-change estimates that often fail to replicate. Over-replication → wasted resources.
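To get a feel for how depth, variability, and effect size trade off before running a dedicated tool, the following is a minimal sketch of a closed-form sample-size approximation of the kind used for RNA-Seq power calculations. The inputs (`depth=20`, `cv=0.4`) are illustrative assumptions, not values from the study above, and `alpha` here is a per-gene significance level rather than an FDR, so treat the result as a rough starting point.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, fold_change, alpha=0.05, power=0.80):
    """Approximate replicates per group for a two-group comparison.

    depth       -- expected read count for a gene of interest (assumed)
    cv          -- biological coefficient of variation within a group (assumed)
    fold_change -- smallest fold change worth detecting, e.g. 2.0
    alpha       -- per-gene two-sided significance level (not an FDR)
    power       -- desired probability of detecting the effect
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Variance has a counting-noise part (1/depth) and a biological part (cv^2).
    variance_term = 1.0 / depth + cv ** 2
    return 2 * (z_alpha + z_beta) ** 2 * variance_term / math.log(fold_change) ** 2


n = samples_per_group(depth=20, cv=0.4, fold_change=2.0)
print(math.ceil(n))  # round up to whole individuals
```

Lowering the detectable fold change or increasing the assumed variability raises the required n quickly, which is why the rule-of-thumb ranges above should be checked against your own pilot data.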
Sample Preparation
1. RNA Extraction and Quality Control
Goal: Isolate intact, pure RNA.
Step-by-step protocol (for tissue):
- Homogenize tissue in TRIzol (or in the lysis buffer of a column kit such as Qiagen RNeasy)
- Phase separation (chloroform, TRIzol workflow) → aqueous phase = RNA
- Column purification → DNase treatment (remove DNA)
- Quantify and check: NanoDrop (concentration and purity ratios), Qubit (more accurate concentration), Bioanalyzer (integrity)
Quality metrics:
| Metric | Good | Poor |
|---|---|---|
| RIN (RNA Integrity Number) | ≥ 7.0 | < 6.0 |
| 260/280 ratio | 1.8–2.1 | < 1.7 (protein) |
| 260/230 ratio | > 2.0 | < 1.8 (salts) |
Tip: Degraded RNA → 3’ bias in sequencing. Always check electropherogram.
2025 Example: A brain bank study rejected 30% of samples with RIN < 7, ensuring reliable Alzheimer’s transcriptomes.
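The thresholds in the quality table can be applied automatically when screening many samples. This is a minimal sketch: the function name and the "borderline" category are assumptions introduced here to cover values that fall between the table's "Good" and "Poor" columns.

```python
def classify_rna(rin, a260_280, a260_230):
    """Classify one RNA sample as 'pass', 'borderline', or 'fail'.

    Thresholds follow the quality table; values between the Good and
    Poor columns are labelled 'borderline' (an assumption of this sketch).
    """
    poor = []
    if rin < 6.0:
        poor.append("RIN < 6.0 (degraded)")
    if a260_280 < 1.7:
        poor.append("260/280 < 1.7 (possible protein contamination)")
    if a260_230 < 1.8:
        poor.append("260/230 < 1.8 (possible salt carryover)")
    if poor:
        return "fail", poor

    good = rin >= 7.0 and 1.8 <= a260_280 <= 2.1 and a260_230 > 2.0
    return ("pass" if good else "borderline"), []


# Hypothetical measurements for three samples.
for name, values in {
    "S1": (8.2, 2.0, 2.1),
    "S2": (5.1, 1.6, 2.2),
    "S3": (6.5, 1.9, 2.1),
}.items():
    print(name, classify_rna(*values))
```

Numeric screening does not replace inspecting the electropherogram; it only flags which samples need a closer look.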
2. Handling Low-Input or Degraded RNA Samples
Challenge: Biopsies, laser-capture microdissection, FFPE samples.
Solutions:
| Sample Type | Method | Key Step |
|---|---|---|
| Low input (<10 ng) | Smart-Seq2, CEL-Seq2 | cDNA amplification (full-length for Smart-Seq2, 3’ tag counting for CEL-Seq2) |
| FFPE | TruSeq RNA Exome, Ovation FFPE | Deparaffinization + exome capture or rRNA depletion |
| Degraded | RNAtag-Seq, BRB-Seq | 3’ end capture |
Example: A 2025 pancreatic cancer study used FFPE blocks from 2015 with a new FFPE-optimized RNA-Seq kit, recovering 80% of transcripts despite degradation.
3. Considerations for Single-Cell Studies
Unique challenges:
- Cell viability (>80%)
- Doublet removal
- Ambient RNA contamination
Step-by-step:
1. Tissue dissociation: Enzymatic (collagenase), gentle
2. Cell counting: Trypan blue or automated (Countess)
3. Loading: 10x Chromium → target 5,000–10,000 cells
4. Library QC: Check cDNA size (TapeStation)
Tip: Dead cells release RNA → false ambient signal. Minimize dead cells before loading, and use CellBender to subtract the remaining ambient counts computationally.
2025 Example: A kidney scRNA-Seq study used fresh biopsies within 30 minutes to achieve 92% viability, revealing a novel podocyte subtype in lupus nephritis.
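Two quick calculations are useful at the bench before loading: viability from a trypan blue count, and a rough expected multiplet rate for the targeted cell number. The sketch below assumes a linear rule of thumb of about 0.8% multiplets per 1,000 cells recovered for droplet platforms; that default is an assumption, so check the figure in your vendor's user guide. The cell counts are hypothetical.

```python
def viability(live_cells, dead_cells):
    """Fraction of viable cells from a live/dead count (e.g. trypan blue)."""
    total = live_cells + dead_cells
    if total == 0:
        raise ValueError("no cells counted")
    return live_cells / total


def expected_multiplet_rate(cells_recovered, rate_per_1000=0.008):
    """Rough multiplet fraction, assuming a linear rule of thumb.

    rate_per_1000 is an assumed platform-dependent constant.
    """
    return cells_recovered / 1000 * rate_per_1000


v = viability(live_cells=460, dead_cells=40)  # hypothetical count
print(f"viability: {v:.0%}, proceed: {v > 0.80}")

for target in (5000, 10000):
    print(target, f"{expected_multiplet_rate(target):.1%} expected multiplets")
```

The multiplet estimate shows why loading more cells is not free: the higher end of the 5,000–10,000 range roughly doubles the fraction of droplets that need doublet removal.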
Controls and Normalization
1. Use of Housekeeping Genes and Spike-In Controls
Housekeeping genes (e.g., GAPDH, ACTB): Assumed stable → not always true!
Better: ERCC spike-ins (synthetic RNAs of known concentration)
Step-by-step:
1. Add ERCC mix before library prep
2. Sequence → count ERCC reads
3. Use for absolute quantification and technical normalization
Example: A 2025 study found GAPDH upregulated in hypoxia → misleading normalization. ERCC spike-ins corrected this, revealing true hypoxia response genes.
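One common way to use spike-ins for absolute quantification is to fit a line between the known input concentration and the observed read count on a log-log scale, then invert it for endogenous genes. The following is a minimal sketch with made-up numbers; the function names and the concentration units are assumptions, and real data will be noisier, especially at low concentrations.

```python
import math


def fit_loglog(known_conc, observed_reads):
    """Least-squares fit of log2(reads) = slope * log2(conc) + intercept."""
    pairs = [
        (math.log2(c), math.log2(r))
        for c, r in zip(known_conc, observed_reads)
        if c > 0 and r > 0  # zeros cannot be log-transformed
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two detected spike-ins")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("spike-in concentrations must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_conc(reads, slope, intercept):
    """Invert the fitted line to estimate concentration from a read count."""
    return 2 ** ((math.log2(reads) - intercept) / slope)


# Toy spike-in series: concentration (arbitrary units) vs. counted reads.
slope, intercept = fit_loglog([1, 2, 4, 8], [10, 20, 40, 80])
print(round(slope, 3), round(estimate_conc(160, slope, intercept), 2))
```

A slope near 1 suggests the library responds linearly across the spike-in range; a markedly different slope is a warning sign about amplification bias or saturation.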
2. Normalization Methods for Transcriptomic Data
Goal: Remove technical variation (sequencing depth, RNA input)
| Method | When to use | Formula |
|---|---|---|
| TPM (Transcripts Per Million) | Compare genes within sample | (reads / gene length in kb) × 10⁶ / Σ(reads / gene length in kb) over all genes |
| RPKM/FPKM | Older bulk RNA-Seq | (reads × 10⁹) / (gene length in bp × total mapped reads) |
| TMM (Trimmed Mean of M-values) | edgeR (DESeq2 uses the related median-of-ratios method) | Scales by effective library size |
| Spike-in normalization | Low-input, scRNA-Seq | ERCC-based scaling |
Tip: Never compare unnormalized raw counts across samples. DESeq2 and edgeR take raw counts as input, but they normalize them internally.
2025 Example: A multi-lab RNA-Seq study used TMM + batch correction (ComBat-seq) to harmonize data from 12 centers, identifying robust COVID-19 biomarkers.
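TPM is conventionally computed by normalizing for gene length first and then scaling so that each sample sums to one million. A minimal sketch in plain Python, using toy counts and lengths that are assumptions for illustration:

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million for one sample.

    counts     -- read counts per gene
    lengths_bp -- gene (or effective transcript) lengths in base pairs
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must align gene by gene")
    # Step 1: reads per kilobase, which removes the gene-length effect.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no reads")
    # Step 2: rescale so the sample sums to one million.
    return [r / total * 1e6 for r in rates]


# Three toy genes whose counts are proportional to their lengths.
values = tpm([10, 20, 30], [1000, 2000, 3000])
print([round(v, 1) for v in values], round(sum(values)))
```

Because the three genes have the same reads per kilobase, they receive equal TPM values even though their raw counts differ threefold, which is the length bias TPM is meant to remove.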
Study Types:
1. Differential Expression Studies
Goal: Find genes changing between conditions (e.g., disease vs. healthy)
Design:
- Case-control: n ≥ 3 per group
- Paired design: Before/after treatment (more powerful, because each subject serves as its own control)
Analysis pipeline:
- QC → Trim adapters
- Align (STAR/HISAT2)
- Count (featureCounts)
- Normalize (DESeq2/edgeR)
- Test (Wald, LRT)
Example: A 2025 colorectal cancer study (n=50 tumors vs. 50 normals) found APC downregulated and MYC upregulated, validated by qPCR.
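After the testing step, thousands of per-gene p-values must be corrected for multiple testing before applying an FDR cutoff; tools such as DESeq2 report Benjamini-Hochberg adjusted p-values for this purpose. The sketch below shows that adjustment on its own, in plain Python, with made-up p-values. It is meant to illustrate the logic, not to replace the packages' own output.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
print([round(q, 3) for q in adj], significant)
```

With only four genes the correction is mild; with a whole transcriptome the adjusted values grow much more, which is why a raw p < 0.05 is not evidence of differential expression on its own.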
2. Time-Course Experiments
Goal: Capture dynamic changes (e.g., drug response over 0–48h)
Design:
- Time points: 0, 1, 6, 24, 48h (n=3 each)
- Model: Use ImpulseDE or maSigPro
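A time-course design is easier to audit when the sample sheet is generated rather than typed by hand. The following is a minimal sketch that enumerates the time points and replicates listed above and then shuffles the processing order so that time point is not confounded with library-prep order. The `sample_id` naming scheme and the fixed seed are assumptions for illustration.

```python
import random
from itertools import product


def build_sample_sheet(time_points_h, n_replicates):
    """One row per (time point, replicate) combination."""
    return [
        {"sample_id": f"T{t:02d}h_rep{rep}", "time_h": t, "replicate": rep}
        for t, rep in product(time_points_h, range(1, n_replicates + 1))
    ]


def randomize_processing_order(rows, seed=0):
    """Return a shuffled copy so batches are not aligned with time points."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible plan
    return shuffled


sheet = build_sample_sheet([0, 1, 6, 24, 48], n_replicates=3)
print(len(sheet))  # 5 time points x 3 replicates = 15 samples

for row in randomize_processing_order(sheet)[:3]:
    print(row["sample_id"])
```

Recording the randomized order alongside the sample sheet also gives you a batch covariate to include in the model later if processing spans several days.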
Example: A 2025 immunotherapy study tracked PD-1 blockade response in melanoma:
- 0h: Baseline
- 6h: Immune activation (IFNG ↑)
- 24h: T-cell exhaustion (LAG3 ↑)
→ Guided combination therapy timing.
3. Comparative Transcriptomics Across Species
Goal: Identify conserved or divergent responses (e.g., stress in human vs. mouse)
Design:
- Ortholog mapping (Ensembl Biomart)
- Normalize separately, then integrate
Example: A 2025 heat stress study compared human, mouse, and zebrafish:
- Conserved: HSP70 upregulation
- Divergent: Fish-specific gill genes
→ Improved mouse model relevance for human trials.
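The two design steps, ortholog mapping and integration after separate normalization, can be sketched as follows. This minimal example keeps only one-to-one orthologs and then compares the direction of log2 fold changes computed independently in each species. The ortholog pairs stand in for a table exported from a resource such as Ensembl BioMart; `GENE_A` and its paralogs are hypothetical, and all fold-change values are toy numbers.

```python
from collections import Counter


def one_to_one(ortholog_pairs):
    """Keep pairs where each gene appears exactly once on both sides."""
    human_counts = Counter(h for h, _ in ortholog_pairs)
    mouse_counts = Counter(m for _, m in ortholog_pairs)
    return {
        h: m
        for h, m in ortholog_pairs
        if human_counts[h] == 1 and mouse_counts[m] == 1
    }


def split_by_conservation(human_lfc, mouse_lfc, mapping, min_abs_lfc=1.0):
    """Label orthologs as conserved if both change in the same direction."""
    conserved, not_conserved = [], []
    for h, m in mapping.items():
        if h not in human_lfc or m not in mouse_lfc:
            continue  # gene not measured in one of the species
        a, b = human_lfc[h], mouse_lfc[m]
        same_direction = (a > 0) == (b > 0)
        if abs(a) >= min_abs_lfc and abs(b) >= min_abs_lfc and same_direction:
            conserved.append(h)
        else:
            not_conserved.append(h)
    return conserved, not_conserved


pairs = [
    ("HSPA1A", "Hspa1a"),
    ("FASN", "Fasn"),
    ("GENE_A", "Gene_a1"),  # hypothetical one-to-many ortholog, dropped
    ("GENE_A", "Gene_a2"),
]
mapping = one_to_one(pairs)
human = {"HSPA1A": 3.1, "FASN": 0.2, "GENE_A": 2.0}  # toy log2 fold changes
mouse = {"Hspa1a": 2.4, "Fasn": -1.5, "Gene_a1": 1.8}
print(split_by_conservation(human, mouse, mapping))
```

Comparing fold changes rather than raw expression values sidesteps differences in annotation and library composition between species, at the cost of discarding genes without a clean one-to-one ortholog.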