Volcano Plot Generator for RNA-seq and Omics Data

Create volcano plots online from differential expression or screening results. Highlight fold change, significance, and top hits with AI.

Or try with a sample dataset:

Preview

What Is a Volcano Plot?

A volcano plot is a scatter plot designed to simultaneously display the statistical significance and magnitude of change for a large number of features — typically genes, proteins, or genetic variants. The x-axis shows the log₂ fold change (how much a feature increases or decreases between two conditions), and the y-axis shows −log₁₀(p-value) (flipping the p-value scale so that more significant results appear higher up). The result looks like a volcanic eruption: a dense cloud of uninteresting features at the bottom center, with significant hits "erupting" upward at the left and right edges.

The chart's strength is its ability to show thousands of features at once while making the important ones immediately obvious. A gene that appears in the upper-right corner is both strongly upregulated (large positive fold change) and highly statistically significant (tiny p-value) — the ideal combination. Upper-left means strongly downregulated and significant. Features near the bottom or center are either not significant, not changed, or both.

Volcano plots originated in genomics but apply to any setting where you compare two groups across many features simultaneously: proteomics (treatment vs control protein abundance), GWAS (genetic variant association across the genome), metabolomics, or even large-scale A/B testing (many metrics compared at once across two cohorts).

How It Works

  1. Upload your data — provide a CSV or Excel file with at least three columns: a feature identifier (gene name, protein ID, SNP rsID), a log₂ fold change, and a p-value (raw or adjusted). This is the standard output format from DESeq2, edgeR, limma, or any differential analysis tool.
  2. Describe the plot — e.g. "volcano plot with |log2FC| > 1 and padj < 0.05 thresholds, label the top 20 genes by significance, color upregulated red and downregulated blue"
  3. Get the visualization — the AI generates Python code using Plotly to render the plot with threshold lines, color-coded points, and gene labels

Required Data Format

ColumnDescriptionExample values
gene / featureFeature identifierBRCA2, TP53, rs12345
log2FoldChangelog₂(condition A / condition B)-3.2, 0.1, 4.7
pvalue or padjRaw or adjusted p-value0.0001, 0.32, 8.2e-15

Column names are flexible — describe them in your prompt if they differ (e.g. "the fold change column is called 'FC' and p-value is 'p.adj'").

Interpreting the Results

Position on plotWhat it means
Upper rightSignificantly upregulated — high fold change + low p-value
Upper leftSignificantly downregulated — large negative fold change + low p-value
Lower centerNot significant — close to baseline, high p-value
Right/left, bottomLarge fold change but not statistically significant (possibly low power)
Top centerHighly significant but small effect size (common with very large samples)

Key Thresholds

ThresholdTypical defaultWhat it means
|log₂FC| > 12-fold changeFeature doubled or halved between conditions
p-value < 0.05−log₁₀p > 1.35% false positive rate (unadjusted)
padj < 0.05StricterBenjamini-Hochberg adjusted; accounts for multiple testing
|log₂FC| > 1.5More stringent~2.8-fold change; reduces false positives

Always prefer adjusted p-values (padj, FDR) over raw p-values when available — raw p-values will produce many false positives across thousands of comparisons.

Example Prompts

ScenarioWhat to type
Standard DE analysisvolcano plot, log2FC threshold 1, padj < 0.05, label top 15 genes by significance
Strict thresholds`volcano plot with
Custom colorsvolcano plot, upregulated in orange, downregulated in teal, grey for non-significant
Add gene countsvolcano plot with count labels in each quadrant showing how many genes are up/down/NS
GWAS stylevolcano plot using p-value column, color hits above genome-wide significance (5e-8) in red

Use the AI Heatmap Generator to visualise expression levels of your significant hits across all samples after identifying them with a volcano plot. Use the Exploratory Data Analysis tool to get summary statistics and distributions of your fold change and p-value columns before plotting.

Frequently Asked Questions

Some of my p-values are exactly 0 — what happens? A p-value of 0 usually means it underflowed floating point precision (the true value is smaller than ~1e-300). The AI will automatically replace zeros with a small floor value (e.g. 1e-300) before applying −log₁₀ so these points still plot at the top of the chart.

Should I use raw p-values or adjusted p-values? Always use adjusted p-values (padj, FDR, BH-corrected) when available. Raw p-values across thousands of genes will produce many false positives. If your data only has raw p-values, mention this in your prompt and the AI can apply Benjamini-Hochberg correction for you.

My gene labels are overlapping — how do I fix it? Ask for "non-overlapping labels" or "label only the top N genes". The AI can use adjustText (matplotlib) or reduce the number of labeled points to avoid overlap.

Can I highlight a specific list of genes? Yes — paste the gene names directly into your prompt: "highlight these genes in purple: BRCA1, TP53, EGFR, MYC". The AI will color and label exactly those points regardless of their significance threshold.