[{"data":1,"prerenderedAt":784},["ShallowReactive",2],{"content-query-iTPTS6JFJP":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"heading":10,"prompt":11,"tags":15,"files":17,"nav":17,"presets":18,"gallery":36,"body":38,"_type":777,"_id":778,"_source":779,"_file":780,"_stem":781,"_extension":782,"sitemap":783},"/tools/correlation-matrix","tools",false,"","Correlation Matrix Calculator for Excel & CSV","Create correlation matrices online from Excel and CSV data. Measure pairwise relationships and generate correlation heatmaps with AI.","Correlation Matrix",{"prefix":12,"label":13,"placeholder":14},"Create a correlation matrix","Describe the correlation matrix you want to create","e.g. correlation matrix of GDP, life expectancy, CO2, education, and fertility with p-values",[16],"statistics",true,[19,25,30],{"label":20,"prompt":21,"dataset_url":22,"dataset_title":23,"dataset_citation":24},"Country development indicators","correlation matrix of log GDP per capita, life expectancy, CO2 emissions, education index, fertility rate, and urbanization rate; Pearson correlations with p-values; diverging red-white-blue heatmap; annotate each cell with r value and significance stars","https://ourworldindata.org/grapher/life-expectancy-vs-gdp-per-capita.csv","Life expectancy vs. 
GDP per capita","Our World in Data",{"label":26,"prompt":27,"dataset_url":28,"dataset_title":29,"dataset_citation":24},"Electricity sources correlation","correlation matrix of electricity generation shares from fossil fuels, nuclear, solar, wind, hydro, and bioenergy by country; Pearson correlations; show which sources co-occur and which are substitutes; annotate with r and significance","https://ourworldindata.org/grapher/share-of-electricity-production-by-source.csv","Share of electricity production by source",{"label":31,"prompt":32,"dataset_url":33,"dataset_title":34,"dataset_citation":35},"Economic indicator correlations","correlation matrix of GDP growth, inflation, trade openness, government expenditure, and unemployment by country; Spearman correlations (more robust to outliers); cluster variables by correlation; highlight cells with |r| > 0.5","https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.KD.ZG?downloadformat=excel","GDP growth (annual %)","World Bank",[37],"/img/tools/correlation-matrix.png",{"type":39,"children":40,"toc":767},"root",[41,50,92,118,151,157,223,229,440,446,550,556,610,616,645,651,672,716,726,743],{"type":42,"tag":43,"props":44,"children":46},"element","h2",{"id":45},"what-is-a-correlation-matrix",[47],{"type":48,"value":49},"text","What Is a Correlation Matrix?",{"type":42,"tag":51,"props":52,"children":53},"p",{},[54,56,62,64,69,71,76,78,83,85,90],{"type":48,"value":55},"A ",{"type":42,"tag":57,"props":58,"children":59},"strong",{},[60],{"type":48,"value":61},"correlation matrix",{"type":48,"value":63}," is a table showing the ",{"type":42,"tag":57,"props":65,"children":66},{},[67],{"type":48,"value":68},"pairwise correlation coefficients",{"type":48,"value":70}," between every combination of numeric variables in a dataset. 
Each cell contains a value between −1 and +1: ",{"type":42,"tag":57,"props":72,"children":73},{},[74],{"type":48,"value":75},"+1",{"type":48,"value":77}," indicates a perfect positive linear relationship, ",{"type":42,"tag":57,"props":79,"children":80},{},[81],{"type":48,"value":82},"−1",{"type":48,"value":84}," indicates a perfect negative linear relationship, and ",{"type":42,"tag":57,"props":86,"children":87},{},[88],{"type":48,"value":89},"0",{"type":48,"value":91}," indicates no linear relationship. Visualized as a color-coded heatmap with a diverging scale (typically blue for negative, white for zero, red for positive), the correlation matrix lets you scan the strength and direction of all pairwise relationships at a glance.",{"type":42,"tag":51,"props":93,"children":94},{},[95,97,102,104,109,111,116],{"type":48,"value":96},"The most common measure is the ",{"type":42,"tag":57,"props":98,"children":99},{},[100],{"type":48,"value":101},"Pearson correlation coefficient",{"type":48,"value":103}," (r), which captures linear relationships; its significance test assumes the variables are approximately normally distributed. For data with outliers, non-normal distributions, or ordinal scales, ",{"type":42,"tag":57,"props":105,"children":106},{},[107],{"type":48,"value":108},"Spearman's rank correlation",{"type":48,"value":110}," (ρ) is more robust — it ranks the values first and then computes Pearson on the ranks. Each correlation can be accompanied by a ",{"type":42,"tag":57,"props":112,"children":113},{},[114],{"type":48,"value":115},"p-value",{"type":48,"value":117}," indicating how likely a correlation at least this strong would be if the variables were truly uncorrelated, given the sample size. 
With 100+ observations, even r = 0.2 can be statistically significant; with 10 observations, r = 0.5 may not be.",{"type":42,"tag":51,"props":119,"children":120},{},[121,123,128,130,135,137,142,144,149],{"type":48,"value":122},"Correlation matrices are a standard first diagnostic before multivariate analysis. In ",{"type":42,"tag":57,"props":124,"children":125},{},[126],{"type":48,"value":127},"regression",{"type":48,"value":129},", highly correlated predictors (|r| > 0.8) signal multicollinearity — they carry redundant information and inflate the standard errors of the coefficient estimates. In ",{"type":42,"tag":57,"props":131,"children":132},{},[133],{"type":48,"value":134},"PCA",{"type":48,"value":136},", clusters of strongly correlated variables suggest which groups are likely to load on the same principal component. In ",{"type":42,"tag":57,"props":138,"children":139},{},[140],{"type":48,"value":141},"feature selection",{"type":48,"value":143}," for machine learning, one variable from each highly correlated pair is a candidate for removal. In ",{"type":42,"tag":57,"props":145,"children":146},{},[147],{"type":48,"value":148},"finance",{"type":48,"value":150},", a portfolio's correlation matrix reveals diversification opportunities — combining assets with low or negative correlation reduces portfolio volatility.",{"type":42,"tag":43,"props":152,"children":154},{"id":153},"how-it-works",[155],{"type":48,"value":156},"How It Works",{"type":42,"tag":158,"props":159,"children":160},"ol",{},[161,172,188],{"type":42,"tag":162,"props":163,"children":164},"li",{},[165,170],{"type":42,"tag":57,"props":166,"children":167},{},[168],{"type":48,"value":169},"Upload your data",{"type":48,"value":171}," — provide a CSV or Excel file with multiple numeric columns, one row per observation.",{"type":42,"tag":162,"props":173,"children":174},{},[175,180,182],{"type":42,"tag":57,"props":176,"children":177},{},[178],{"type":48,"value":179},"Describe the analysis",{"type":48,"value":181}," — e.g. 
",{"type":42,"tag":183,"props":184,"children":185},"em",{},[186],{"type":48,"value":187},"\"Pearson correlation matrix of all numeric columns, annotate with r and p-values, cluster by correlation\"",{"type":42,"tag":162,"props":189,"children":190},{},[191,196,198,205,207,213,215,221],{"type":42,"tag":57,"props":192,"children":193},{},[194],{"type":48,"value":195},"Get full results",{"type":48,"value":197}," — the AI writes Python code using ",{"type":42,"tag":199,"props":200,"children":202},"a",{"href":201},"https://pandas.pydata.org/",[203],{"type":48,"value":204},"pandas",{"type":48,"value":206}," and ",{"type":42,"tag":199,"props":208,"children":210},{"href":209},"https://scipy.org/",[211],{"type":48,"value":212},"scipy",{"type":48,"value":214}," to compute correlations and p-values, and ",{"type":42,"tag":199,"props":216,"children":218},{"href":217},"https://plotly.com/python/",[219],{"type":48,"value":220},"Plotly",{"type":48,"value":222}," to render the heatmap with annotations",{"type":42,"tag":43,"props":224,"children":226},{"id":225},"interpreting-the-results",[227],{"type":48,"value":228},"Interpreting the Results",{"type":42,"tag":230,"props":231,"children":232},"table",{},[233,252],{"type":42,"tag":234,"props":235,"children":236},"thead",{},[237],{"type":42,"tag":238,"props":239,"children":240},"tr",{},[241,247],{"type":42,"tag":242,"props":243,"children":244},"th",{},[245],{"type":48,"value":246},"Cell value",{"type":42,"tag":242,"props":248,"children":249},{},[250],{"type":48,"value":251},"What it means",{"type":42,"tag":253,"props":254,"children":255},"tbody",{},[256,273,289,305,321,337,353,369,382,395,408,424],{"type":42,"tag":238,"props":257,"children":258},{},[259,268],{"type":42,"tag":260,"props":261,"children":262},"td",{},[263],{"type":42,"tag":57,"props":264,"children":265},{},[266],{"type":48,"value":267},"r = +1.0",{"type":42,"tag":260,"props":269,"children":270},{},[271],{"type":48,"value":272},"Perfect positive linear 
relationship",{"type":42,"tag":238,"props":274,"children":275},{},[276,284],{"type":42,"tag":260,"props":277,"children":278},{},[279],{"type":42,"tag":57,"props":280,"children":281},{},[282],{"type":48,"value":283},"r = +0.7 to +0.9",{"type":42,"tag":260,"props":285,"children":286},{},[287],{"type":48,"value":288},"Strong positive correlation",{"type":42,"tag":238,"props":290,"children":291},{},[292,300],{"type":42,"tag":260,"props":293,"children":294},{},[295],{"type":42,"tag":57,"props":296,"children":297},{},[298],{"type":48,"value":299},"r = +0.3 to +0.6",{"type":42,"tag":260,"props":301,"children":302},{},[303],{"type":48,"value":304},"Moderate positive correlation",{"type":42,"tag":238,"props":306,"children":307},{},[308,316],{"type":42,"tag":260,"props":309,"children":310},{},[311],{"type":42,"tag":57,"props":312,"children":313},{},[314],{"type":48,"value":315},"r near 0",{"type":42,"tag":260,"props":317,"children":318},{},[319],{"type":48,"value":320},"Little or no linear relationship",{"type":42,"tag":238,"props":322,"children":323},{},[324,332],{"type":42,"tag":260,"props":325,"children":326},{},[327],{"type":42,"tag":57,"props":328,"children":329},{},[330],{"type":48,"value":331},"r = −0.3 to −0.6",{"type":42,"tag":260,"props":333,"children":334},{},[335],{"type":48,"value":336},"Moderate negative correlation",{"type":42,"tag":238,"props":338,"children":339},{},[340,348],{"type":42,"tag":260,"props":341,"children":342},{},[343],{"type":42,"tag":57,"props":344,"children":345},{},[346],{"type":48,"value":347},"r = −0.7 to −0.9",{"type":42,"tag":260,"props":349,"children":350},{},[351],{"type":48,"value":352},"Strong negative correlation",{"type":42,"tag":238,"props":354,"children":355},{},[356,364],{"type":42,"tag":260,"props":357,"children":358},{},[359],{"type":42,"tag":57,"props":360,"children":361},{},[362],{"type":48,"value":363},"r = −1.0",{"type":42,"tag":260,"props":365,"children":366},{},[367],{"type":48,"value":368},"Perfect negative linear 
relationship",{"type":42,"tag":238,"props":370,"children":371},{},[372,377],{"type":42,"tag":260,"props":373,"children":374},{},[375],{"type":48,"value":376},"*",{"type":42,"tag":260,"props":378,"children":379},{},[380],{"type":48,"value":381},"p \u003C 0.05 — correlation is statistically significant",{"type":42,"tag":238,"props":383,"children":384},{},[385,390],{"type":42,"tag":260,"props":386,"children":387},{},[388],{"type":48,"value":389},"**",{"type":42,"tag":260,"props":391,"children":392},{},[393],{"type":48,"value":394},"p \u003C 0.01",{"type":42,"tag":238,"props":396,"children":397},{},[398,403],{"type":42,"tag":260,"props":399,"children":400},{},[401],{"type":48,"value":402},"***",{"type":42,"tag":260,"props":404,"children":405},{},[406],{"type":48,"value":407},"p \u003C 0.001",{"type":42,"tag":238,"props":409,"children":410},{},[411,419],{"type":42,"tag":260,"props":412,"children":413},{},[414],{"type":42,"tag":57,"props":415,"children":416},{},[417],{"type":48,"value":418},"Cluster of red cells",{"type":42,"tag":260,"props":420,"children":421},{},[422],{"type":48,"value":423},"Group of mutually correlated variables — may be redundant",{"type":42,"tag":238,"props":425,"children":426},{},[427,435],{"type":42,"tag":260,"props":428,"children":429},{},[430],{"type":42,"tag":57,"props":431,"children":432},{},[433],{"type":48,"value":434},"Blue cell between two red clusters",{"type":42,"tag":260,"props":436,"children":437},{},[438],{"type":48,"value":439},"Two variable groups that move in opposite directions",{"type":42,"tag":43,"props":441,"children":443},{"id":442},"example-prompts",[444],{"type":48,"value":445},"Example 
Prompts",{"type":42,"tag":230,"props":447,"children":448},{},[449,465],{"type":42,"tag":234,"props":450,"children":451},{},[452],{"type":42,"tag":238,"props":453,"children":454},{},[455,460],{"type":42,"tag":242,"props":456,"children":457},{},[458],{"type":48,"value":459},"Scenario",{"type":42,"tag":242,"props":461,"children":462},{},[463],{"type":48,"value":464},"What to type",{"type":42,"tag":253,"props":466,"children":467},{},[468,486,499,516,533],{"type":42,"tag":238,"props":469,"children":470},{},[471,476],{"type":42,"tag":260,"props":472,"children":473},{},[474],{"type":48,"value":475},"Full matrix",{"type":42,"tag":260,"props":477,"children":478},{},[479],{"type":42,"tag":480,"props":481,"children":483},{"className":482},[],[484],{"type":48,"value":485},"correlation matrix of all numeric columns, Pearson r with p-values, red-blue heatmap",{"type":42,"tag":238,"props":487,"children":488},{},[489,494],{"type":42,"tag":260,"props":490,"children":491},{},[492],{"type":48,"value":493},"Spearman",{"type":42,"tag":260,"props":495,"children":496},{},[497],{"type":48,"value":498},"Spearman correlation matrix of health indicators, cluster variables, highlight cells with |r| > 0.5",{"type":42,"tag":238,"props":500,"children":501},{},[502,507],{"type":42,"tag":260,"props":503,"children":504},{},[505],{"type":48,"value":506},"Portfolio diversification",{"type":42,"tag":260,"props":508,"children":509},{},[510],{"type":42,"tag":480,"props":511,"children":513},{"className":512},[],[514],{"type":48,"value":515},"correlation matrix of stock returns, show which assets are most diversified",{"type":42,"tag":238,"props":517,"children":518},{},[519,524],{"type":42,"tag":260,"props":520,"children":521},{},[522],{"type":48,"value":523},"Significance filter",{"type":42,"tag":260,"props":525,"children":526},{},[527],{"type":42,"tag":480,"props":528,"children":530},{"className":529},[],[531],{"type":48,"value":532},"correlation matrix, mask cells where p > 0.05 (show only significant 
correlations)",{"type":42,"tag":238,"props":534,"children":535},{},[536,541],{"type":42,"tag":260,"props":537,"children":538},{},[539],{"type":48,"value":540},"Time-lagged",{"type":42,"tag":260,"props":542,"children":543},{},[544],{"type":42,"tag":480,"props":545,"children":547},{"className":546},[],[548],{"type":48,"value":549},"correlation matrix with lag 1 to see how this month's variable predicts next month's",{"type":42,"tag":43,"props":551,"children":553},{"id":552},"assumptions-to-check",[554],{"type":48,"value":555},"Assumptions to Check",{"type":42,"tag":557,"props":558,"children":559},"ul",{},[560,570,580,590,600],{"type":42,"tag":162,"props":561,"children":562},{},[563,568],{"type":42,"tag":57,"props":564,"children":565},{},[566],{"type":48,"value":567},"Numeric variables",{"type":48,"value":569}," — Pearson correlation requires numeric data; for ordinal or ranked data use Spearman",{"type":42,"tag":162,"props":571,"children":572},{},[573,578],{"type":42,"tag":57,"props":574,"children":575},{},[576],{"type":48,"value":577},"Sufficient sample size",{"type":48,"value":579}," — as a rule of thumb, at least 30 observations for stable correlation estimates; p-values become unreliable with fewer than about 20",{"type":42,"tag":162,"props":581,"children":582},{},[583,588],{"type":42,"tag":57,"props":584,"children":585},{},[586],{"type":48,"value":587},"No extreme outliers",{"type":48,"value":589}," — a single outlier can dramatically change a Pearson correlation; use Spearman if outliers are present",{"type":42,"tag":162,"props":591,"children":592},{},[593,598],{"type":42,"tag":57,"props":594,"children":595},{},[596],{"type":48,"value":597},"Linear relationships",{"type":48,"value":599}," — Pearson measures only linear association; two variables can have a strong curved relationship (e.g. y = x² with x symmetric around zero) yet r ≈ 0",{"type":42,"tag":162,"props":601,"children":602},{},[603,608],{"type":42,"tag":57,"props":604,"children":605},{},[606],{"type":48,"value":607},"Multiple testing",{"type":48,"value":609}," — 
testing all k×(k−1)/2 pairs among k variables inflates the chance of false positives; ask for a Bonferroni or FDR correction when testing many pairs",{"type":42,"tag":43,"props":611,"children":613},{"id":612},"related-tools",[614],{"type":48,"value":615},"Related Tools",{"type":42,"tag":51,"props":617,"children":618},{},[619,621,627,629,635,637,643],{"type":48,"value":620},"Use the ",{"type":42,"tag":199,"props":622,"children":624},{"href":623},"/tools/pair-plot",[625],{"type":48,"value":626},"Pair Plot Generator",{"type":48,"value":628}," to visualize the actual scatter of each variable pair after the correlation matrix identifies which pairs are most interesting. Use the ",{"type":42,"tag":199,"props":630,"children":632},{"href":631},"/tools/pca",[633],{"type":48,"value":634},"PCA tool",{"type":48,"value":636}," after identifying correlated variable clusters — PCA will compress those clusters into fewer components. Use the ",{"type":42,"tag":199,"props":638,"children":640},{"href":639},"/tools/exploratory-data-analysis-ai",[641],{"type":48,"value":642},"Exploratory Data Analysis tool",{"type":48,"value":644}," for a full automated analysis that includes the correlation matrix, distributions, and outlier summary in one report.",{"type":42,"tag":43,"props":646,"children":648},{"id":647},"frequently-asked-questions",[649],{"type":48,"value":650},"Frequently Asked Questions",{"type":42,"tag":51,"props":652,"children":653},{},[654,659,664,666,670],{"type":42,"tag":57,"props":655,"children":656},{},[657],{"type":48,"value":658},"What's the difference between Pearson and Spearman correlation?\n",{"type":42,"tag":57,"props":660,"children":661},{},[662],{"type":48,"value":663},"Pearson",{"type":48,"value":665}," measures linear association between raw values — it is sensitive to outliers, and its significance test assumes both variables are roughly normally distributed. 
",{"type":42,"tag":57,"props":667,"children":668},{},[669],{"type":48,"value":493},{"type":48,"value":671}," ranks all values first and then computes Pearson on the ranks — it measures monotonic association (one variable consistently rises, or consistently falls, as the other rises, even if not along a straight line) and is much less sensitive to outliers and non-normal distributions. Use Spearman when your data has skew, outliers, or ordinal scales; use Pearson for roughly normal, continuous data without extreme outliers.",{"type":42,"tag":51,"props":673,"children":674},{},[675,680,682,687,689,694,696,701,703,708,710,714],{"type":42,"tag":57,"props":676,"children":677},{},[678],{"type":48,"value":679},"My matrix has 40+ variables — can I still use it?",{"type":48,"value":681},"\nA 40×40 matrix has 1,600 cells (780 distinct variable pairs) and becomes hard to read. Ask the AI to: (1) ",{"type":42,"tag":57,"props":683,"children":684},{},[685],{"type":48,"value":686},"cluster the heatmap",{"type":48,"value":688}," (reorder rows/columns so correlated variables are adjacent), (2) show only the ",{"type":42,"tag":57,"props":690,"children":691},{},[692],{"type":48,"value":693},"lower triangle",{"type":48,"value":695},", (3) ",{"type":42,"tag":57,"props":697,"children":698},{},[699],{"type":48,"value":700},"mask non-significant cells",{"type":48,"value":702}," (set them to white), or (4) ",{"type":42,"tag":57,"props":704,"children":705},{},[706],{"type":48,"value":707},"threshold",{"type":48,"value":709}," (show only |r| > 0.5). For very many variables, the ",{"type":42,"tag":199,"props":711,"children":712},{"href":631},[713],{"type":48,"value":634},{"type":48,"value":715}," or a network graph of significant correlations may be more useful.",{"type":42,"tag":51,"props":717,"children":718},{},[719,724],{"type":42,"tag":57,"props":720,"children":721},{},[722],{"type":48,"value":723},"High correlation doesn't mean one variable causes the other — how do I check?",{"type":48,"value":725},"\nCorrelation is not causation. 
A high r between, say, ice cream sales and drowning rates doesn't mean ice cream causes drowning (both are driven by summer heat). To make causal claims you need experimental design or causal inference methods (instrumental variables, difference-in-differences, etc.). The correlation matrix is purely descriptive.",{"type":42,"tag":51,"props":727,"children":728},{},[729,734,736,741],{"type":42,"tag":57,"props":730,"children":731},{},[732],{"type":48,"value":733},"Can I compute partial correlations — controlling for a third variable?",{"type":48,"value":735},"\nYes — ask for ",{"type":42,"tag":183,"props":737,"children":738},{},[739],{"type":48,"value":740},"\"partial correlation between X and Y controlling for Z\"",{"type":48,"value":742},". Partial correlation removes the linear influence of the control variables from both X and Y and measures the linear relationship that remains between them. This is especially useful when a confounding variable creates a spurious correlation (e.g. GDP driving both CO₂ emissions and life expectancy, making them appear correlated until GDP is controlled for).",{"type":42,"tag":51,"props":744,"children":745},{},[746,751,753,758,760,765],{"type":42,"tag":57,"props":747,"children":748},{},[749],{"type":48,"value":750},"What does it mean if my correlation matrix is not positive semi-definite?",{"type":48,"value":752},"\nA valid correlation matrix must be positive semi-definite (all eigenvalues ≥ 0). This can fail when missing data are handled by pairwise deletion (each pair computed on a different subset of rows) or when rounding introduces inconsistencies. 
Ask the AI to use ",{"type":42,"tag":57,"props":754,"children":755},{},[756],{"type":48,"value":757},"listwise deletion",{"type":48,"value":759}," (same rows for all pairs) or to apply a ",{"type":42,"tag":57,"props":761,"children":762},{},[763],{"type":48,"value":764},"nearest positive definite correction",{"type":48,"value":766}," to fix it.",{"title":7,"searchDepth":768,"depth":768,"links":769},2,[770,771,772,773,774,775,776],{"id":45,"depth":768,"text":49},{"id":153,"depth":768,"text":156},{"id":225,"depth":768,"text":228},{"id":442,"depth":768,"text":445},{"id":552,"depth":768,"text":555},{"id":612,"depth":768,"text":615},{"id":647,"depth":768,"text":650},"markdown","content:tools:031.correlation-matrix.md","content","tools/031.correlation-matrix.md","tools/031.correlation-matrix","md",{"loc":4},1775502468196]