[{"data":1,"prerenderedAt":684},["ShallowReactive",2],{"content-query-UMrCRUmpXL":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"heading":10,"prompt":11,"tags":15,"files":18,"nav":6,"presets":19,"gallery":37,"body":39,"_type":677,"_id":678,"_source":679,"_file":680,"_stem":681,"_extension":682,"sitemap":683},"/tools/residual-plot","tools",false,"","Residual Plot Generator for Regression Diagnostics","Create residual plots online from Excel and CSV data. Check regression assumptions, nonlinearity, and heteroscedasticity with AI.","Residual Plot Generator",{"prefix":12,"label":13,"placeholder":14},"Create a residual plot","Describe the regression and diagnostic plots you want","e.g. residual plot for linear regression of GDP on life expectancy, show fitted vs residuals and Q-Q plot",[16,17],"charts","statistics",true,[20,26,31],{"label":21,"prompt":22,"dataset_url":23,"dataset_title":24,"dataset_citation":25},"Life expectancy vs GDP regression","fit a linear regression of life expectancy on log GDP per capita; create residual diagnostic plots: (1) residuals vs fitted values with LOWESS smoother, (2) normal Q-Q plot, (3) scale-location plot; identify any influential outliers","https://ourworldindata.org/grapher/life-expectancy-vs-gdp-per-capita.csv","Life expectancy vs. 
GDP per capita","Our World in Data",{"label":27,"prompt":28,"dataset_url":29,"dataset_title":30,"dataset_citation":25},"CO₂ vs energy use regression","fit a linear regression of CO2 emissions per capita on energy use per capita; show residuals vs fitted values, Q-Q plot of residuals, and residuals vs leverage plot; flag countries with Cook's distance > 0.5 as influential","https://ourworldindata.org/grapher/co-emissions-per-capita.csv","CO₂ emissions per capita",{"label":32,"prompt":33,"dataset_url":34,"dataset_title":35,"dataset_citation":36},"GDP growth vs investment","fit a linear regression of GDP growth rate on gross capital formation (% of GDP); plot residuals vs fitted values colored by residual magnitude, add Q-Q plot, and histogram of residuals with normal curve overlay","https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.KD.ZG?downloadformat=excel","GDP growth (annual %)","World Bank",[38],"/img/tools/residual-plot.png",{"type":40,"children":41,"toc":667},"root",[42,51,72,112,117,123,189,195,354,360,468,474,528,534,563,569,586,596,620,637],{"type":43,"tag":44,"props":45,"children":47},"element","h2",{"id":46},"what-is-a-residual-plot",[48],{"type":49,"value":50},"text","What Is a Residual Plot?",{"type":43,"tag":52,"props":53,"children":54},"p",{},[55,57,63,65,70],{"type":49,"value":56},"A ",{"type":43,"tag":58,"props":59,"children":60},"strong",{},[61],{"type":49,"value":62},"residual plot",{"type":49,"value":64}," is a diagnostic chart used to check whether the assumptions of a regression model are satisfied. After fitting a regression, each observation has a ",{"type":43,"tag":58,"props":66,"children":67},{},[68],{"type":49,"value":69},"residual",{"type":49,"value":71}," — the difference between the actual observed value and the value predicted by the model. Plotting these residuals against the fitted values (predicted values) reveals patterns that indicate whether the model is appropriate. 
A well-fitting regression produces residuals that scatter randomly around zero with no discernible shape; systematic patterns signal that the model is misspecified or that an assumption is violated.",{"type":43,"tag":52,"props":73,"children":74},{},[75,77,82,84,89,91,96,98,103,105,110],{"type":49,"value":76},"The most important assumption checked by residual plots is ",{"type":43,"tag":58,"props":78,"children":79},{},[80],{"type":49,"value":81},"linearity",{"type":49,"value":83},": whether the true relationship between the predictor and outcome is actually linear. If the residual vs. fitted plot shows a curve or a U-shape rather than a random cloud, the relationship is probably non-linear and the model may need a transformation (e.g. log of the predictor). The second key assumption is ",{"type":43,"tag":58,"props":85,"children":86},{},[87],{"type":49,"value":88},"homoscedasticity",{"type":49,"value":90}," (equal variance): the spread of residuals should be constant across all fitted values. A funnel-shaped scatter (wide at one end, narrow at the other) indicates ",{"type":43,"tag":58,"props":92,"children":93},{},[94],{"type":49,"value":95},"heteroscedasticity",{"type":49,"value":97},", which biases the usual standard errors, so p-values and confidence intervals can be too optimistic or too conservative. The third assumption is ",{"type":43,"tag":58,"props":99,"children":100},{},[101],{"type":49,"value":102},"normality of residuals",{"type":49,"value":104},", checked with a ",{"type":43,"tag":58,"props":106,"children":107},{},[108],{"type":49,"value":109},"Q-Q plot",{"type":49,"value":111},": if the residual points follow the diagonal reference line, the residuals are approximately normal.",{"type":43,"tag":52,"props":113,"children":114},{},[115],{"type":49,"value":116},"Residual plots are essential before trusting the p-values and confidence intervals from any regression. A regression that looks reasonable in terms of R² can have severely violated assumptions that make all the inferential statistics unreliable. 
In practice, residual analysis is used across every domain that uses regression: clinical trials (checking whether a treatment effect model fits the data), econometrics (validating country-level growth models), and machine learning (diagnosing whether a linear baseline is appropriate before trying complex models).",{"type":43,"tag":44,"props":118,"children":120},{"id":119},"how-it-works",[121],{"type":49,"value":122},"How It Works",{"type":43,"tag":124,"props":125,"children":126},"ol",{},[127,138,154],{"type":43,"tag":128,"props":129,"children":130},"li",{},[131,136],{"type":43,"tag":58,"props":132,"children":133},{},[134],{"type":49,"value":135},"Upload your data",{"type":49,"value":137}," — provide a CSV or Excel file with at least two numeric columns: one outcome variable (Y) and one or more predictor variables (X). One row per observation.",{"type":43,"tag":128,"props":139,"children":140},{},[141,146,148],{"type":43,"tag":58,"props":142,"children":143},{},[144],{"type":49,"value":145},"Describe the analysis",{"type":49,"value":147}," — e.g. 
",{"type":43,"tag":149,"props":150,"children":151},"em",{},[152],{"type":49,"value":153},"\"residual plot for regression of salary on years of experience and education; show residuals vs fitted, Q-Q plot, and scale-location plot\"",{"type":43,"tag":128,"props":155,"children":156},{},[157,162,164,171,173,179,181,187],{"type":43,"tag":58,"props":158,"children":159},{},[160],{"type":49,"value":161},"Get the diagnostic plots",{"type":49,"value":163}," — the AI writes Python code using ",{"type":43,"tag":165,"props":166,"children":168},"a",{"href":167},"https://www.statsmodels.org/",[169],{"type":49,"value":170},"statsmodels",{"type":49,"value":172}," or ",{"type":43,"tag":165,"props":174,"children":176},{"href":175},"https://scikit-learn.org/",[177],{"type":49,"value":178},"scikit-learn",{"type":49,"value":180}," to fit the regression and ",{"type":43,"tag":165,"props":182,"children":184},{"href":183},"https://plotly.com/python/",[185],{"type":49,"value":186},"Plotly",{"type":49,"value":188}," to generate the diagnostic charts",{"type":43,"tag":44,"props":190,"children":192},{"id":191},"interpreting-the-results",[193],{"type":49,"value":194},"Interpreting the Results",{"type":43,"tag":196,"props":197,"children":198},"table",{},[199,223],{"type":43,"tag":200,"props":201,"children":202},"thead",{},[203],{"type":43,"tag":204,"props":205,"children":206},"tr",{},[207,213,218],{"type":43,"tag":208,"props":209,"children":210},"th",{},[211],{"type":49,"value":212},"Plot",{"type":43,"tag":208,"props":214,"children":215},{},[216],{"type":49,"value":217},"What to look for",{"type":43,"tag":208,"props":219,"children":220},{},[221],{"type":49,"value":222},"What a problem looks 
like",{"type":43,"tag":224,"props":225,"children":226},"tbody",{},[227,249,270,291,312,333],{"type":43,"tag":204,"props":228,"children":229},{},[230,239,244],{"type":43,"tag":231,"props":232,"children":233},"td",{},[234],{"type":43,"tag":58,"props":235,"children":236},{},[237],{"type":49,"value":238},"Residuals vs Fitted",{"type":43,"tag":231,"props":240,"children":241},{},[242],{"type":49,"value":243},"Random scatter around y=0",{"type":43,"tag":231,"props":245,"children":246},{},[247],{"type":49,"value":248},"Curve, U-shape, or systematic trend",{"type":43,"tag":204,"props":250,"children":251},{},[252,260,265],{"type":43,"tag":231,"props":253,"children":254},{},[255],{"type":43,"tag":58,"props":256,"children":257},{},[258],{"type":49,"value":259},"Normal Q-Q Plot",{"type":43,"tag":231,"props":261,"children":262},{},[263],{"type":49,"value":264},"Points follow the diagonal line",{"type":43,"tag":231,"props":266,"children":267},{},[268],{"type":49,"value":269},"Points curve away at ends (heavy tails)",{"type":43,"tag":204,"props":271,"children":272},{},[273,281,286],{"type":43,"tag":231,"props":274,"children":275},{},[276],{"type":43,"tag":58,"props":277,"children":278},{},[279],{"type":49,"value":280},"Scale-Location Plot",{"type":43,"tag":231,"props":282,"children":283},{},[284],{"type":49,"value":285},"Horizontal band of equal width",{"type":43,"tag":231,"props":287,"children":288},{},[289],{"type":49,"value":290},"Funnel shape (heteroscedasticity)",{"type":43,"tag":204,"props":292,"children":293},{},[294,302,307],{"type":43,"tag":231,"props":295,"children":296},{},[297],{"type":43,"tag":58,"props":298,"children":299},{},[300],{"type":49,"value":301},"Residuals vs Leverage",{"type":43,"tag":231,"props":303,"children":304},{},[305],{"type":49,"value":306},"No points in top-right corner",{"type":43,"tag":231,"props":308,"children":309},{},[310],{"type":49,"value":311},"High-leverage + large residual = influential 
outlier",{"type":43,"tag":204,"props":313,"children":314},{},[315,323,328],{"type":43,"tag":231,"props":316,"children":317},{},[318],{"type":43,"tag":58,"props":319,"children":320},{},[321],{"type":49,"value":322},"Histogram of Residuals",{"type":43,"tag":231,"props":324,"children":325},{},[326],{"type":49,"value":327},"Roughly bell-shaped",{"type":43,"tag":231,"props":329,"children":330},{},[331],{"type":49,"value":332},"Skewed or bimodal (non-normal errors)",{"type":43,"tag":204,"props":334,"children":335},{},[336,344,349],{"type":43,"tag":231,"props":337,"children":338},{},[339],{"type":43,"tag":58,"props":340,"children":341},{},[342],{"type":49,"value":343},"LOWESS smoother",{"type":43,"tag":231,"props":345,"children":346},{},[347],{"type":49,"value":348},"Flat line near zero",{"type":43,"tag":231,"props":350,"children":351},{},[352],{"type":49,"value":353},"Curved line (non-linearity)",{"type":43,"tag":44,"props":355,"children":357},{"id":356},"example-prompts",[358],{"type":49,"value":359},"Example Prompts",{"type":43,"tag":196,"props":361,"children":362},{},[363,379],{"type":43,"tag":200,"props":364,"children":365},{},[366],{"type":43,"tag":204,"props":367,"children":368},{},[369,374],{"type":43,"tag":208,"props":370,"children":371},{},[372],{"type":49,"value":373},"Scenario",{"type":43,"tag":208,"props":375,"children":376},{},[377],{"type":49,"value":378},"What to type",{"type":43,"tag":224,"props":380,"children":381},{},[382,400,417,434,451],{"type":43,"tag":204,"props":383,"children":384},{},[385,390],{"type":43,"tag":231,"props":386,"children":387},{},[388],{"type":49,"value":389},"Basic regression check",{"type":43,"tag":231,"props":391,"children":392},{},[393],{"type":43,"tag":394,"props":395,"children":397},"code",{"className":396},[],[398],{"type":49,"value":399},"residual plot for regression of house price on square footage, show all 4 diagnostic 
plots",{"type":43,"tag":204,"props":401,"children":402},{},[403,408],{"type":43,"tag":231,"props":404,"children":405},{},[406],{"type":49,"value":407},"Multiple regression",{"type":43,"tag":231,"props":409,"children":410},{},[411],{"type":43,"tag":394,"props":412,"children":414},{"className":413},[],[415],{"type":49,"value":416},"residuals vs each predictor separately for regression of salary on age, education, and experience",{"type":43,"tag":204,"props":418,"children":419},{},[420,425],{"type":43,"tag":231,"props":421,"children":422},{},[423],{"type":49,"value":424},"Time series check",{"type":43,"tag":231,"props":426,"children":427},{},[428],{"type":43,"tag":394,"props":429,"children":431},{"className":430},[],[432],{"type":49,"value":433},"residuals vs time order to check for autocorrelation in quarterly revenue regression",{"type":43,"tag":204,"props":435,"children":436},{},[437,442],{"type":43,"tag":231,"props":438,"children":439},{},[440],{"type":49,"value":441},"Influential points",{"type":43,"tag":231,"props":443,"children":444},{},[445],{"type":43,"tag":394,"props":446,"children":448},{"className":447},[],[449],{"type":49,"value":450},"residuals vs leverage with Cook's distance contours, flag observations with Cook's D > 0.5",{"type":43,"tag":204,"props":452,"children":453},{},[454,459],{"type":43,"tag":231,"props":455,"children":456},{},[457],{"type":49,"value":458},"Log transform check",{"type":43,"tag":231,"props":460,"children":461},{},[462],{"type":43,"tag":394,"props":463,"children":465},{"className":464},[],[466],{"type":49,"value":467},"residual plot before and after log-transforming GDP, compare which fits better",{"type":43,"tag":44,"props":469,"children":471},{"id":470},"assumptions-to-check",[472],{"type":49,"value":473},"Assumptions to 
Check",{"type":43,"tag":475,"props":476,"children":477},"ul",{},[478,488,498,508,518],{"type":43,"tag":128,"props":479,"children":480},{},[481,486],{"type":43,"tag":58,"props":482,"children":483},{},[484],{"type":49,"value":485},"Linearity",{"type":49,"value":487}," — the relationship between predictors and outcome is linear; check with residuals vs. fitted plot (should be a flat band around zero)",{"type":43,"tag":128,"props":489,"children":490},{},[491,496],{"type":43,"tag":58,"props":492,"children":493},{},[494],{"type":49,"value":495},"Independence",{"type":49,"value":497}," — residuals are not correlated with each other; especially important for time series or clustered data",{"type":43,"tag":128,"props":499,"children":500},{},[501,506],{"type":43,"tag":58,"props":502,"children":503},{},[504],{"type":49,"value":505},"Homoscedasticity",{"type":49,"value":507}," — residual variance is constant across all fitted values; check the scale-location plot for a horizontal band",{"type":43,"tag":128,"props":509,"children":510},{},[511,516],{"type":43,"tag":58,"props":512,"children":513},{},[514],{"type":49,"value":515},"Normality of residuals",{"type":49,"value":517}," — residuals are approximately normally distributed; check the Q-Q plot and residual histogram",{"type":43,"tag":128,"props":519,"children":520},{},[521,526],{"type":43,"tag":58,"props":522,"children":523},{},[524],{"type":49,"value":525},"No influential outliers",{"type":49,"value":527}," — no single observation dominates the fit; check residuals vs. 
leverage with Cook's distance",{"type":43,"tag":44,"props":529,"children":531},{"id":530},"related-tools",[532],{"type":49,"value":533},"Related Tools",{"type":43,"tag":52,"props":535,"children":536},{},[537,539,545,547,553,555,561],{"type":49,"value":538},"Use the ",{"type":43,"tag":165,"props":540,"children":542},{"href":541},"/tools/linear-regression",[543],{"type":49,"value":544},"Linear Regression tool",{"type":49,"value":546}," to fit and interpret the regression model itself before examining its residuals. Use the ",{"type":43,"tag":165,"props":548,"children":550},{"href":549},"/tools/exploratory-data-analysis-ai",[551],{"type":49,"value":552},"Exploratory Data Analysis tool",{"type":49,"value":554}," to check for outliers and non-normal distributions in the raw data before fitting. Use the ",{"type":43,"tag":165,"props":556,"children":558},{"href":557},"/tools/ai-scatter-chart-generator",[559],{"type":49,"value":560},"AI Scatter Chart Generator",{"type":49,"value":562}," to visualize the raw X–Y relationship and identify obvious non-linearities that a residual plot would confirm.",{"type":43,"tag":44,"props":564,"children":566},{"id":565},"frequently-asked-questions",[567],{"type":49,"value":568},"Frequently Asked Questions",{"type":43,"tag":52,"props":570,"children":571},{},[572,577,579,584],{"type":43,"tag":58,"props":573,"children":574},{},[575],{"type":49,"value":576},"My residuals vs. fitted plot shows a curved pattern — what does that mean?",{"type":49,"value":578},"\nA curve (often a U-shape) means the true relationship is non-linear and a linear model is not appropriate. Common fixes: apply a log, square root, or polynomial transformation to one of the variables, or use a non-linear regression model. 
Ask the AI to ",{"type":43,"tag":149,"props":580,"children":581},{},[582],{"type":49,"value":583},"\"try a log transform of the predictor and replot the residuals\"",{"type":49,"value":585}," to see if the pattern disappears.",{"type":43,"tag":52,"props":587,"children":588},{},[589,594],{"type":43,"tag":58,"props":590,"children":591},{},[592],{"type":49,"value":593},"My Q-Q plot has points curving away at both ends. Is that a problem?",{"type":49,"value":595},"\nCurving away at both ends (an S-shape) indicates heavy tails: the residuals have more extreme values than a normal distribution would predict. Severe heavy tails can distort hypothesis tests and confidence intervals, especially in small samples. If the middle of the Q-Q plot is straight and only a few points stray at the ends, the inference is likely robust. If you have a small sample (\u003C 50), some deviation is expected, so focus on severe S-curves or systematic skew.",{"type":43,"tag":52,"props":597,"children":598},{},[599,604,606,611,613,618],{"type":43,"tag":58,"props":600,"children":601},{},[602],{"type":49,"value":603},"What is Cook's distance and when does it matter?",{"type":49,"value":605},"\nCook's distance measures how much the fitted values would change if you removed a single observation. An observation with high leverage (unusual predictor values) AND a large residual has high Cook's distance: it is ",{"type":43,"tag":58,"props":607,"children":608},{},[609],{"type":49,"value":610},"influential",{"type":49,"value":612}," and may be driving the entire regression. A rule of thumb: Cook's D > 4/n (where n is sample size) or > 0.5 is worth investigating. 
Ask the AI to ",{"type":43,"tag":149,"props":614,"children":615},{},[616],{"type":49,"value":617},"\"show residuals vs leverage with Cook's distance contours\"",{"type":49,"value":619},".",{"type":43,"tag":52,"props":621,"children":622},{},[623,628,630,635],{"type":43,"tag":58,"props":624,"children":625},{},[626],{"type":49,"value":627},"Can I get residual plots for multiple regression?",{"type":49,"value":629},"\nYes. The AI will fit all predictors simultaneously and produce the same diagnostic plots (residuals vs fitted, Q-Q, scale-location, leverage). You can also ask for ",{"type":43,"tag":149,"props":631,"children":632},{},[633],{"type":49,"value":634},"\"partial residual plots\"",{"type":49,"value":636}," (also called component-plus-residual plots) that show the relationship between each individual predictor and the outcome after controlling for the others.",{"type":43,"tag":52,"props":638,"children":639},{},[640,645,647,652,654,659,661,666],{"type":43,"tag":58,"props":641,"children":642},{},[643],{"type":49,"value":644},"How do I fix heteroscedasticity if I find it?",{"type":49,"value":646},"\nOptions include: (1) transform the outcome variable (log or square root often stabilizes variance), (2) use ",{"type":43,"tag":58,"props":648,"children":649},{},[650],{"type":49,"value":651},"weighted least squares",{"type":49,"value":653},", where observations with higher variance get lower weight, or (3) use ",{"type":43,"tag":58,"props":655,"children":656},{},[657],{"type":49,"value":658},"robust standard errors",{"type":49,"value":660}," (such as HC3), which correct the standard errors without changing the coefficient estimates. 
Tell the AI which fix you want to try: ",{"type":43,"tag":149,"props":662,"children":663},{},[664],{"type":49,"value":665},"\"refit using log(Y) and recheck the residuals\"",{"type":49,"value":619},{"title":7,"searchDepth":668,"depth":668,"links":669},2,[670,671,672,673,674,675,676],{"id":46,"depth":668,"text":50},{"id":119,"depth":668,"text":122},{"id":191,"depth":668,"text":194},{"id":356,"depth":668,"text":359},{"id":470,"depth":668,"text":473},{"id":530,"depth":668,"text":533},{"id":565,"depth":668,"text":568},"markdown","content:tools:025.residual-plot.md","content","tools/025.residual-plot.md","tools/025.residual-plot","md",{"loc":4},1775502471196]