Multiple regression extends simple linear regression to model the effect of two or more predictor variables on a continuous outcome simultaneously. It answers: controlling for all other predictors, how much does each one individually affect the outcome?
Use multiple regression when:
For a single predictor, use Linear Regression.
The AI generates Python code using statsmodels or scikit-learn.
| Output | What it means |
|---|---|
| Coefficient | Change in outcome for a one-unit increase in that predictor, holding all others constant |
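| p-value | Probability of seeing an effect at least this large if that predictor's true coefficient were zero; small values (conventionally below 0.05) indicate a statistically significant effect |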
| R² (R-squared) | Proportion of outcome variance explained by all predictors combined |
| Adjusted R² | R² penalized for the number of predictors — use this to compare models with different numbers of variables |
| VIF | Variance Inflation Factor — flags multicollinearity between predictors (VIF > 5–10 is a concern) |
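
To make the first four rows of the table concrete, here is a minimal sketch using statsmodels. The dataset is synthetic and the column names (`size_sqft`, `bedrooms`, `age`, `price`) are assumptions chosen for illustration; the code the AI generates for your data may look different.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic housing data, generated only for illustration (not real prices).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "size_sqft": rng.normal(1500, 300, n),
    "bedrooms": rng.integers(1, 6, n),
    "age": rng.uniform(0, 50, n),
})
df["price"] = (
    50_000
    + 120 * df["size_sqft"]
    + 8_000 * df["bedrooms"]
    - 500 * df["age"]
    + rng.normal(0, 20_000, n)  # random noise
)

# Fit an ordinary least squares model with three predictors.
model = smf.ols("price ~ size_sqft + bedrooms + age", data=df).fit()

print(model.params)        # coefficients (Intercept plus one per predictor)
print(model.pvalues)       # p-value for each coefficient
print(model.rsquared)      # R-squared
print(model.rsquared_adj)  # adjusted R-squared
```

Calling `model.summary()` prints the same quantities in a single report.
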
| Scenario | What to type |
|---|---|
| Real estate pricing | multiple regression: predict price using size, bedrooms, bathrooms, and age |
| Employee performance | regression of performance_score on experience, training_hours, and manager_rating |
| Marketing attribution | predict conversions using email_opens, ad_clicks, and page_views |
| Health outcomes | multiple regression of blood_pressure on age, bmi, and exercise_frequency |
How many predictors can I include? As a rule of thumb, you need at least 10–20 observations per predictor to avoid overfitting. With 200 rows, that allows roughly 10 to 20 predictors at most.
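
The arithmetic behind that rule of thumb can be sketched as follows. The helper name `max_predictors` is hypothetical and not part of any library; it only divides the row count by the chosen observations-per-predictor ratio.

```python
def max_predictors(n_rows: int, obs_per_predictor: int = 10) -> int:
    """Rough upper bound on predictors under the observations-per-predictor rule."""
    return n_rows // obs_per_predictor

print(max_predictors(200, 10))  # 20 (lenient end of the rule)
print(max_predictors(200, 20))  # 10 (conservative end of the rule)
```
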
Two of my predictors are highly correlated. Is that a problem? Yes. This is called multicollinearity: it inflates the standard errors of the affected coefficients, which makes their estimates unstable and hard to interpret. The AI will flag high VIF values automatically. You may need to drop one of the correlated variables or combine them.
How do I compare two different models? Ask the AI to fit both models and compare their adjusted R² and AIC values. A higher adjusted R² and a lower AIC both indicate a better balance between fit and model complexity.
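
A comparison of this kind could look like the sketch below, which fits a smaller and a larger model with statsmodels and tabulates both metrics. The data is synthetic and the variable names are assumptions borrowed from the employee performance scenario.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data in which both predictors genuinely affect the outcome.
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "experience": rng.uniform(0, 20, n),
    "training_hours": rng.uniform(0, 40, n),
})
df["performance_score"] = (
    50
    + 1.5 * df["experience"]
    + 0.4 * df["training_hours"]
    + rng.normal(0, 5, n)  # random noise
)

# Fit two candidate models that differ by one predictor.
small = smf.ols("performance_score ~ experience", data=df).fit()
full = smf.ols("performance_score ~ experience + training_hours", data=df).fit()

# Side-by-side comparison: prefer higher adjusted R-squared and lower AIC.
comparison = pd.DataFrame(
    {
        "adj_r2": [small.rsquared_adj, full.rsquared_adj],
        "aic": [small.aic, full.aic],
    },
    index=["experience only", "experience + training_hours"],
)
print(comparison)
```

AIC values are only comparable between models fitted to the same outcome on the same rows.
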
What if my outcome is binary? Use Logistic Regression instead.