R-squared (Coefficient of Determination)

R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variables in a regression model.
PERFORMANCE METRIC DESIGN

What is R-squared (Coefficient of Determination)?

R-squared is a core statistical measure for evaluating regression models, quantifying how well independent variables explain the variance in the dependent variable.

R-squared (R²), or the coefficient of determination, is a statistical measure that quantifies the proportion of the variance in a dependent variable that is predictable from one or more independent variables in a regression model. For an ordinary least squares model with an intercept, evaluated on its training data, the score lies between 0 and 1: 0 indicates the model explains none of the target's variability and 1 indicates it explains all of it. On held-out data, or for models not fit by least squares, the score can be negative. This metric is foundational for model benchmarking suites and assessing the explanatory power of linear models, serving as a key indicator in Evaluation-Driven Development.

While a higher R-squared generally indicates a better fit, it has critical limitations: it does not indicate whether the regression model is biased, and it can be artificially inflated by adding irrelevant predictors. For this reason, it is often analyzed alongside other regression metrics like Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). For multiple regression, adjusted R-squared is often preferred because it penalizes model complexity, which makes it a more reliable guide for feature selection and helps flag overfitting during experiment tracking.

R-SQUARED (COEFFICIENT OF DETERMINATION)

Key Interpretations and Characteristics

R-squared quantifies the proportion of variance in a dependent variable explained by a regression model's independent variables. Its interpretation is nuanced, depending on model type, data structure, and the presence of bias.

01

Definition and Core Calculation

R-squared is defined as the proportion of the variance in the dependent variable (y) that is predictable from the independent variable(s) (X). It is calculated as:

R² = 1 - (SS_res / SS_tot)

  • SS_res (Sum of Squares Residual): The sum of squared differences between observed values and model-predicted values.
  • SS_tot (Total Sum of Squares): The sum of squared differences between observed values and the mean of the dependent variable.

A value of 1 indicates perfect prediction, while 0 indicates the model explains none of the variance around the mean.
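The calculation can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `r_squared` and the sample values are assumptions chosen for the example.

```python
def r_squared(y_true, y_pred):
    """Return R² = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mean_y = sum(y_true) / len(y_true)
    # SS_res: squared differences between observed and predicted values
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # SS_tot: squared differences between observed values and their mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when the target is constant")
    return 1 - ss_res / ss_tot


# Illustrative values: SS_res = 1.0 and SS_tot = 20.0, so R² is about 0.95
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
score = r_squared(y_true, y_pred)
print(score)
```

The guard for a constant target is a design choice in this sketch: with SS_tot equal to zero the ratio is undefined, so raising an error seemed clearer than returning an arbitrary value.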

02

Interpretation in Linear Regression

In ordinary least squares (OLS) linear regression with an intercept, R-squared has a clear interpretation and is bounded between 0 and 1 on the training data:

  • Explained Variance: Directly represents the fraction of total variance 'explained' by the linear model.
  • Goodness-of-Fit: A higher R-squared indicates a better fit of the model to the data.
  • Caveat: It does not indicate whether:
    • The independent variables are causally related to the dependent variable.
    • The model is correctly specified (e.g., omitting a key variable).
    • The coefficient estimates are unbiased.

It is a descriptive, not a causal, measure of fit.

03

Limitations and Common Misconceptions

R-squared is frequently misinterpreted. Key limitations include:

  • Non-Comparative Across Datasets: Because SS_tot depends on the spread of the target, R-squared values from datasets with different target variance are not directly comparable.
  • Sensitivity to Outliers: A single outlier can artificially inflate or deflate R-squared.
  • No Indication of Bias: A model can have a high R-squared but produce systematically biased predictions (poor calibration).
  • Never Decreases with Added Predictors: Adding any variable, even random noise, will never decrease training R-squared in OLS, so chasing a higher value rewards overfitting. This is addressed by the Adjusted R-squared.
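The last limitation can be checked numerically. The sketch below fits OLS with and without an extra pure-noise predictor and compares training R². Everything here is an assumption made for illustration: the helper names `r_squared` and `ols_fitted_values`, the synthetic data, and the choice to solve the normal equations by Gauss-Jordan elimination so that no third-party library is needed.

```python
import random


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def ols_fitted_values(features, y):
    """Fit OLS with an intercept via the normal equations; return fitted values."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    p = len(rows[0])
    # Augmented system [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 40
x = [i / 4 for i in range(n)]
y = [2.0 * xi + 1.0 + rng.gauss(0, 2) for xi in x]
noise = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction

r2_base = r_squared(y, ols_fitted_values([[xi] for xi in x], y))
r2_with_noise = r_squared(
    y, ols_fitted_values([[xi, zi] for xi, zi in zip(x, noise)], y)
)
print(r2_base, r2_with_noise)  # the second value is never lower than the first
```

The gain from the noise column is usually tiny, but it is never negative, which is why raw training R-squared cannot be used on its own to decide whether a feature belongs in the model.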
04

Adjusted R-squared

Adjusted R-squared penalizes the addition of non-informative predictors to counteract overfitting. It is calculated as:

Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - k - 1)]

  • n: Number of observations.
  • k: Number of independent variables.

Unlike standard R-squared, Adjusted R-squared can decrease when a new predictor adds less explanatory power than expected by chance, providing a more reliable metric for model comparison, especially with multiple predictors.
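As a hedged sketch of the formula above, the function below applies the adjustment to a given R², sample size, and predictor count. The name `adjusted_r_squared` and the example numbers are illustrative assumptions.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1).

    r2: ordinary R-squared, n: number of observations,
    k: number of independent variables (excluding the intercept).
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same raw R² and sample size, different numbers of predictors
print(adjusted_r_squared(0.80, n=50, k=5))   # about 0.777
print(adjusted_r_squared(0.80, n=50, k=10))  # about 0.749
```

With the raw R² held fixed, the model that needs ten predictors scores lower than the one that needs five, which is the penalty at work.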

05

R-squared in Non-Linear and Machine Learning Models

For models that are not fit by OLS with an intercept (e.g., decision trees, neural networks), and for any model scored on held-out data, the interpretation of R-squared changes:

  • It remains a measure of explained variance but loses its direct connection to OLS properties.
  • It can be negative for models that fit worse than a simple horizontal line (the mean). This occurs when SS_res > SS_tot.
  • In machine learning, it is often called the coefficient of determination or R² score (sklearn.metrics.r2_score). It is a useful metric for regression tasks but should be evaluated alongside Mean Squared Error (MSE) or Mean Absolute Error (MAE) to understand error magnitude.
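A small worked case shows how a negative value arises. The sketch uses a hypothetical `r_squared` helper written from the formula R² = 1 - (SS_res / SS_tot). It is meant to mirror the same definition that `sklearn.metrics.r2_score` reports, but it is not that function.

```python
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]            # SS_tot = 2.0
mean_baseline = [2.0, 2.0, 2.0]     # always predict the mean: SS_res = 2.0
reversed_trend = [3.0, 2.0, 1.0]    # predicts the opposite trend: SS_res = 8.0

print(r_squared(y_true, mean_baseline))   # 0.0
print(r_squared(y_true, reversed_trend))  # -3.0
```

Here SS_res exceeds SS_tot for the reversed predictions, so the score drops below zero: the model is worse than a horizontal line at the mean.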
06

Practical Guidelines for Use

When evaluating R-squared in practice:

  • Context is Critical: An R-squared of 0.7 may be excellent in social sciences (high noise) but poor in physics experiments.
  • Use with Other Metrics: Always pair with residual analysis, MSE/MAE, and prediction error plots to diagnose model flaws.
  • Focus on Out-of-Sample Performance: A high training R-squared with a low validation R-squared signals overfitting.
  • Prioritize Adjusted R-squared for multiple regression to compare models with different numbers of features.
  • Remember its Domain: It is a variance-based metric for point predictions; for models that output probabilities, consider Log Loss or Brier Score instead.
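The out-of-sample guideline can be illustrated with a deliberately overfit model. The sketch below is built on assumptions: the data is synthetic, the helper names are hypothetical, and a one-nearest-neighbor lookup stands in for any model that memorizes its training set.

```python
import random


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def nearest_neighbor_predict(train_x, train_y, query_x):
    """Predict by copying the target of the closest training point."""
    preds = []
    for q in query_x:
        idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - q))
        preds.append(train_y[idx])
    return preds


rng = random.Random(42)  # fixed seed so the run is repeatable
xs = [i * 0.5 for i in range(60)]
ys = [2.0 * x + rng.gauss(0, 5) for x in xs]  # linear signal plus noise

# Even-indexed points for training, odd-indexed points for validation
train_x, train_y = xs[0::2], ys[0::2]
val_x, val_y = xs[1::2], ys[1::2]

train_r2 = r_squared(train_y, nearest_neighbor_predict(train_x, train_y, train_x))
val_r2 = r_squared(val_y, nearest_neighbor_predict(train_x, train_y, val_x))
print(train_r2, val_r2)  # training score is exactly 1.0; validation is lower
```

The memorizing model reproduces every training target, so its training R-squared is perfect by construction. The gap to the validation score is the signal to look for.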
COMPARATIVE ANALYSIS

R-squared vs. Other Regression Metrics

A comparison of R-squared with other core regression evaluation metrics, highlighting their calculation, interpretation, and primary use cases.

| Metric | R-squared (R²) | Adjusted R-squared | Mean Squared Error (MSE) | Mean Absolute Error (MAE) |
| --- | --- | --- | --- | --- |
| Core Definition | Proportion of variance in the dependent variable explained by the model. | R² adjusted for the number of predictors, penalizing model complexity. | Average of squared differences between predicted and actual values. | Average of absolute differences between predicted and actual values. |
| Formula | 1 - (SS_res / SS_tot) | 1 - [(1 - R²)(n - 1) / (n - k - 1)] | (1/n) * Σ(y_i - ŷ_i)² | (1/n) * Σ\|y_i - ŷ_i\| |
| Value Range | At most 1. 0 to 1 (or 0% to 100%) for OLS with an intercept on training data; negative if the model fits worse than the mean. | At most R²; can be negative. | 0 to ∞ | 0 to ∞ |
| Interpretation | Higher is better. 1 = perfect fit, 0 = fit as good as the mean. | Higher is better. Comparable across models with different numbers of predictors on the same dataset. | Lower is better. Heavily penalizes large errors (squared term). | Lower is better. Linear penalty, more interpretable in original units. |
| Primary Use Case | Explanatory power & model fit assessment. | Model selection when comparing models with different numbers of features. | Optimization target (loss function) during training; emphasizes large errors. | Interpretable error reporting; more robust to outliers than MSE. |
| Unit of Measurement | Unitless (proportion). | Unitless (proportion). | Squared units of the target variable. | Same units as the target variable. |
| Penalizes Model Complexity? | No | Yes | No | No |
| Sensitive to Outliers? | Yes | Yes | Yes, strongly (squared term) | Less so (linear penalty) |
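To make the difference between the two error metrics concrete, the sketch below computes MSE and MAE from their definitions on a small made-up example, first with uniformly small errors and then with a single large miss. The function names and values are assumptions for illustration only.

```python
def mean_squared_error(y_true, y_pred):
    """(1/n) * sum of squared differences."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """(1/n) * sum of absolute differences."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 12.0, 14.0, 16.0]
small_errors = [11.0, 11.0, 15.0, 15.0]  # every prediction is off by 1
one_outlier = [11.0, 11.0, 15.0, 26.0]   # the last prediction is off by 10

print(mean_squared_error(y_true, small_errors))   # 1.0
print(mean_absolute_error(y_true, small_errors))  # 1.0
print(mean_squared_error(y_true, one_outlier))    # 25.75
print(mean_absolute_error(y_true, one_outlier))   # 3.25
```

One large miss multiplies MSE by more than 25 but MAE by just over 3, which is the squared-versus-linear penalty in practice.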

Limitations, Caveats, and Adjusted R-squared

While R-squared is a foundational regression metric, its interpretation requires careful consideration of model specification and complexity. This section details its critical limitations and introduces Adjusted R-squared as a more robust alternative.

R-squared has significant limitations that can mislead model evaluation. It always increases or stays the same when adding more predictors, even irrelevant ones, creating a false sense of improvement. This makes it unsuitable for comparing models with different numbers of features. Furthermore, a high R-squared does not imply causation, correct model specification, or the absence of bias. It is also sensitive to outliers and provides no information about prediction error magnitude on new data.

Adjusted R-squared addresses the flaw of automatic inflation by penalizing the addition of non-informative predictors. It adjusts the standard R-squared value based on the number of predictors (k) and sample size (n). Unlike R-squared, Adjusted R-squared can decrease when a new feature fails to improve the model sufficiently, giving a more honest assessment of whether added complexity is justified. It is still computed on the data used for fitting, so it complements validation on held-out data rather than replacing it. It is the preferred metric for feature selection and comparing the explanatory power of models with differing complexities within the same dataset.

R-SQUARED

Frequently Asked Questions

Essential questions and answers about the R-squared (Coefficient of Determination) metric, a core statistic for evaluating regression models.

R-squared, or the Coefficient of Determination, is a statistical measure that quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables in a regression model. It is calculated using the formula: R² = 1 - (SS_res / SS_tot), where SS_res is the sum of squares of residuals (the variation left unexplained by the model) and SS_tot is the total sum of squares (the total variation in the dependent variable around its mean). A value of 1 indicates the model explains all the variability of the response data, while a value of 0 indicates it explains none. Because the score is unitless, it provides a standardized measure of fit for comparing models on the same dataset; values from datasets with different target variance are not directly comparable.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.