Regression Sum of Squares Formula


Sum of Squares

A statistical tool that is used to identify the dispersion of data

What is Sum of Squares?

Sum of squares (SS) is a statistical tool that is used to identify the dispersion of data as well as how well the data can fit the model in regression analysis. The sum of squares got its name because it is calculated by finding the sum of the squared differences.

The sum of squares is one of the most important outputs in regression analysis. As a general rule, a smaller residual sum of squares indicates a better model, because less of the variation in the data is left unexplained.

In finance, understanding the sum of squares is important because linear regression models are widely used in both theoretical and practical finance.

Types of Sum of Squares

In regression analysis, the three main types of sum of squares are the total sum of squares, regression sum of squares, and residual sum of squares.

1. Total sum of squares

The total sum of squares measures the variation of the values of a dependent variable around the sample mean of that variable. Essentially, the total sum of squares quantifies the total variation in a sample. It can be determined using the following formula:

$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

  • y_i – the i-th value in the sample
  • ȳ – the mean value of the sample
  • n – the number of observations

2. Regression sum of squares (also known as the sum of squares due to regression or explained sum of squares)

The regression sum of squares describes how much of the variation in the dependent variable the regression model explains. For a given total sum of squares, a higher regression sum of squares indicates that the model fits the data better.

The formula for calculating the regression sum of squares is:

$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

  • ŷ_i – the i-th value estimated by the regression line
  • ȳ – the mean value of the sample

3. Residual sum of squares (also known as the sum of squared errors of prediction)

The residual sum of squares measures the variation of the modeling errors. In other words, it shows how much of the variation in the dependent variable cannot be explained by the regression model. Generally, a lower residual sum of squares indicates that the regression model explains the data better, while a higher residual sum of squares indicates that the model explains the data poorly.

The residual sum of squares can be found using the formula below:

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

  • y_i – the i-th observed value
  • ŷ_i – the i-th value estimated by the regression line

The relationship between the three types of sum of squares can be summarized by the following equation:

$$\mathrm{TSS} = \mathrm{SSR} + \mathrm{RSS}$$
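
As a rough illustration, the sketch below fits a straight line by ordinary least squares and checks that the total sum of squares equals the regression sum of squares plus the residual sum of squares. The data values and the helper names `fit_line` and `sums_of_squares` are assumptions made for this example. The identity is expected to hold for a least-squares line that includes an intercept.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def sums_of_squares(x, y):
    """Return (TSS, SSR, RSS) for a least-squares line through (x, y)."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]                        # fitted values
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
    return tss, ssr, rss


# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

tss, ssr, rss = sums_of_squares(x, y)
print(tss, ssr + rss)  # the two numbers agree up to floating-point rounding
```

Because the line is fitted by least squares, any other straight line through the same data would give a residual sum of squares at least as large.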

What Is the Sum of Squares?

The term sum of squares refers to a statistical measure used in regression analysis to determine the dispersion of data points. It can be used to find the function that best fits the data, meaning the function that deviates least from the data points. In a regression analysis, the goal is to determine how well a data series can be fitted to a function that might help to explain how the data series was generated. In the financial world, the sum of squares can be used to determine the variance in asset values.

Understanding the Sum of Squares

The sum of squares is a statistical measure of deviation from the mean. It is also known as variation. It is calculated by adding together the squared differences between each data point and the mean. In a regression, the analogous quantity is found by squaring the distance between each data point and the line of best fit and adding the results together. The line of best fit is the line that minimizes this value.

A low sum of squares indicates little variation within a data set, while a higher one indicates more variation. Variation refers to the difference of each data point from the mean. You can visualize this in a chart. If the line doesn’t pass through all the data points, then there is some unexplained variability. We go into a little more detail about this in the next section below.

Analysts and investors can use the sum of squares to make better decisions about their investments. Keep in mind, though, that using it means assuming that past performance says something about the future. For instance, this measure can help you determine the level of volatility in a stock’s price or how the share prices of two companies compare.

Let’s say an analyst wants to know whether Microsoft (MSFT) share prices move in tandem with those of Apple (AAPL). The analyst can list the daily prices for both stocks over a certain period (say one, two, or 10 years) and create a linear model or a chart. If the relationship between the two variables (i.e., the price of AAPL and MSFT) is not a straight line, then there are variations in the data set that must be scrutinized.

Variation is a statistical measure that is calculated or measured by using squared differences.

How to Calculate the Sum of Squares

You can see why the measurement is called the sum of squared deviations, or the sum of squares for short. You can use the following steps to calculate the sum of squares:

  1. Gather all the data points.
  2. Determine the mean/average.
  3. Subtract the mean/average from each individual data point.
  4. Square each difference from Step 3.
  5. Add up the figures from Step 4.
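
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `sum_of_squares` and the sample values are assumptions made for the example.

```python
def sum_of_squares(data):
    """Sum of squared deviations from the mean, following Steps 1 to 5."""
    # Step 1: the data points are passed in as `data`.
    # Step 2: determine the mean.
    mean = sum(data) / len(data)
    # Step 3: subtract the mean from each data point.
    deviations = [value - mean for value in data]
    # Step 4: square each difference.
    squared = [d ** 2 for d in deviations]
    # Step 5: add up the squared differences.
    return sum(squared)


# Hypothetical sample: the mean is 5, the deviations are -3, -1, 1, 3.
sample = [2, 4, 6, 8]
ss = sum_of_squares(sample)
print(ss)  # 20.0
```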

In statistics, the mean is the average of a set of numbers, which is calculated by adding the values in the data set together and dividing by the number of values. But knowing the mean alone does not tell you how spread out the data is. As such, it helps to know the variation in a set of measurements. How far individual values are from the mean may provide insight into how well the observations or values fit the regression model that is created.

Types of Sum of Squares

The formula we highlighted earlier is used to calculate the total sum of squares. The total sum of squares is used to arrive at other types. The following are the other types of sum of squares.

Residual Sum of Squares

As noted above, if the line in the linear model created does not pass through all the measurements of value, then some of the variability that has been observed in the share prices is unexplained. The sum of squares is used to calculate whether a linear relationship exists between two variables, and any unexplained variability is referred to as the residual sum of squares.

The RSS allows you to determine the amount of error left between a regression function and the data set after the model has been run. You can interpret a smaller RSS figure as a regression function that is well-fit to the data while the opposite is true of a larger RSS figure.

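Here is the formula for calculating the residual sum of squares, where y_i is the i-th observed value and ŷ_i is the value estimated by the regression function:

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$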

Regression Sum of Squares

The regression sum of squares measures how much of the variation in the modeled data the regression model accounts for. A regression model establishes whether there is a relationship between a dependent variable and one or more explanatory variables. A regression sum of squares that is high relative to the total sum of squares indicates a better fit with the data. A low one means the model explains little of the variation in the data.

Here is the formula for calculating the regression sum of squares, where ŷ_i is the value estimated by the regression function and ȳ is the sample mean:

$$\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

Adding the deviations from the mean without squaring them results in a number equal to or very close to zero, since the negative deviations offset the positive deviations. To get a meaningful number, each deviation must be squared before the deviations are added together. The sum of squares is never negative, because the square of any number, whether positive or negative, is never negative. It is zero only when every value equals the mean.
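
A short check makes the cancellation concrete. The values below are hypothetical and chosen so that the arithmetic is exact.

```python
# Hypothetical data with mean 6.
data = [3, 7, 8]
mean = sum(data) / len(data)

deviations = [value - mean for value in data]      # -3.0, 1.0, 2.0

raw_sum = sum(deviations)                          # positives and negatives cancel
squared_sum = sum(d ** 2 for d in deviations)      # 9 + 1 + 4

print(raw_sum)      # 0.0
print(squared_sum)  # 14.0
```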

Limitations of Using the Sum of Squares

Making an investment decision on what stock to purchase requires many more observations than the ones listed here. An analyst may have to work with years of data to know with a higher certainty how high or low the variability of an asset is. As more data points are added to the set, the sum of squares generally becomes larger, because every additional observation contributes another non-negative squared deviation. Sums of squares from samples of different sizes are therefore not directly comparable.

The most widely used measurements of variation are the standard deviation and variance. However, to calculate either of the two metrics, the sum of squares must first be calculated. The variance is the average of the squared deviations (i.e., the sum of squares divided by the number of observations n for a population, or by n − 1 for a sample). The standard deviation is the square root of the variance.
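
The link between the sum of squares, the variance, and the standard deviation can be sketched as follows. The data values are hypothetical; the cross-checks use Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical measurements with mean 5.
data = [4.0, 8.0, 6.0, 2.0]
n = len(data)
mean = sum(data) / n

ss = sum((value - mean) ** 2 for value in data)   # sum of squares: 1 + 9 + 1 + 9

population_variance = ss / n          # divide by n when the data is the whole population
sample_variance = ss / (n - 1)        # divide by n - 1 when the data is a sample
population_std = math.sqrt(population_variance)

# Cross-check against the standard library.
print(math.isclose(population_variance, statistics.pvariance(data)))
print(math.isclose(sample_variance, statistics.variance(data)))
print(math.isclose(population_std, statistics.pstdev(data)))
```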

There are two methods of regression analysis that use the sum of squares: the linear least squares method and the non-linear least squares method. The least squares method refers to the fact that the regression function minimizes the sum of the squares of the deviations of the actual data points from the function. In this way, it is possible to draw a function that statistically provides the best fit for the data. Note that a regression function can either be linear (a straight line) or non-linear (a curving line).

Example of Sum of Squares

Let’s use Microsoft as an example to show how you can arrive at the sum of squares.

Using the steps listed above, we gather the data. So if we’re looking at the company’s performance over a five-day period, we’ll need the closing prices for that time frame:

  • $74.01
  • $74.77
  • $73.94
  • $73.61
  • $73.40

Now let’s figure out the average price. The sum of the prices is $369.73 and the mean or average price is $369.73 ÷ 5 = $73.95 (rounded to the nearest cent).

Then, to figure out the sum of squares, we find the difference of each price from the average, square the differences, and add them together:

  • SS = ($74.01 – $73.95)² + ($74.77 – $73.95)² + ($73.94 – $73.95)² + ($73.61 – $73.95)² + ($73.40 – $73.95)²
  • SS = (0.06)² + (0.82)² + (-0.01)² + (-0.34)² + (-0.55)²
  • SS = 1.0942
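
The arithmetic above can be reproduced with a short script. This is a sketch using the five closing prices listed in the example; the variable names are illustrative. It also shows that the figure of 1.0942 depends on rounding the mean to the nearest cent before subtracting.

```python
prices = [74.01, 74.77, 73.94, 73.61, 73.40]

mean_exact = sum(prices) / len(prices)   # about 73.946
mean_rounded = round(mean_exact, 2)      # 73.95, as used in the worked example

# Sum of squares with the rounded mean, matching the worked example.
ss_rounded = sum((p - mean_rounded) ** 2 for p in prices)

# Sum of squares with the unrounded mean, which is slightly smaller.
ss_exact = sum((p - mean_exact) ** 2 for p in prices)

print(round(ss_rounded, 4))  # 1.0942
print(round(ss_exact, 4))    # 1.0941
```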

In the example above, 1.0942 shows that the variability in the stock price of MSFT over five days is very low and investors looking to invest in stocks characterized by price stability and low volatility may opt for MSFT.

How Do You Define the Sum of Squares?

The sum of squares is a measure used in regression analysis to determine how far data points vary from the mean. If there is a low sum of squares, it means there’s low variation. A higher sum of squares indicates higher variation. This can be used to help make more informed decisions by determining investment volatility or to compare groups of investments with one another.

How Do You Calculate the Sum of Squares?

In order to calculate the sum of squares, gather all your data points. Then determine the mean or average by adding them all together and dividing that figure by the total number of data points. Next, figure out the differences between each data point and the mean. Then square those differences and add them together to give you the sum of squares.

How Does the Sum of Squares Help in Finance?

Investors and analysts can use the sum of squares to make comparisons between different investments or make decisions about how to invest. For instance, you can use the sum of squares to determine stock volatility. A low sum generally indicates low volatility, while a higher sum of squares indicates higher volatility.

The Bottom Line

As an investor, you want to make informed decisions about where to put your money. While you can certainly do so using your gut instinct, there are tools at your disposal that can help you. The sum of squares takes historical data to give you an indication of historical volatility. Use it to see whether a stock is a good fit for you or to choose between two different assets if you’re on the fence. Keep in mind, though, that the sum of squares uses past performance as an indicator and doesn’t guarantee future performance.

What is Residual Sum of Squares?

Residual Sum of Squares (RSS) is a statistical method that helps identify the level of discrepancy in a dataset not predicted by a regression model. Thus, it measures the variance in the value of the observed data when compared to its predicted value as per the regression model. Hence, RSS indicates whether the regression model fits the actual dataset well or not. 

Also referred to as the Sum of Squared Errors (SSE), RSS is obtained by adding the squares of the residuals. Residuals are the deviations of the actual data values from the values predicted by the model and represent errors in the regression model’s estimation. A lower RSS indicates that the regression model fits the data well and leaves little variation unexplained. In finance, investors use RSS to track the changes in the prices of a stock to predict its future price movements.

Key Takeaways

  • Residual Sum of Squares (RSS) is a statistical method used to measure the deviation in a dataset unexplained by the regression model.
  • Residual or error is the difference between the observation’s actual and predicted value.
  • If the RSS value is low, it means the data fits the estimation model well, indicating the least variance. If it is zero, the model fits perfectly with the data, having no variance at all.
  • It helps stock market players to assess the future stock price movements by monitoring the fluctuation in the stock prices.

Residual Sum of Squares Explained

RSS is one of the types of the Sum of Squares (SS) – the other two being the Total Sum of Squares (TSS) and the Sum of Squares due to Regression (SSR), also called the Explained Sum of Squares (ESS). Sum of squares is a statistical measure through which the data dispersion is assessed to determine how well the data would fit the model in regression analysis.

While the TSS measures the variation in values of an observed variable with respect to its sample mean, the SSR or ESS calculates the deviation between the estimated value and the mean value of the observed variable. If the TSS equals SSR, it means the regression model is a perfect fit for the data as it reflects all the variability in the actual data.

On the other hand, RSS measures the extent of variability of observed data not shown by a regression model. To calculate RSS, first find the model’s errors, or residuals, by subtracting the estimated values from the actual observed values. Then, square each residual and add the squares together to arrive at RSS.

The lower the error in the model, the better the regression prediction. In other words, a lower RSS signifies that the regression model explains the data better, indicating the least variance. It means the model fits the data well. Likewise, if the value comes to zero, it’s considered the best fit with no variance.

Note that the RSS is not the same as R-squared. While the former gives the absolute amount of unexplained variation, R-squared expresses the explained variation as a proportion of the total variation.
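
In symbols, and using the abbreviations TSS, SSR, and RSS from this article, the usual definition of R-squared can be written as below. The second equality is stated under the assumption that the model is fitted by least squares and includes an intercept, so that TSS = SSR + RSS.

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = \frac{\mathrm{SSR}}{\mathrm{TSS}}
```

RSS carries the squared units of the data, so it changes when the data is rescaled, whereas R-squared is a unitless ratio.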

Residual Sum of Squares in Finance

The discrepancy detected in the data set through RSS indicates whether the data is a fit or misfit to the regression model. Thus, it helps stock market players to understand the fluctuation occurring in the asset prices, letting them assess their future price movements.

Regression functions are formed to predict the movement of stock prices. But the benefit of these regression models depends on how well they explain the variance in stock prices. If large errors or residuals remain unexplained by the regression, then the model may not be useful in predicting future stock movements.

As a result, the investors and money managers get an opportunity to make the best and most well-informed decisions using RSS. In addition, RSS also lets policymakers analyze various variables affecting the economic stability of a nation and frame the economic models accordingly.


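Here is the formula to calculate the residual sum of squares, where y_i is the i-th observed value and ŷ_i is the value predicted by the regression model:

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$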


Calculation Example

Let’s consider the following residual sum of squares example, in which the regression model is ŷ = 1 + 2x and the data set is:

  • x = 0, y = 1
  • x = 1, y = 2
  • x = 2, y = 6
  • x = 3, y = 8

The RSS can be found by applying the RSS formula to each data point:

RSS = {1 – [1+(2*0)]}² + {2 – [1+(2*1)]}² + {6 – [1+(2*2)]}² + {8 – [1+(2*3)]}²

RSS = 0 + 1 + 1 + 1 = 3
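
The calculation above can be reproduced in a few lines of Python. This is a minimal sketch: the x and y values and the model ŷ = 1 + 2x are read off the terms of the calculation, and the function name `predict` is illustrative.

```python
# Data points and model taken from the terms of the worked calculation.
x = [0, 1, 2, 3]
y = [1, 2, 6, 8]


def predict(xi):
    """Regression model from the example: y-hat = 1 + 2x."""
    return 1 + 2 * xi


# Residual = observed value minus predicted value.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

# RSS = sum of the squared residuals.
rss = sum(r ** 2 for r in residuals)

print(residuals)  # [0, -1, 1, 1]
print(rss)        # 3
```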

Frequently Asked Questions (FAQs)

What is Residual Sum of Squares (RSS)?

RSS is a statistical method used to detect the level of discrepancy in a dataset not revealed by regression. If the residual sum of squares results in a lower figure, it signifies that the regression model explains the data better than when the result is higher. In fact, if its value is zero, it’s regarded as the best fit with no error at all.

What is the difference between ESS and RSS?

ESS stands for Explained Sum of Squares, which marks the variation in the data explained by the regression model. On the other hand, Residual Sum of Squares (RSS) defines the variations marked by the discrepancies in the dataset not explained by the estimation model.

How do TSS and RSS differ?

The Total Sum of Squares (TSS) defines the variations in the observed values or datasets from the mean. In contrast, the Residual Sum of Squares (RSS) assesses the errors or discrepancies in the observed data and the modeled data.

