How is the calculator picking the “best” regression line?
Is the calculator checking residuals on EVERY possible regression line to find the least squares line (the best fitting line)? After all, by definition, the least squares line is the line which has the smallest sum of the squares of the residuals.
Well, the answer is “probably not”! How would the calculator know that it had checked a sufficient number of lines and residuals to have arrived at the absolute best choice?
It is more likely that your calculator is using a formula for the best fit regression line, also called the least squares line. Don’t get too excited that you have discovered a shortcut to finding the best regression lines! Such formulas can be rather messy. Here is the formula for a linear least squares line, or best fit straight line:
(Remember that the equation of a line is y = mx + b, with m = slope and b = y-intercept.)
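For reference, the standard least squares formulas for the slope m and y-intercept b can be written as follows. The notation is an assumption for illustration: n is the number of data points and (x_i, y_i) are the individual points.

```latex
m = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
         {n\sum_{i=1}^{n} x_i^{2} - \left(\sum_{i=1}^{n} x_i\right)^{2}},
\qquad
b = \frac{\sum_{i=1}^{n} y_i - m\sum_{i=1}^{n} x_i}{n}
```

The denominator of m is zero only when every x-value is the same, in which case no non-vertical best fit line exists.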
Isn’t it grand that the calculator does all of this work and gives us, in a matter of seconds, a regression equation that is the best fitting regression equation!
Line of Best Fit (Least Squares Method)
A line of best fit is a straight line that is the best approximation of the given set of data.
It is used to study the nature of the relation between two variables. (We’re only considering the two-dimensional case, here.)
A line of best fit can be roughly determined using an eyeball method by drawing a straight line on a scatter plot so that the number of points above the line and below the line is about equal (and the line passes through as many points as possible).
A more accurate way of finding the line of best fit is the least squares method.
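To make "least squares" concrete, here is a minimal Python sketch of the quantity the method minimizes: the sum of the squared vertical distances (residuals) between each point and a candidate line. The data points and both candidate lines are made up for illustration.

```python
def sum_squared_residuals(points, slope, intercept):
    """Sum of squared vertical distances from each point to y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


# Made-up data, for illustration only.
points = [(1, 2.0), (2, 2.9), (3, 4.2), (4, 4.8)]

# Two candidate lines: one hugging the points, one clearly off.
close_line = sum_squared_residuals(points, slope=1.0, intercept=1.0)
poor_line = sum_squared_residuals(points, slope=0.5, intercept=2.0)

print(close_line, poor_line)
```

The line with the smaller sum is the better fit of the two. The least squares line is the one for which this sum is as small as it can possibly be.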
How to Find the Line of Best Fit in 3 Steps
Imagine you are at a new marketing job. You have a set of data in Excel in front of you about sales numbers, and a scatter plot of those data points in a graphing calculator on your desk. Your boss comes by and asks you to give a regression analysis of the data by noon — he needs to know the trend line of the sales. You rack your brain for how to find the line of best fit, remembering that it involves something with finding a straight line on a scatter plot. What do you do?
Least squares regression is a simple linear regression method used to find the slope and y-intercept of the line that best fits or represents a set of data points.
A linear equation represents the linear relationship between the x-values and y-values of the points on a graph or chart.
3 Steps to Find the Equation for the Line of Best Fit
Real-world data sets don’t have perfect or exact lines. Your job is to find an equation of a line that can represent or approximate the data. This is called the line of best fit or the regression line.
You could eyeball the graph, draw a line, and pick some random numbers. Or, you could use the least squares regression to methodically figure out the line of best fit. Here’s how.
Step 1: Find the Slope
To find the slope of our line of best fit, assemble your data into each column of a chart like the one below. Here’s what each column represents:
How to Find the Line of Best Fit
The least squares regression is one common way to find the equation of the line of best fit for any set of data you might come across in the real world.
- Step 1 is to calculate the average x-value and the average y-value. From there, you do some computations to find the slope of the line of best fit.
- Step 2 is to use that slope to find the y-intercept.
- Step 3 is to put it all together.
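The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the only way to organize the computation. The function name `line_of_best_fit` and the sales figures are hypothetical.

```python
def line_of_best_fit(points):
    """Return (slope, intercept) of the least squares line for (x, y) pairs."""
    n = len(points)

    # Step 1: average the x- and y-values, then compute the slope
    # from the deviations of each point about those averages.
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x-values are identical; the slope is undefined")
    slope = sxy / sxx

    # Step 2: the least squares line passes through (x_mean, y_mean),
    # which pins down the y-intercept.
    intercept = y_mean - slope * x_mean

    # Step 3: put it together as y = slope * x + intercept.
    return slope, intercept


# Hypothetical sales data: (month, units sold).
sales = [(1, 12.0), (2, 15.0), (3, 19.0), (4, 22.0)]
slope, intercept = line_of_best_fit(sales)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Writing the slope in terms of deviations from the averages mirrors Step 1 directly, and it is algebraically equivalent to the sum-based formula that calculators are often described as using.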
Whether you are in class or at a job, now you can say with confidence how to find the line of best fit for any set of data!
What is the Line of Best Fit?
The line of best fit (or trendline) is an educated guess about where a linear equation might fall in a set of data plotted on a scatter plot. Trend lines are usually plotted with software, as once you’ve got more than a few points on a piece of paper, it can be difficult to figure out where that line of best fit might be.
This handy applet from Illinois State University is free and allows you to plot a series of points (up to 10) and find the line of best fit. This first graph I made was for the points:
- (1, 2)
- (2, 3)
- (3, 4)
- (4, 5)
- (5, 6)
Not surprisingly, the line of best fit traveled through the center of the five dots.
Look what happens when one of the points is moved down:
The line of best fit drops slightly lower. That’s because the dropped point acts like gravity, pulling the best fit line downward.
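The same effect can be reproduced numerically. The sketch below fits the five points listed above, then fits them again with one point moved down. Which point was moved, and by how much, is an assumption here: the middle point is lowered from (3, 4) to (3, 2) purely for illustration.

```python
def fit_line(points):
    """Least squares slope and intercept from the sum-based formulas."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


original = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
# Assumed change for illustration: the middle point moved down by 2.
moved = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 6)]

slope_a, intercept_a = fit_line(original)
slope_b, intercept_b = fit_line(moved)

print(f"original: y = {intercept_a:.1f} + {slope_a:.1f}x")
print(f"moved:    y = {intercept_b:.1f} + {slope_b:.1f}x")
```

With this particular change the slope stays the same and the whole line shifts down, because the lowered point sits at the average x-value. Moving a point near either end would tilt the line as well.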
Just because you get a line of best fit doesn’t mean that it makes sense. Take this set of unrelated (scattered) data points. If you look at the points by themselves, there clearly isn’t any kind of trend. But the software will give you a guesstimate anyway.
You should always plot your data on a scatter plot before you get your line of best fit, and eyeball your graph to see if a linear equation makes sense for your data. It’s possible to find non-linear lines of best fit (like polynomial functions), but if you’ve got completely random data, it’s possible that the line of best fit is going to be a pretty awful guesstimate.
Equation for the Line of Best Fit
Our online linear regression calculator will give you an equation to go with your data. For example, the first graph above gives the equation y = 1 + 1x. If you graph this equation on a graphing calculator (such as this one), you’ll see that the line matches perfectly with the line in the first image above. You can find a linear regression by hand, but I wouldn’t recommend it as the process is very tedious and it’s easy for errors to slip in.
A line of best fit is usually found through Simple Linear Regression. The following software programs can perform linear regression (and most other types of regression analysis):
Types of Trendline
A linear trend line is a good choice when a set of data points appears to follow a straight line. The line is the line of best fit: a straight line that’s a good approximation of the data.
Polynomial Trend line
A polynomial trend line has a series of curves and bumps. Real-world data often follows a curved trend like this rather than a perfectly straight line.
The image above shows a polynomial with one curve (a parabola); this is called a second degree polynomial. Data points with a series of bumps and curves can be fitted to third degree and higher polynomials.
An exponential line can show exponential growth or exponential decay. It’s useful when data points grow (or fall) at extremely fast rates.
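One common way to fit an exponential trend line is to take the natural log of the y-values and fit a straight line to the result. The sketch below does that for a curve of the form y = a·e^(bx). It is an illustration under stated assumptions: all y-values must be positive, the data is synthetic, and the fit minimizes squared error in ln(y) rather than in y itself, so it may differ slightly from what a given software package reports.

```python
import math


def fit_exponential(points):
    """Fit y = a * exp(b * x) via a least squares line through (x, ln y)."""
    logged = [(x, math.log(y)) for x, y in points]  # requires every y > 0
    n = len(logged)
    x_mean = sum(x for x, _ in logged) / n
    ly_mean = sum(ly for _, ly in logged) / n
    sxy = sum((x - x_mean) * (ly - ly_mean) for x, ly in logged)
    sxx = sum((x - x_mean) ** 2 for x, _ in logged)
    b = sxy / sxx                       # slope of the log-line is the growth rate
    a = math.exp(ly_mean - b * x_mean)  # intercept of the log-line is ln(a)
    return a, b


# Synthetic data generated from y = 3 * exp(0.5 * x), for illustration only.
data = [(x, 3.0 * math.exp(0.5 * x)) for x in range(5)]
a, b = fit_exponential(data)
print(f"y = {a:.3f} * exp({b:.3f} * x)")
```

A positive b indicates exponential growth and a negative b indicates exponential decay.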
Best Fit Lines
In some instances, a linear relationship exists between the dependent and independent variables. When a linear relationship exists between the two variables plotted, a best fit line can be drawn. The best fit line should pass as close to all of the points as possible but does not necessarily have to pass through any of them. The volume of a liquid sample and its mass are linearly related. In the set of experiments shown below, the volumes of a number of liquid samples with different masses were measured, the results were plotted, and a best fit line was drawn.
The best fit line can be represented using the equation for a straight line. To determine the slope, pick two points on the best fit line that lie far apart from each other. Using the points (4.5, 9.0) and (1.5, 3.0), the slope would be (9.0 – 3.0) / (4.5 – 1.5), or 2.0 mL/gram. Reading the y-intercept off this graph (where the line crosses the y-axis) gives us a value of -0.1. Therefore, the best fit line has the equation y = 2.0x – 0.1, or:
volume = 2.0 (mass) – 0.1
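The graphical estimate above can be checked with a short Python sketch. The two points and the intercept of -0.1 come from the text; the function names and the 2.0 gram sample are hypothetical.

```python
def slope_between(p1, p2):
    """Slope of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


# Two well-separated points read off the best fit line: (mass, volume).
slope = slope_between((1.5, 3.0), (4.5, 9.0))
intercept = -0.1  # read off the graph, as described in the text


def estimated_volume(mass):
    """Volume predicted by the hand-drawn best fit line."""
    return slope * mass + intercept


print(slope)
print(estimated_volume(2.0))  # hypothetical 2.0 gram sample
```

Choosing points that lie far apart keeps small reading errors on the graph from having a large effect on the computed slope.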
The equation of the line could also be calculated using linear regression on a calculator or computer. Performing linear regression on this data gives similar results.
volume = 1.982 (mass) + 0.002