What does it mean when you add a trendline to your graph in Excel?
Check your graph with the sample graph below and make sure that your graph includes all the information that is displayed on the sample graph. The slope of the line has units of ^{mV}⁄_{g/mL}
In our sample graph (click image to enlarge) “Instrument Response Vs. Concentration”,
we selected the Trend/Regression Type as Linear, set the Intercept to be (0,0) and added a Trendline. Here we are assuming that the data are linear, and the mathematical model that the data follows is:
instrument response = m (concentration ) + b
where m is the slope, and b is the intercept.
If we assume that errors affecting the concentrations (the values of the x-axis) are insignificant, then any difference between an experimental data point and the model would be due to error in measuring the instrument’s response.
For each data point, the difference between the voltage measurement, V_{measured}, and the predicted voltage, V_{predicted}, is a residual error, RE.
RE = (V_{measured} – V_{predicted})
Because these residual errors can be positive or negative, each data point’s RE is first squared and then summed to give a total residual error, RE_{tot}.
RE_{tot} = Σ(V_{measured} – V_{predicted})^{2}
If different values for the slope and intercept for the line are picked, it would lead to different total residual errors. Therefore, the best-fit line is the line that has the best values for the slope and intercept, such that it would lead to the smallest total residual error.
Go to this link to see if you can find the best-fit line of a given set of data point using least squares visually.
- Put your mouse over the red dot that is labeled y-intercept and drag the line up and down the y-axis.
- Put your mouse over the red dot that is labeled slope and change the slope of the line.
Use a combination of these motions to obtain a line that will give the smallest total area of all the squares. Below is my best-fit line. My best effort is 811. What’s yours?
(Screen capture from http://www.dynamicgeometry.com)
Well done, you have just completed a linear regression analysis on a set of points. Now you understand the algorithm that a computer program uses to find a best-fit line for a set of data that follows a linear mathematical model. The simplest way to get the best-fit line through a set of point in Excel is to add the Trendline. The calculation that goes on to get the best-fit line is done for you. The calculations involved for a set of data with n data points, is as follows:
SLOPE | |
INTERCEPT |
The correlation coefficient, r, gives us a measure of the reliability of the linear relationship between the x and y values. A value of r = 1 indicates an exact linear relationship between x and y. Values of r close to 1 indicate excellent linear reliability. If the correlation coefficient is much less than 1, the predictions based on the linear relationship will not be reliable.
correlation coefficient, r |
Recall our sample graph as plotted by Excel (click on the image to enlarge the image). The trendline equation was determined with the y-intercept set to (0.0). The R^{2} value that was displayed is the square of the correlation coefficient, r.
We can ask Excel to do these calculations manually. Take a look at the slope of the best-fit line done manually. The slope and y-intercept are very similar. In the manual calculation, the y-intercept is not restricted to (0,0). Click on the image to enlarge the image.
That’s it! Now you have some understanding about what Excel does when you add a Trendline in Excel.