Linear Correlation Coefficient Formula

Correlation Coefficient: Simple Definition, Formula, Easy Steps

Correlation coefficients are used to measure how strong a relationship is between two variables. There are several types of correlation coefficient, but the most popular is Pearson’s. Pearson’s correlation (also called Pearson’s R) is a correlation coefficient commonly used in linear regression. If you’re starting out in statistics, you’ll probably learn about Pearson’s R first. In fact, when anyone refers to the correlation coefficient, they are usually talking about Pearson’s.

Correlation Coefficient Formula: Definition

Correlation coefficient formulas are used to find how strong a relationship is between data. The formulas return a value between -1 and 1, where:

  • 1 indicates a perfect positive relationship.
  • -1 indicates a perfect negative relationship.
  • A result of zero indicates no linear relationship at all.

Meaning

  • A correlation coefficient of 1 means that for every increase in one variable, there is an increase of a fixed proportion in the other. For example, shoe sizes go up in (almost) perfect correlation with foot length.
  • A correlation coefficient of -1 means that for every increase in one variable, there is a decrease of a fixed proportion in the other. For example, the amount of gas in a tank decreases in (almost) perfect correlation with speed.
  • Zero means that an increase in one variable is not associated with any consistent increase or decrease in the other. The two just aren’t linearly related.

The absolute value of the correlation coefficient gives us the relationship strength. The larger the number, the stronger the relationship. For example, |-.75| = .75, which has a stronger relationship than .65.

Types of correlation coefficient formulas.

There are several types of correlation coefficient formulas.

One of the most commonly used formulas is Pearson’s correlation coefficient formula. If you’re taking a basic stats class, this is the one you’ll probably use:

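Pearson correlation coefficient:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

Here n is the number of data pairs, and Σ denotes a sum over all pairs.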

Two other formulas are commonly used: the sample correlation coefficient and the population correlation coefficient.

Sample correlation coefficient

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$

Here sx and sy are the sample standard deviations, and sxy is the sample covariance.

Population correlation coefficient

$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$

The population correlation coefficient uses σx and σy as the population standard deviations, and σxy as the population covariance.
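
As a rough illustration of how the sample and population versions relate, the sketch below computes both from the same data. The function name `correlation`, its `sample` flag, and the data values are hypothetical and chosen only for illustration; they do not come from the article.

```python
import math


def correlation(xs, ys, sample=True):
    """Covariance divided by the product of the two standard deviations.

    sample=True  divides by n - 1 (sample covariance and standard deviations).
    sample=False divides by n (population covariance and standard deviations).
    """
    n = len(xs)
    divisor = n - 1 if sample else n
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / divisor
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / divisor)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / divisor)
    return cov_xy / (sd_x * sd_y)


# Hypothetical values chosen only to illustrate the two formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_sample = correlation(xs, ys, sample=True)
r_population = correlation(xs, ys, sample=False)
print(round(r_sample, 4), round(r_population, 4))
```

The divisor cancels between the numerator and the denominator, so both versions return the same number for a given data set. The distinction matters for the covariance and the standard deviations themselves, not for the ratio.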

Potential problems with Pearson correlation.

The Pearson product-moment correlation (PPMC) is not able to tell the difference between dependent variables and independent variables. For example, if you are trying to find the correlation between a high calorie diet and diabetes, you might find a high correlation of .8. However, you would get the same result with the variables switched around. In other words, you could say that diabetes causes a high calorie diet. That obviously makes no sense. Therefore, as a researcher you have to be aware of the data you are plugging in. In addition, the PPMC will not give you any information about the slope of the line; it only tells you whether there is a relationship.

Real Life Example

Pearson correlation is used in thousands of real life situations. For example, scientists in China wanted to know whether there was a relationship between genetically different weedy rice populations. The goal was to find out the evolutionary potential of the rice. Pearson’s correlation between the two groups was analyzed. It showed a positive Pearson product-moment correlation of between 0.783 and 0.895 for weedy rice populations. This figure is quite high, which suggested a fairly strong relationship.

If you’re interested in seeing more examples of PPMC, you can find several studies on the National Institute of Health’s Openi website, which shows results from studies as varied as breast cyst imaging and the role that carbohydrates play in weight loss.

Example question: Find the value of the correlation coefficient from the following table:

Subject   Age x   Glucose Level y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257
2         21      65                1365
3         25      79                1975
4         42      75                3150
5         57      87                4959
6         59      81                4779

Step 3: Take the square of the numbers in the x column, and put the result in the x² column.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849
2         21      65                1365    441
3         25      79                1975    625
4         42      75                3150    1764
5         57      87                4959    3249
6         59      81                4779    3481

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561

Step 5: Add up all of the numbers in the columns and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying “sum of” or summation.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Σ         247     486               20485   11409   40022

Step 6: Use the following correlation coefficient formula.

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

From our table:

  • Σx = 247
  • Σy = 486
  • Σxy = 20,485
  • Σx² = 11,409
  • Σy² = 40,022
  • n is the sample size, in our case n = 6

The correlation coefficient =

$$ r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right]\left[6(40{,}022) - 486^2\right]}} = \frac{2{,}868}{5{,}413.27} \approx 0.5298 $$

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which means the variables have a moderate positive correlation.
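
The hand calculation in Steps 1 to 6 can be checked with a few lines of Python. This is a minimal sketch rather than a reference implementation; the function name `pearson_r` is an illustrative choice, and the data are the six age and glucose pairs from the table.

```python
import math

# Age (x) and glucose level (y) for the six subjects in the table.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]


def pearson_r(xs, ys):
    """Pearson r from the column sums, mirroring Steps 2 to 6."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Step 2: xy column
    sum_x2 = sum(x * x for x in xs)              # Step 3: x squared column
    sum_y2 = sum(y * y for y in ys)              # Step 4: y squared column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(ages, glucose)
print(round(r, 4))  # 0.5298
```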

Correlation Formula: TI 83

If you’re taking AP Statistics, you won’t actually have to work the correlation formula by hand. You’ll use your graphing calculator. Here’s how to find r on a TI-83.

Step 1: Type your data into a list and make a scatter plot to ensure your variables are roughly correlated. In other words, look for a straight line. Not sure how to do this? See: TI 83 Scatter plot.

Step 2: Press the STAT button.

Step 3: Scroll right to the CALC menu.

Step 4: Scroll down to 4:LinReg(ax+b), then press ENTER. The output will show “r” at the very bottom of the list.

Tip: If you don’t see r, turn Diagnostic ON, then perform the steps again.

How to Compute the Pearson Correlation Coefficient in Excel

Step 1: Type your data into two columns in Excel. For example, type your “x” data into column A and your “y” data into column B.

Step 2: Select any empty cell.

Step 3: Click the function button on the ribbon.

Step 4: Type “correlation” into the ‘Search for a function’ box.

Step 5: Click “Go.” CORREL will be highlighted.

Step 6: Click “OK.”

Step 7: Type the location of your data into the “Array 1” and “Array 2” boxes. For this example, type “A2:A10” into the Array 1 box and then type “B2:B10” into the Array 2 box.

Step 8: Click “OK.” The result will appear in the cell you selected in Step 2. For this particular data set, the correlation coefficient (r) is -0.1316.

Caution: The results for this test can be misleading unless you have made a scatter plot first to ensure your data roughly fits a straight line. The correlation coefficient in Excel 2007 will always return a value, even if your data is something other than linear (e.g. the data fits an exponential model).

That’s it!

Correlation Coefficient SPSS: Overview.

Step 1: Click “Analyze,” then click “Correlate,” then click “Bivariate.” The Bivariate Correlations window will appear.

Step 2: Click one of the variables in the left-hand window of the Bivariate Correlations pop-up window. Then click the center arrow to move the variable to the “Variables:” window. Repeat this for a second variable.

Step 3: Click the “Pearson” check box if it isn’t already checked. Then click either a “one-tailed” or “two-tailed” test radio button. If you aren’t sure if your test is one-tailed or two-tailed, see: Is it a one-tailed test or two-tailed test?

Step 4: Click “OK” and read the results. Each box in the output gives you a correlation between two variables. For example, the PPMC for Number of older siblings and GPA is -.098, which means practically no correlation. You can find this information in two places in the output. Why? This cross-referencing columns and rows is very useful when you are comparing PPMCs for dozens of variables.

Tip #1: It’s always a good idea to make an SPSS scatter plot of your data set before you perform this test. That’s because SPSS will always give you some kind of answer and will assume that the data is linearly related. If you have data that might be better suited to another correlation (for example, exponentially related data) then SPSS will still run Pearson’s for you and you might get misleading results.
Tip #2: Click on the “Options” button in the Bivariate Correlations window if you want to include descriptive statistics like the mean and standard deviation.

Minitab

The Minitab correlation coefficient will return a value for r from -1 to 1.

Example question: Find the Minitab correlation coefficient based on age vs. glucose level from the following table from a pre-diabetic study of 6 participants:

Subject   Age x   Glucose Level y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 1: Type your data into a Minitab worksheet. I entered this sample data into three columns.

Step 2: Click “Stat”, then click “Basic Statistics” and then click “Correlation.”

Step 3: Click a variable name in the left window and then click the “Select” button to move the variable name to the Variable box. For this example question, click “Age,” then click “Select,” then click “Glucose Level” then click “Select” to transfer both variables to the Variable window.

Step 4: (Optional) Check the “P-Value” box if you want to display a P-Value for r.

Step 5: Click “OK”. The Minitab correlation coefficient will be displayed in the Session Window. If you don’t see the results, click “Window” and then click “Tile.” The Session window should appear.

For this dataset:

  • Value of r: 0.530
  • P-Value: 0.280

That’s it!

Tip: Give your columns meaningful names (in the first row of the column, right under C1, C2 etc.). That way, when it comes to choosing variable names in Step 3, you’ll easily see what it is you are trying to choose. This becomes especially important when you have dozens of columns of variables in a data sheet!

Meaning of the Linear Correlation Coefficient

Pearson’s Correlation Coefficient is a linear correlation coefficient that returns a value of between -1 and +1. A -1 means there is a perfect negative correlation and +1 means that there is a perfect positive correlation. A 0 means that there is no correlation (this is also called zero correlation).

This can initially be a little hard to wrap your head around (who likes to deal with negative numbers?). The Political Science Department at Quinnipiac University posted this useful list of the meaning of Pearson’s Correlation coefficients. They note that these are “crude estimates” for interpreting strengths of correlations using Pearson’s Correlation:

r value            Interpretation
+.70 or higher     Very strong positive relationship
+.40 to +.69       Strong positive relationship
+.30 to +.39       Moderate positive relationship
+.20 to +.29       Weak positive relationship
+.01 to +.19       No or negligible relationship
0                  No relationship [zero correlation]
-.01 to -.19       No or negligible relationship
-.20 to -.29       Weak negative relationship
-.30 to -.39       Moderate negative relationship
-.40 to -.69       Strong negative relationship
-.70 or lower      Very strong negative relationship
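
The table can be turned into a small lookup, sketched below. The function name `describe_r` is hypothetical, and the cut-offs are treated as continuous thresholds (so a value such as .695 falls in the "Strong" band), which is an assumption since the table leaves small gaps between rows.

```python
def describe_r(r):
    """Map a Pearson r to the crude verbal labels from the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No relationship [zero correlation]"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.70:
        return f"Very strong {direction} relationship"
    if size >= 0.40:
        return f"Strong {direction} relationship"
    if size >= 0.30:
        return f"Moderate {direction} relationship"
    if size >= 0.20:
        return f"Weak {direction} relationship"
    return "No or negligible relationship"


print(describe_r(-0.75))   # Very strong negative relationship
print(describe_r(-0.098))  # No or negligible relationship
```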

It may be helpful to see graphically what these correlations look like:

The images show that a strong negative correlation means that the graph has a downward slope from left to right: as the x-values increase, the y-values get smaller. A strong positive correlation means that the graph has an upward slope from left to right: as the x-values increase, the y-values get larger.

Cramer’s V Correlation

Cramer’s V Correlation is similar to the Pearson Correlation coefficient. While the Pearson correlation is used to test the strength of linear relationships, Cramer’s V is used to calculate correlation in tables with more than 2 x 2 columns and rows. Cramer’s V correlation varies between 0 and 1. A value close to 0 means that there is very little association between the variables. A Cramer’s V of close to 1 indicates a very strong association.

Cramer’s V         Interpretation
.25 or higher      Very strong relationship
.15 to .25         Strong relationship
.11 to .15         Moderate relationship
.06 to .10         Weak relationship
.01 to .05         No or negligible relationship

Where did the Correlation Coefficient Come From?

A correlation coefficient gives you an idea of how well data fits a line or curve. Pearson wasn’t the original inventor of the term correlation but his use of it became one of the most popular ways to measure correlation.

Francis Galton (who was also involved with the development of the interquartile range) was the first person to measure correlation, originally termed “co-relation,” which actually makes sense considering you’re studying the relationship between a couple of different variables. In Co-Relations and Their Measurement, he said

“The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son,..and so on; but the index of co-relation … is different in the different cases.”

It’s worth noting though that Galton mentioned in his paper that he had borrowed the term from biology, where “Co-relation and correlation of structure” was being used but until the time of his paper it hadn’t been properly defined.

In 1892, British statistician Francis Ysidro Edgeworth published a paper called “Correlated Averages,” Philosophical Magazine, 5th Series, 34, 190-204 where he used the term “Coefficient of Correlation.” It wasn’t until 1896 that British mathematician Karl Pearson used “Coefficient of Correlation” in two papers: Contributions to the Mathematical Theory of Evolution and Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia. It was the second paper that introduced the Pearson product-moment correlation formula for estimating correlation.

Correlation Coefficient Hypothesis Test

If you can read a table, you can test the significance of a correlation coefficient. Note that correlations should only be calculated for an entire range of data. If you restrict the range, r will be weakened.

Sample problem: test the significance of the correlation coefficient r = 0.565 using the critical values for PPMC table. Test at α = 0.01 for a sample size of 9.

Step 1: Subtract two from the sample size to get df, degrees of freedom.
9 – 2 = 7

Step 2: Look the values up in the PPMC Table. With df = 7 and α = 0.01, the table value is 0.798.

Step 3: Draw a graph, so you can more easily see the relationship.

r = 0.565 does not fall into the rejection region (above 0.798), so there isn’t enough evidence to state a strong linear relationship exists in the data.
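
The decision rule in this sample problem can be sketched in code. The critical value 0.798 is taken from the example rather than computed, and the function names are illustrative. The t statistic shown is the standard equivalent form of the test with n − 2 degrees of freedom; it is not part of the table method described here.

```python
import math


def is_significant(r, critical_r):
    """Reject the hypothesis of no linear relationship when |r| exceeds the table value."""
    return abs(r) > critical_r


def t_statistic(r, n):
    """Equivalent t statistic for testing r, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2))


r, n = 0.565, 9
critical_r = 0.798  # table value for df = 7 and alpha = 0.01, as given in the example

print(n - 2)                          # 7
print(is_significant(r, critical_r))  # False
print(round(t_statistic(r, n), 2))    # 1.81
```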

Relationship to cosine

It’s rare to use trigonometry in statistics (you’ll never need to find the derivative of tan(x) for example!), but the relationship between correlation and cosine is an exception. Correlation can be expressed in terms of angles:

  • Positive correlation = acute angle (less than 90°),
  • Negative correlation = obtuse angle (greater than 90°),
  • Uncorrelated = orthogonal (right angle, 90°).

More specifically, correlation is the cosine of an angle between two vectors defined as follows (Knill, 2011):

If X, Y are two random variables with zero mean, then the covariance Cov[XY] = E[X · Y] is the dot product of X and Y. The standard deviation of X is the length of X.
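
A short sketch makes the cosine relationship concrete: subtract the mean from each variable, treat the results as vectors, and take the cosine of the angle between them. The helper names `center` and `cosine` are illustrative; the data are the age and glucose values from the worked example earlier in the article.

```python
import math


def center(values):
    """Subtract the mean so the variable has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cosine(u, v):
    """Cosine of the angle between two vectors: dot product over product of lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = math.sqrt(sum(a * a for a in u))
    length_v = math.sqrt(sum(b * b for b in v))
    return dot / (length_u * length_v)


ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

cos_theta = cosine(center(ages), center(glucose))
angle = math.degrees(math.acos(cos_theta))

print(round(cos_theta, 4))  # 0.5298, the same value as Pearson's r for this data
print(angle < 90)           # True: a positive correlation corresponds to an acute angle
```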

What Do Correlation Coefficients Positive, Negative, and Zero Mean?

Correlation coefficients are indicators of the strength of the linear relationship between two different variables, x and y. A linear correlation coefficient that is greater than zero indicates a positive relationship. A value that is less than zero signifies a negative relationship. Finally, a value of zero indicates no relationship between the two variables x and y.

This article explains the significance of linear correlation coefficient for investors, how to calculate covariance for stocks, and how investors can use correlation to predict the market.

Understanding Correlation

The correlation coefficient (ρ) is a measure that determines the degree to which the movement of two different variables is associated. The most common correlation coefficient, generated by the Pearson product-moment correlation, is used to measure the linear relationship between two variables. However, in a non-linear relationship, this correlation coefficient may not always be a suitable measure of dependence.

The possible range of values for the correlation coefficient is -1.0 to 1.0. In other words, the values cannot exceed 1.0 or be less than -1.0. A correlation of -1.0 indicates a perfect negative correlation, and a correlation of 1.0 indicates a perfect positive correlation. If the correlation coefficient is greater than zero, it is a positive relationship. Conversely, if the value is less than zero, it is a negative relationship. A value of zero indicates that there is no relationship between the two variables.

Correlation and the Financial Markets

In the financial markets, the correlation coefficient is used to measure the correlation between two securities. For example, when two stocks move in the same direction, the correlation coefficient is positive. Conversely, when two stocks move in opposite directions, the correlation coefficient is negative.

If the correlation coefficient of two variables is zero, there is no linear relationship between the variables. However, this is only for a linear relationship. It is possible that the variables have a strong curvilinear relationship. When the value of ρ is close to zero, generally between -0.1 and +0.1, the variables are said to have no linear relationship (or a very weak linear relationship).
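
To see how a strong curvilinear relationship can still produce a correlation of zero, consider the sketch below. The data are hypothetical: y is exactly x squared, so the two variables are perfectly related, just not linearly.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Hypothetical data: a perfect parabola, symmetric about x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(abs(round(r, 4)))  # 0.0
```

The positive and negative halves of the curve cancel each other out, which is why a scatter plot is worth checking before trusting a near-zero coefficient.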

For example, suppose that the prices of coffee and computers are observed and found to have a correlation of +.0008. This means that there is no correlation, or relationship, between the two variables.

Calculating ρ

The covariance of the two variables in question must be calculated before the correlation can be determined. Next, each variable’s standard deviation is required. The correlation coefficient is determined by dividing the covariance by the product of the two variables’ standard deviations.

Standard deviation is a measure of the dispersion of data from its average. Covariance is a measure of how two variables change together. However, its magnitude is unbounded, so it is difficult to interpret. The normalized version of the statistic is calculated by dividing covariance by the product of the two standard deviations. This is the correlation coefficient.
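
The sketch below illustrates why covariance is hard to interpret on its own while correlation is not. The prices are hypothetical, and the function names are illustrative. Re-expressing one variable in cents instead of dollars multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def std_dev(xs):
    """Sample standard deviation: square root of the variance."""
    return math.sqrt(covariance(xs, xs))


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical prices, used only to show the effect of changing units.
price_a_dollars = [1.0, 2.0, 3.0, 4.0]
price_b_dollars = [2.0, 4.0, 5.0, 9.0]
price_a_cents = [p * 100 for p in price_a_dollars]

cov_dollars = covariance(price_a_dollars, price_b_dollars)
cov_cents = covariance(price_a_cents, price_b_dollars)
rho_dollars = correlation(price_a_dollars, price_b_dollars)
rho_cents = correlation(price_a_cents, price_b_dollars)

print(cov_dollars, cov_cents)                      # covariance scales with the units
print(round(rho_dollars, 4), round(rho_cents, 4))  # correlation does not
```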

Positive Correlation

A positive correlation—when the correlation coefficient is greater than 0—signifies that both variables move in the same direction. When ρ is +1, it signifies that the two variables being compared have a perfect positive relationship; when one variable moves higher or lower, the other variable moves in the same direction with the same magnitude.

The closer the value of ρ is to +1, the stronger the linear relationship. For example, suppose the value of oil prices is directly related to the prices of airplane tickets, with a correlation coefficient of +0.95. The relationship between oil prices and airfares has a very strong positive correlation since the value is close to +1. So, if the price of oil decreases, airfares also decrease, and if the price of oil increases, so do the prices of airplane tickets.

In the chart below, we compare one of the largest U.S. banks, JPMorgan Chase & Co. (JPM), with the Financial Select SPDR Exchange Traded Fund (ETF) (XLF). As you can imagine, JPMorgan Chase & Co. should have a positive correlation to the banking industry as a whole. We can see the correlation coefficient is currently at 0.98, which is signaling a strong positive correlation. A reading above 0.50 typically signals a positive correlation.

JPM and XLF Correlation Coefficient

Understanding the correlation between two stocks (or a single stock) and its industry can help investors gauge how the stock is trading relative to its peers. All types of securities, including bonds, sectors, and ETFs, can be compared with the correlation coefficient.

Negative Correlation

A negative (inverse) correlation occurs when the correlation coefficient is less than 0. This is an indication that the two variables move in opposite directions. In short, any reading between 0 and -1 means that the two securities move in opposite directions. When ρ is -1, the relationship is said to be perfectly negatively correlated.

In short, if one variable increases, the other variable decreases with the same magnitude (and vice versa). However, the degree to which two securities are negatively correlated might vary over time (and they are almost never exactly correlated all the time). 

Examples of Negative Correlation

For example, suppose a study is conducted to assess the relationship between outside temperature and heating bills. The study concludes that there is a negative correlation between the prices of heating bills and the outdoor temperature. The correlation coefficient is calculated to be -0.96. This strong negative correlation signifies that as the temperature decreases outside, the prices of heating bills increase (and vice versa).

When it comes to investing, a negative correlation does not necessarily mean that the securities should be avoided. The correlation coefficient can help investors diversify their portfolio by including a mix of investments that have a negative, or low, correlation to the stock market. In short, when reducing volatility risk in a portfolio, sometimes opposites do attract.  

For example, assume you have a $100,000 balanced portfolio that is invested 60% in stocks and 40% in bonds. In a year of strong economic performance, the stock component of your portfolio might generate a return of 12% while the bond component may return -2% because interest rates are rising (which means that bond prices are falling).

Thus, the overall return on your portfolio would be 6.4% ((12% x 0.6) + (-2% x 0.4)). The following year, as the economy slows markedly and interest rates are lowered, your stock portfolio might generate -5% while your bond portfolio may return 8%, giving you an overall portfolio return of 0.2% ((-5% x 0.6) + (8% x 0.4)).

What if, instead of a balanced portfolio, your portfolio were 100% equities? Using the same return assumptions, your all-equity portfolio would have a return of 12% in the first year and -5% in the second year. These figures are clearly more volatile than the balanced portfolio’s returns of 6.4% and 0.2%.
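
The portfolio arithmetic above is a weighted average of the component returns. A minimal sketch, using the 60/40 weights and the return assumptions from the example (the function name `portfolio_return` is illustrative):

```python
def portfolio_return(weights, returns):
    """Weighted average of component returns."""
    return sum(w * r for w, r in zip(weights, returns))


weights = [0.60, 0.40]  # 60% stocks, 40% bonds

year1 = portfolio_return(weights, [0.12, -0.02])  # stocks +12%, bonds -2%
year2 = portfolio_return(weights, [-0.05, 0.08])  # stocks -5%, bonds +8%

print(f"{year1:.1%}")  # 6.4%
print(f"{year2:.1%}")  # 0.2%
```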

Linear Correlation Coefficient

The linear correlation coefficient is a number calculated from given data that measures the strength of the linear relationship between two variables: x and y. The sign of the linear correlation coefficient indicates the direction of the linear relationship between x and y. When r (the correlation coefficient) is near 1 or −1, the linear relationship is strong; when it is near 0, the linear relationship is weak.

Even for small datasets, the computations for the linear correlation coefficient can be too long to do manually. Thus, data are often plugged into a calculator or, more likely, a computer or statistics program to find the coefficient.

The Pearson Coefficient

Both the Pearson coefficient calculation and basic linear regression are ways to determine how statistical variables are linearly related. However, the two methods do differ. The Pearson coefficient is a measure of the strength and direction of the linear association between two variables with no assumption of causality. The Pearson coefficient shows correlation, not causation. Pearson coefficients range from +1 to -1, with +1 representing a perfect positive correlation, -1 representing a perfect negative correlation, and 0 representing no linear relationship.

Simple linear regression describes the linear relationship between a response variable (denoted by y) and an explanatory variable (denoted by x) using a statistical model. Statistical models are used to make predictions.

In finance, for example, correlation is used in several analyses including the calculation of portfolio standard deviation. Because it is so time-consuming, correlation is best calculated using software like Excel. Correlation combines statistical concepts, namely, variance and standard deviation. Variance is the dispersion of a variable around the mean, and standard deviation is the square root of variance.

Finding Correlation Using Excel

There are several methods to calculate correlation in Excel. The simplest is to get two data sets side-by-side and use the built-in correlation function, CORREL, for example =CORREL(A2:A10, B2:B10).

If you want to create a correlation matrix across a range of data sets, Excel has a Data Analysis plugin that is found on the Data tab, under Analyze. 

Select the table of returns. In this case, our columns are titled, so we want to check the box “Labels in first row,” so Excel knows to treat these as titles. Then you can choose to output on the same sheet or on a new sheet.

Once you hit enter, the data is automatically created. You can add some text and conditional formatting to clean up the result.

Linear Correlation Coefficient Frequently Asked Questions

What Is the Linear Correlation Coefficient?

The linear correlation coefficient is a number calculated from given data that measures the strength of the linear relationship between two variables, x and y.

How Do You Find the Linear Correlation Coefficient?

Correlation combines several important and related statistical concepts, namely, variance and standard deviation. Variance is the dispersion of a variable around the mean, and standard deviation is the square root of variance. 

The formula is:

$$ \rho_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} $$

The computation is too long to do manually, so software such as Excel or a statistics program is typically used to calculate the coefficient.

What Is Meant By Linear Correlation?

The correlation coefficient is a value between -1 and +1. A correlation coefficient of +1 indicates a perfect positive correlation. As variable x increases, variable y increases. As variable x decreases, variable y decreases. A correlation coefficient of -1 indicates a perfect negative correlation. As variable x increases, variable y decreases. As variable x decreases, variable y increases.

How Do You Find the Linear Correlation Coefficient on a Calculator?

A graphing calculator is required to calculate the correlation coefficient. The following instructions are provided by Statology.

Step 1: Turn on Diagnostics

You will only need to do this step once on your calculator. After that, you can always start at step 2 below. If you don’t do this, r (the correlation coefficient) will not show up when you run the linear regression function.

  • Press [2nd] and then [0] to enter your calculator’s catalog. Scroll until you see “DiagnosticOn”.
  • Press enter until the calculator screen says “Done”.

This is important to repeat: You never have to do this again unless you reset your calculator.

Step 2: Enter Data

Enter your data into the calculator by pressing [STAT] and then selecting 1:Edit. To make things easier, you should enter all of your “x data” into L1 and all of your “y data” into L2.

Step 3: Calculate!

Once you have your data in, you will now go to [STAT] and then the CALC menu up top. Finally, select 4:LinReg and press enter.

That’s it! You’re done! Now you can simply read off the correlation coefficient right from the screen (it’s r). Remember, if r doesn’t show on your calculator, then diagnostics need to be turned on. This is also the same place on the calculator where you will find the linear regression equation and the coefficient of determination.

The Bottom Line

The linear correlation coefficient can be helpful in determining the relationship between an investment and the overall market or other securities. It is often used to predict stock market returns. This statistical measurement is useful in many ways, particularly in the finance industry.

For example, it can be helpful in determining how well a mutual fund is behaving compared to its benchmark index, or it can be used to determine how a mutual fund behaves in relation to another fund or asset class. By adding a low, or negatively correlated, mutual fund to an existing portfolio, diversification benefits are gained.

Linear Correlation Coefficient Formula

The linear correlation formula is used to find the relation between two variables in a population; it shows how the variables are connected. Also known as “Pearson’s Correlation”, the linear correlation coefficient is denoted by “r”, and its value lies between -1 and 1.

The variables have a strong positive relationship if the coefficient is close to 1. Similarly, if the coefficient is close to -1, they have a strong negative relationship. If the linear coefficient is zero, there is no linear relation between the data given.

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} $$

Where “n” is the number of observations, and “xi” and “yi” are the variables.

Solved Examples

Question 1: Calculate the linear correlation coefficient for the following data. X = 4, 8 ,12, 16 and Y = 5, 10, 15, 20.

Solution:

Given variables are,

X = 4, 8 ,12, 16 and Y = 5, 10, 15, 20

For finding the linear coefficient of these data, we need to first construct a table for the required values.

x          y          x²     y²     xy
4          5          16     25     20
8          10         64     100    80
12         15         144    225    180
16         20         256    400    320
Σx = 40    Σy = 50    480    750    600

According to the formula of linear correlation we have,
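
The remaining arithmetic is not shown in the text. As a sketch, assuming the computational form of Pearson's formula (which matches the columns of the table) and n = 4, substituting the column totals gives:

$$ r = \frac{4(600) - (40)(50)}{\sqrt{\left[4(480) - 40^2\right]\left[4(750) - 50^2\right]}} = \frac{400}{\sqrt{320 \times 500}} = \frac{400}{400} = 1 $$

A value of 1 is what one would expect here, because every y value is exactly 1.25 times the matching x value.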

Correlation Coefficient Formula

Correlation coefficient formula is given and explained here for all of its types. There are various formulas to calculate the correlation coefficient and the ones covered here include Pearson’s Correlation Coefficient Formula, Linear Correlation Coefficient Formula, Sample Correlation Coefficient Formula, and Population Correlation Coefficient Formula. Before going to the formulas, it is important to understand what correlation and correlation coefficient is. A brief introduction is given below and to learn about them in detail, click the linked article.

About Correlation Coefficient

The correlation coefficient is a measure of the association between two variables. It is used to find the relationship between data and to check how strong that relationship is. The formulas return a value between -1 and 1, where -1 shows a perfect negative correlation and +1 shows a perfect positive correlation.

A positive value shows that the two variables tend to move in the same direction, and a negative value shows that they tend to move in opposite directions.

What Is the Correlation Coefficient Formula?

The correlation coefficient is a statistical concept. It establishes a relation between predicted and actual values obtained at the end of a statistical experiment. The correlation coefficient formula helps to calculate the relationship between two variables and thus the result so obtained explains the exactness between the predicted and actual values. 

Pearson Correlation Coefficient Formula:

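1. Population Correlation Coefficient

The formula for the Pearson correlation coefficient for a population (written as ρX,Y) is given as:

$$ \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} $$

where cov(X, Y) is the covariance, σX is the standard deviation of X and σY is the standard deviation of Y.

Given X and Y are two random variables.

2. Sample Correlation Coefficient

The formula for the Pearson correlation coefficient for a sample of size n (written as rxy) is given as:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where n is the sample size, xi, yi are the ith sample points and x̄, ȳ are the sample means for the random variables X and Y respectively.

Given X and Y are two random variables.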

3. Linear Correlation Coefficient

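It uses Pearson’s correlation coefficient to determine the linear relationship between two variables. Its value lies between -1 and 1. It is given as:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where n is the sample size, xi, yi are the ith sample points and x̄, ȳ are the sample means for the random variables x and y respectively. Expanding the sums gives the equivalent computational form

$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2} \, \sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$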

The sign of r indicates the direction of the linear relationship between the variables, and its magnitude indicates the strength.

  • If r is near 1, then the two variables have a strong positive linear relationship.
  • If r is near 0, then the two variables have little or no linear relationship.
  • If r is near -1, then the two variables have a strong negative linear relationship.

Let us see the applications of the correlation coefficient formula in the following section.

Examples using Correlation Coefficient Formula

Example 1. Given the following population data. Find the Pearson correlation coefficient between x and y for this data. (Take 1/√7 as 0.378)

x | 600 | 800 | 1000
y | 1200 | 1000 | 2000

Solution:

To simplify the calculation, we divide both x and y by 100.
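One way to complete the calculation, worked directly from the data in this example. The symbols S_xy, S_xx and S_yy are shorthand introduced here for the sums of deviation products, and primes mark the values after dividing by 100. Dividing both variables by 100 leaves r unchanged, because the common factor cancels between numerator and denominator.

```latex
\begin{aligned}
x' &= (6,\ 8,\ 10), \quad \bar{x}' = 8, \qquad y' = (12,\ 10,\ 20), \quad \bar{y}' = 14 \\
S_{xy} &= (-2)(-2) + (0)(-4) + (2)(6) = 16 \\
S_{xx} &= (-2)^2 + 0^2 + 2^2 = 8 \\
S_{yy} &= (-2)^2 + (-4)^2 + 6^2 = 56 \\
r &= \frac{16}{\sqrt{8 \cdot 56}} = \frac{16}{8\sqrt{7}} = \frac{2}{\sqrt{7}} \approx 2 \times 0.378 = 0.756
\end{aligned}
```

So the Pearson correlation coefficient between x and y is about 0.756, a fairly strong positive linear relationship.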

Example 2. A survey was conducted in your city. Given is the following sample data containing a person’s age and their corresponding income. Find out whether the increase in age has an effect on income using the correlation coefficient formula. (Use 1/√181 as 0.074 and 1/√209 as 0.07)

Age | 25 | 30 | 36 | 43
Income | 30000 | 44000 | 52000 | 70000

Solution:

To simplify the calculation, we divide y by 1000.
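A worked version of the calculation, taking x as age and y' as income in thousands. The symbols S_xy, S_xx and S_yy are shorthand introduced here for the sums of deviation products. Rescaling income does not change r, since the factor of 1000 cancels.

```latex
\begin{aligned}
\bar{x} &= \tfrac{25 + 30 + 36 + 43}{4} = 33.5, \qquad \bar{y}' = \tfrac{30 + 44 + 52 + 70}{4} = 49 \\
S_{xy} &= (-8.5)(-19) + (-3.5)(-5) + (2.5)(3) + (9.5)(21) = 386 \\
S_{xx} &= 72.25 + 12.25 + 6.25 + 90.25 = 181 \\
S_{yy} &= 361 + 25 + 9 + 441 = 836 = 4 \cdot 209 \\
r &= \frac{386}{\sqrt{181} \cdot 2\sqrt{209}} = \frac{193}{\sqrt{181}\,\sqrt{209}} \approx 193 \times 0.074 \times 0.07 \approx 1.00
\end{aligned}
```

The rounded square roots push the result up slightly. With unrounded values, r is approximately 0.992.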

Answer: Yes, with the increase in age a person’s income increases as well, since the Pearson correlation coefficient between age and income is very close to 1.

Example 3: Calculate the correlation coefficient of the given data.

x | 41 | 42 | 43 | 44 | 45
y | 3.2 | 3.3 | 3.4 | 3.5 | 3.6

Solution:

Here n = 5

Let us find ∑x, ∑y, ∑xy, ∑x², ∑y²

x | y | xy | x² | y²
41 | 3.2 | 131.2 | 1681 | 10.24
42 | 3.3 | 138.6 | 1764 | 10.89
43 | 3.4 | 146.2 | 1849 | 11.56
44 | 3.5 | 154 | 1936 | 12.25
45 | 3.6 | 162 | 2025 | 12.96
∑x = 215 | ∑y = 17 | ∑xy = 732 | ∑x² = 9255 | ∑y² = 57.9

X values:

∑x = 215

∑x² = 9255

x̄ = 43

∑(x – x̄)² = SSx = 10

Y values:

∑y = 17

∑y² = 57.9

ȳ = 3.4

∑(y – ȳ)² = SSy = 0.1

X and Y combined:

N = 5

∑((x – x̄)(y – ȳ)) = 1

∑xy = 732

R calculation:

r = ∑((x – x̄)(y – ȳ))/√((SSx)(SSy))

r = 1/√((10)(0.1)) = 1

Since r = 1, there is a perfect positive linear relationship between x and y.
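The same result can be cross-checked from the column sums alone, using the computational form r = (n∑xy − ∑x∑y) / √((n∑x² − (∑x)²)(n∑y² − (∑y)²)). The sketch below only rebuilds the table sums from the Example 3 data. The variable names are chosen for this illustration.

```python
from math import sqrt

# Data from Example 3.
xs = [41, 42, 43, 44, 45]
ys = [3.2, 3.3, 3.4, 3.5, 3.6]
n = len(xs)

# Column sums, matching the last row of the table.
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Computational form of Pearson's r, built only from the sums.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_example3 = numerator / denominator

print(sum_x, round(sum_y, 2), round(sum_xy, 2), sum_x2, round(sum_y2, 2))
print(round(r_example3, 6))
```

Because the y values are decimals, the sums carry tiny floating-point rounding errors, which is why the printed values are rounded.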

FAQs on Correlation Coefficient Formula

What Is Correlation Coefficient Formula in Statistics?

The correlation coefficient formula measures the strength and direction of the relationship between two variables in a dataset. Applied to predicted and actual values, it indicates how closely the two agree.

How To Use Correlation Coefficient Formula?

We can use the correlation coefficient formula to calculate the Pearson product-moment correlation as follows:

  • Step 1: Determine the covariance of the two given variables.
  • Step 2: Calculate the standard deviation of each variable.
  • Step 3: Divide the covariance by the product of the standard deviations of the two variables.
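The three steps can be sketched in Python as follows, here applied to the population data from Example 1. This is an illustration under one assumption: population (divide-by-n) versions of both the covariance and the standard deviation are used. Using the sample (divide-by-n−1) versions for both would give the same r, since the factor cancels.

```python
from math import sqrt
from statistics import mean, pstdev

# Population data from Example 1.
x = [600, 800, 1000]
y = [1200, 1000, 2000]

# Step 1: covariance of the two variables (population form, divide by n).
x_bar, y_bar = mean(x), mean(y)
cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / len(x)

# Step 2: standard deviation of each variable (population form).
sd_x = pstdev(x)
sd_y = pstdev(y)

# Step 3: divide the covariance by the product of the standard deviations.
r = cov_xy / (sd_x * sd_y)
print(round(r, 3))
```

The result agrees with 2/√7, which is about 0.756.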

What Is n in the Correlation Coefficient Formula?

In the correlation coefficient formula, n refers to the sample size, that is, the number of (x, y) pairs.

What Are the Applications of Correlation Coefficient Formula?

Given below are the most important applications of the correlation coefficient formula:

  • It helps in the analysis of data by quantifying the degree to which two variables are linearly related.
  • It is used in financial analysis to determine the relationship between data sets in business, which in turn supports decision-making.
  • It supports decision-making in various fields by showing the strength of the relationship between two different variables.

The linear correlation coefficient has the following properties, illustrated in Figure 10.4 “Linear Correlation Coefficient”:

  1. The value of r lies between −1 and 1, inclusive.
  2. The sign of r indicates the direction of the linear relationship between x and y:
    1. If r<0 then y tends to decrease as x is increased.
    2. If r>0 then y tends to increase as x is increased.
  3. The size of |r| indicates the strength of the linear relationship between x and y:
    1. If |r| is near 1 (that is, if r is near either 1 or −1) then the linear relationship between x and y is strong.
    2. If |r| is near 0 (that is, if r is near 0 and of either sign) then the linear relationship between x and y is weak.

Pay particular attention to panel (f) in Figure 10.4 “Linear Correlation Coefficient”. It shows a perfectly deterministic relationship between x and y, but r=0 because the relationship is not linear. (In this particular case the points lie on the top half of a circle.)
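The half-circle case is easy to reproduce numerically. The sketch below is an illustration with assumed values, not the data behind the figure: it places 21 evenly spaced x values on the top half of the unit circle and compares the result with an exactly linear relationship. The helper `pearson_r` is defined here for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Assumed points: x from -1 to 1 in steps of 0.1, y on the upper unit circle.
xs = [k / 10 for k in range(-10, 11)]
ys_circle = [sqrt(1 - x * x) for x in xs]

# For comparison, an exactly linear relationship y = 2x + 1.
ys_line = [2 * x + 1 for x in xs]

r_circle = pearson_r(xs, ys_circle)
r_line = pearson_r(xs, ys_line)
print(abs(round(r_circle, 6)), round(r_line, 6))
```

Because the points are symmetric about x = 0, the positive and negative deviation products cancel and r comes out as zero up to floating-point error, even though y is completely determined by x.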

Example 1

Compute the linear correlation coefficient for the height and weight pairs plotted in Figure 10.2 “Plot of Height and Weight Pairs”.

Solution:

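Even for small data sets like this one, the computations are too long to do completely by hand. In actual practice the data are entered into a calculator or computer and a statistics program is used. In order to clarify the meaning of the formulas we will display the data and related quantities in tabular form. For each (x,y) pair we compute three numbers: x², xy, and y², as shown in the table provided. In the last line of the table we have the sum of the numbers in each column. Using them we compute:

$$SS_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2, \qquad SS_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right), \qquad SS_{yy} = \sum y^2 - \frac{1}{n}\left(\sum y\right)^2$$

so that

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}} = 0.868$$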

The number r=0.868 quantifies what is visually apparent from Figure 10.2 “Plot of Height and Weight Pairs”: weight tends to increase linearly with height (r is positive) and, although the relationship is not perfect, it is reasonably strong (r is near 1).

Key Takeaways

  • The linear correlation coefficient measures the strength and direction of the linear relationship between two variables x and y.
  • The sign of the linear correlation coefficient indicates the direction of the linear relationship between x and y.
  • When r is near 1 or −1 the linear relationship is strong; when it is near 0 the linear relationship is weak.

Example of Correlation

John is an investor. His portfolio primarily tracks the performance of the S&P 500 and John wants to add the stock of Apple Inc. Before adding Apple to his portfolio, he wants to assess the correlation between the stock and the S&P 500 to ensure that adding the stock won’t increase the systematic risk of his portfolio. To find the coefficient, John gathers the following prices for the last five years (Step 1):

Using the formula above, John can determine the correlation between the prices of the S&P 500 Index and Apple Inc.

First, John calculates the average prices of each security for the given periods (Step 2):

After the calculation of the average prices, we can find the other values. A summary of the calculations is given in the table below:

Using the obtained numbers, John can calculate the coefficient:

The coefficient indicates that the prices of the S&P 500 and Apple Inc. have a high positive correlation. This means that their respective prices tend to move in the same direction. Therefore, adding Apple to his portfolio would, in fact, increase the level of systematic risk.
