Pearson Correlation
The Pearson correlation is usually the first check of the relationship between two variables.
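The coefficient of correlation, $r$, is a measure of the strength of the linear relationship between two variables $x$ and $y$. It is computed as follows:
$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$$
with
- $\sigma_x = \sqrt{\mathrm{Var}(x)}$, i.e. standard deviation of $x$
- $\sigma_y = \sqrt{\mathrm{Var}(y)}$, i.e. standard deviation of $y$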
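As a quick check of the definition, the sketch below computes the Pearson coefficient by hand, as the covariance divided by the product of the two standard deviations, and compares it with the built-in cor() function. The vectors x and y are made-up illustrative values, not taken from any dataset.

```r
# Illustrative data (made up for this sketch)
x <- c(2, 4, 5, 7, 9, 12)
y <- c(1, 3, 4, 8, 9, 15)

# Pearson's r from its definition:
# covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Same quantity from the built-in function
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

Both cov() and sd() use the n - 1 denominator, so the scaling cancels and the two values should agree up to floating-point error.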
The correlation ranges between -1 and 1.
- A value of $r$ near or equal to 0 implies little or no linear relationship between $x$ and $y$.
- In contrast, the closer $r$ comes to 1 or -1, the stronger the linear relationship.
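We can compute the t-statistic as follows and check it against the t-distribution table with $n - 2$ degrees of freedom, where $n$ is the number of observations:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$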
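The test can be sketched in R as follows. The statistic is built by hand from the correlation and the sample size, then compared with the result of cor.test() from base R. The data are made-up illustrative values, and the names t_manual and p_manual are only chosen for this example.

```r
# Illustrative data (made up for this sketch)
x <- c(2, 4, 5, 7, 9, 12)
y <- c(1, 3, 4, 8, 9, 15)

n <- length(x)
r <- cor(x, y)

# t-statistic for H0: no linear correlation, with n - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Built-in test for comparison
test <- cor.test(x, y, method = "pearson")

print(c(t_manual = t_manual, t_builtin = unname(test$statistic)))
print(c(p_manual = p_manual, p_builtin = test$p.value))
```

Using pt() replaces the lookup in a printed distribution table; the degrees of freedom reported by cor.test() should equal n - 2.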
Spearman Rank Correlation
A rank correlation sorts the observations by rank and computes the level of similarity between the ranks. It has the advantage of being robust to outliers and does not depend on the distribution of the data. Note that a rank correlation is also suitable for ordinal variables.
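Spearman's rank correlation, $\rho$, is always between -1 and 1, with a value close to either extreme indicating a strong relationship. It is computed as follows:
$$\rho = \frac{\mathrm{Cov}(\mathrm{rg}_x, \mathrm{rg}_y)}{\sigma_{\mathrm{rg}_x} \sigma_{\mathrm{rg}_y}}$$
where $\mathrm{Cov}(\mathrm{rg}_x, \mathrm{rg}_y)$ is the covariance between the ranks of $x$ and $y$, and the denominator is the product of the standard deviations of the ranks.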
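A minimal sketch of this idea: Spearman's coefficient is the Pearson formula applied to the ranks instead of the raw values. The data below are made up, with one deliberately extreme value in x to illustrate the robustness to outliers.

```r
# Illustrative data (made up); the last x value is a deliberate outlier
x <- c(10, 20, 30, 40, 50, 1000)
y <- c(1, 2, 3, 4, 5, 6)

# Replace each observation by its rank
rx <- rank(x)
ry <- rank(y)

# Spearman's rho: covariance of the ranks over the product of their standard deviations
rho_manual <- cov(rx, ry) / (sd(rx) * sd(ry))

# Built-in versions for comparison
rho_builtin <- cor(x, y, method = "spearman")
r_pearson   <- cor(x, y, method = "pearson")

print(c(rho_manual = rho_manual, rho_builtin = rho_builtin, pearson = r_pearson))
```

Because y increases whenever x increases, the ranks match exactly and the rank correlation equals 1, while the Pearson coefficient is pulled below 1 by the outlier.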
In R, we can use the cor() function. It takes three arguments: x, y, and the method.
cor(x, y, method)
Arguments:
- x: First vector
- y: Second vector
- method: The formula used to compute the correlation. One of three string values:
  - "pearson"
  - "kendall"
  - "spearman"
An optional argument can be added if the vectors contain missing values: use = "complete.obs"
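The effect of this argument can be sketched with two short made-up vectors that each contain one NA. With the default setting the result is NA; with use = "complete.obs" only the pairs where both values are present enter the computation.

```r
# Illustrative vectors (made up), each with one missing value
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, 4, 6, NA, 10, 13)

# Default behaviour: any missing value makes the result NA
r_default <- cor(x, y)

# Drop incomplete pairs before computing the correlation
r_complete <- cor(x, y, use = "complete.obs")

# Equivalent manual filtering, for comparison
keep <- complete.cases(x, y)
r_check <- cor(x[keep], y[keep])

print(c(default = r_default, complete = r_complete, check = r_check))
```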
We will use the BudgetUK dataset. This dataset reports the budget allocation of British households between 1980 and 1982. There are 1519 observations with ten features, among them:
- wfood: budget share for food spend
- wfuel: budget share for fuel spend
- wcloth: budget share for clothing spend
- walc: budget share for alcohol spend
- wtrans: budget share for transport spend
- wother: budget share for other goods spend
- totexp: total household spend in pounds
- income: total net household income
- age: age of the household head
- children: number of children