Correlation Coefficient

The correlation coefficient is used in statistics to measure the strength of the association between two variables. This article discusses the correlation coefficient and the Central Limit Theorem. The value of a correlation coefficient ranges from -1 to +1:

• -1 indicates a perfect negative correlation. The variables move in opposite directions: an increase in one variable is accompanied by a decrease in the other.
• +1 indicates a perfect positive correlation. An increase in one variable is accompanied by an increase in the other.
• 0 indicates that no linear correlation exists between the variables.

The most familiar type of correlation coefficient is the Pearson correlation coefficient. It can be calculated using the following steps:

• Find the covariance of the two variables.
• Determine the standard deviation of each variable. The standard deviation measures how far the values are spread out from the mean.
• Divide the covariance by the product of the two variables’ standard deviations to obtain the Pearson correlation coefficient.

ρxy = Cov(x, y) / (σx σy)

where: ρxy = Pearson product-moment correlation coefficient

Cov(x, y) = covariance of variables x and y

σx = standard deviation of x

σy = standard deviation of y
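
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name pearson_r and the data are made up for the example, and the population forms of the covariance and standard deviation (dividing by n) are assumed. Dividing by n - 1 throughout would give the same ratio.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Step 1: covariance of the two variables (population form, divide by n)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Step 2: standard deviation of each variable (same divisor as the covariance)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    # Step 3: ratio of the covariance to the product of the standard deviations
    return cov_xy / (sd_x * sd_y)

# Hypothetical data for illustration only
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases exactly in step with hours
falling = [10, 8, 6, 4, 2]   # decreases exactly as hours increase

print(pearson_r(hours, rising))   # very close to +1
print(pearson_r(hours, falling))  # very close to -1
```

Because of floating-point rounding, the printed values may differ from +1 and -1 in the last decimal places.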

An inverse correlation is a relationship between two variables in which one variable tends to be high when the other is low.

Central Limit Theorem

Statement of the Central Limit Theorem: If large random samples are drawn with replacement from a population with mean μ and standard deviation σ, then the distribution of the sample means will be approximately normal. The Central Limit Theorem holds whether the population is skewed or normal, provided the sample size is large (n ≥ 30 is a common rule of thumb). It also applies when the population follows a binomial distribution, provided min(np, n[1 - p]) > 5, where n is the sample size and p is the probability of success in the population.

The mean and standard deviation of the sample means are related to the population mean μ and standard deviation σ, for samples of size n, by the following formulas:

μx̄ = μ

σx̄ = σ / √n
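
The standard relations μx̄ = μ and σx̄ = σ / √n can be checked with a small simulation. The sketch below assumes a skewed population (an exponential distribution with mean 1 and standard deviation 1), a sample size of 40 and 5000 repeated samples. All of these choices, and the helper name simulate_sample_means, are assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw repeated samples from an exponential population and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

POP_MEAN = 1.0  # mean of the exponential(1) population
POP_SD = 1.0    # standard deviation of the exponential(1) population
n = 40          # size of each sample

means = simulate_sample_means(n, 5000)
observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)
predicted_sd = POP_SD / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {observed_sd:.3f} (predicted {predicted_sd:.3f})")
```

Although the population is strongly skewed, the sample means should cluster around the population mean with a spread close to σ / √n.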

A few examples of the Central Limit Theorem are:

• If a coin is flipped many times, the distribution of the number of heads approaches a normal distribution whose mean is half the total number of flips. The approximation becomes exact in the limit of an infinite number of flips.
• The total distance covered on a morning or evening walk, being the sum of many individual steps, will approximately follow a normal distribution.
• Electronic noise and examination grades, where a single measured quantity can be regarded as the weighted average of many small effects, are approximately normally distributed.
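
The coin-flip example can be made concrete by comparing an exact binomial probability with its normal approximation. The sketch below assumes 100 flips of a fair coin and asks for the probability of at most 55 heads. The function names are hypothetical, and the continuity correction (adding 0.5 to the count) is a common refinement that is not part of the text above.

```python
import math
from statistics import NormalDist

def binomial_prob_at_most(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_at_most(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    mu = n * p                          # mean number of successes
    sigma = math.sqrt(n * p * (1 - p))  # standard deviation of the count
    return NormalDist(mu, sigma).cdf(k + 0.5)

n_flips, p_heads = 100, 0.5

# Rule of thumb from the theorem statement: min(np, n[1 - p]) > 5
rule_ok = min(n_flips * p_heads, n_flips * (1 - p_heads)) > 5

exact = binomial_prob_at_most(55, n_flips, p_heads)
approx = normal_approx_at_most(55, n_flips, p_heads)
print(f"rule of thumb satisfied: {rule_ok}")
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With these inputs the two probabilities should agree to about three decimal places.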

The steps for solving Central Limit Theorem problems that involve ‘>’, ‘<’ or “between” are given below:

1) Identify from the problem the mean, the population size, the standard deviation, the sample size, and the value or values associated with the given sign.

2) Draw a graph centred on the mean.

3) Find the z-score using the formula:

z = (x̄ - μ) / (σ / √n)

4) Refer to the z-table to obtain the area corresponding to the z-score.

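5) Apply the case that matches the sign in the problem. The expressions below assume a z-table that gives the area between the mean and z, and a value that lies above the mean:

a] Central Limit Theorem with “>”: 0.5 - [z-table value]

b] Central Limit Theorem with “<”: 0.5 + [z-table value]

c] Central Limit Theorem involving “between”: follow Step 3 for each of the two values, then add the two z-table values if the values lie on opposite sides of the mean, or subtract the smaller from the larger if they lie on the same side.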

6) Find x̄ and the z-value. In the final step, convert the resulting probability from a decimal to a percentage.
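
The steps above can be sketched in Python. The example assumes the usual z-score for a sample mean, z = (x̄ - μ) / (σ / √n), and uses the cumulative normal distribution from Python's statistics module in place of a printed z-table, which is equivalent to the 0.5 ± table-value rule. The function names and the numbers (μ = 70, σ = 12, n = 36) are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_score(x_bar, mu, sigma, n):
    """z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def prob_greater(x_bar, mu, sigma, n):
    """P(sample mean > x_bar)."""
    return 1 - STANDARD_NORMAL.cdf(z_score(x_bar, mu, sigma, n))

def prob_less(x_bar, mu, sigma, n):
    """P(sample mean < x_bar)."""
    return STANDARD_NORMAL.cdf(z_score(x_bar, mu, sigma, n))

def prob_between(low, high, mu, sigma, n):
    """P(low < sample mean < high)."""
    return prob_less(high, mu, sigma, n) - prob_less(low, mu, sigma, n)

# Hypothetical problem: population mean 70, standard deviation 12, sample size 36
mu, sigma, n = 70, 12, 36

p_gt = prob_greater(73, mu, sigma, n)      # z = 1.5
p_lt = prob_less(73, mu, sigma, n)         # z = 1.5
p_bw = prob_between(68, 73, mu, sigma, n)  # z from -1.0 to 1.5

# Final step: convert from decimal to percentage
print(f"P(mean > 73)      = {p_gt * 100:.2f}%")
print(f"P(mean < 73)      = {p_lt * 100:.2f}%")
print(f"P(68 < mean < 73) = {p_bw * 100:.2f}%")
```

Here the standard error is 12 / √36 = 2, so a sample mean of 73 corresponds to z = 1.5 and a sample mean of 68 corresponds to z = -1.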

Applications of Central Limit Theorem

1] The standard deviation of the sample mean decreases as the sample size increases, which helps in estimating the population mean more precisely.

2] The sample mean is used to find a range of values that is likely to include the population mean.

3] The Central Limit Theorem is applied in political/election polls to estimate the percentage of people who favour a particular candidate; the results are reported as confidence intervals and telecast by news channels.
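
As a rough sketch of the polling application, the code below computes an approximate 95% confidence interval for the share of voters favouring a candidate. It assumes a simple random sample and the normal approximation to the sample proportion. The poll figures (520 of 1000 respondents in favour) and the function name are invented for illustration.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Approximate confidence interval for a proportion using the normal approximation."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the sample proportion
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_err
    return p_hat - margin, p_hat + margin

# Hypothetical poll: 520 of 1000 respondents favour the candidate
low, high = proportion_confidence_interval(520, 1000)
print(f"95% confidence interval: {low * 100:.1f}% to {high * 100:.1f}%")
```

For these made-up figures the interval is roughly 48.9% to 55.1%, so the poll alone would not establish that the candidate has majority support.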