In a world full of data, we must learn how to analyze it and extract meaningful information. Let’s look at a few basic statistical methods that can be used to discover patterns and trends from raw data.
The mean of a data set is the sum of its values divided by the number of values, and it gives a general sense of where the values are centered. However, it is sensitive to outliers in the data.
Data |
---|
5 |
9 |
4 |
6 |
Mean = ( 5 + 9 + 4 + 6 ) / 4 = 6
This shows that the values in the data set lie close to 6. However, if the same data also contained the value 1200, then the average would come out to be:
(5 + 9 + 4 + 6 + 1200)/5 = 244.8
As you can see, a single outlier moved the mean from 6 to 244.8, far away from every other value. This is why the mean is most useful once outliers have been removed from the data.
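The effect of the outlier can be checked with a few lines of Python. This is only a minimal sketch: the helper name `mean_of` is made up for illustration, and the values are the ones used in the calculation above.

```python
def mean_of(values):
    """Return the arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

data = [5, 9, 4, 6]
with_outlier = data + [1200]  # same data plus one extreme value

print(mean_of(data))          # 6.0
print(mean_of(with_outlier))  # 244.8
```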
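The median is the middle value of the data once it has been sorted. For example, if the sorted data is:
3, 5, 6, 9
Then, since the number of entries is even, the median is the average of the two middle values: ( 5 + 6 ) / 2 = 5.5.
The benefit of the median is that it is barely affected by outliers, so it gives a more reliable center of the data when extreme values are present.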
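To see how little an outlier moves the median, here is a small sketch using `median` from Python's standard `statistics` module. Adding the value 1200 (borrowed from the mean example) is an assumption made purely for comparison.

```python
from statistics import median

data = [3, 5, 6, 9]
with_outlier = data + [1200]  # one extreme value added for comparison

# Even number of entries: average of the two middle values, 5 and 6.
print(median(data))          # 5.5

# Odd number of entries: the single middle value of 3, 5, 6, 9, 1200.
print(median(with_outlier))  # 6
```

The outlier shifts the median only from 5.5 to 6, whereas it would drag the mean far away from the rest of the data.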
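The mode is the value that occurs most often in the data. For example, if the data is:
5, 5, 7, 8, 9, 1, 2, 5, 8
Then the mode would be 5, since it appears three times, more often than any other value.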
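The same count can be reproduced in Python. This is a minimal sketch using `Counter` from the standard `collections` module and `mode` from `statistics`; the variable names are illustrative.

```python
from collections import Counter
from statistics import mode

data = [5, 5, 7, 8, 9, 1, 2, 5, 8]

# Count how many times each value occurs.
counts = Counter(data)
most_common_value, occurrences = counts.most_common(1)[0]

print(most_common_value, occurrences)  # 5 3
print(mode(data))                      # 5
```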
The formula to calculate standard deviation is given below:
SD = sqrt( ( ( x1 - mean )^2 + ( x2 - mean )^2 + ... + ( xN - mean )^2 ) / N )
Here, we sum the squares of the differences between each value and the mean, divide the sum by the total number of entries N, and take the square root.
Data: 5, 6, 3, 2, 9, 10
Mean = ( 5 + 6 + 3 + 2 + 9 + 10 ) / 6 = 5.83
SD = sqrt( ( ( 5 - 5.83 )^2 + ( 6 - 5.83 )^2 + ... + ( 10 - 5.83 )^2 ) / 6 ) = 2.9107
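The worked example can be verified step by step in Python. The function `population_sd` below is a hypothetical helper written for this sketch. It divides by the total number of entries, as described above, which matches `pstdev` in the standard `statistics` module (whereas `statistics.stdev` divides by one less than the number of entries and gives a slightly larger result).

```python
import math
from statistics import pstdev

def population_sd(values):
    """Square root of the average squared difference from the mean."""
    mean = sum(values) / len(values)
    squared_diffs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / len(values))

data = [5, 6, 3, 2, 9, 10]

print(round(population_sd(data), 4))  # 2.9107
print(round(pstdev(data), 4))         # 2.9107
```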
Here, rainfall is the independent variable, and umbrellas sold are the dependent variable.
Regression can help us find out whether the variables have a strong or weak relationship. It can also help us to forecast values in the future.
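As a rough illustration of both uses, the sketch below fits a straight line by least squares and then forecasts a new value. The rainfall and umbrella numbers are made up for this example and do not come from real observations, and the function name `fit_line` is likewise hypothetical.

```python
import math

# Hypothetical, made-up observations used only for illustration.
rainfall_mm = [10, 20, 30, 40, 50]      # independent variable
umbrellas_sold = [12, 19, 33, 41, 50]   # dependent variable

def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient, between -1 and 1
    return slope, intercept, r

slope, intercept, r = fit_line(rainfall_mm, umbrellas_sold)

# A value of r close to 1 or -1 suggests a strong linear relationship.
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")

# Forecast umbrella sales for a rainfall value that was not observed.
forecast = intercept + slope * 60
print(f"forecast for 60 mm of rain: {forecast:.1f}")
```

With real data the relationship is rarely this clean, so the correlation coefficient is worth checking before trusting a forecast.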