In a world full of data, we must learn how to analyze it and extract meaningful information. Let's look at a few basic statistical methods that can be used to discover patterns and trends from raw data. 

# 1. Mean
The first measure is the **mean** of the data, more commonly known as the average. To calculate the mean, we add all the values in the data set and divide the sum by the number of entries. 

The mean gives us a single value that summarizes where the data is centered. However, it is sensitive to outliers: a few extreme values can pull it far away from the rest of the data. 

## Example

| Data  |
| - |
| 5  |
| 9  |
| 3  |
|  6 |

<pre>
Mean = ( 5 + 9 + 3 + 6 ) / 4
= 5.75
</pre>
This shows that the values in the data set are centered around `5.75`. However, if the same data also contained the value `1200`, then the mean would come out to be: 
<pre>
( 5 + 9 + 3 + 6 + 1200 ) / 5 = 244.6
</pre>

As you can see, a single outlier changed the mean drastically. This is why the mean is most useful when the data has no extreme outliers, or when they have been removed first. 
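The effect of an outlier on the mean can be checked with a short sketch. It uses Python's standard `statistics` module and the four values from the example table; the variable names are chosen here for illustration only.

```python
from statistics import mean

data = [5, 9, 3, 6]             # values from the example table
with_outlier = data + [1200]    # same data plus one extreme value

mean_clean = mean(data)
mean_outlier = mean(with_outlier)

print(mean_clean)    # 5.75
print(mean_outlier)  # 244.6
```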

# 2. Median
**Median** is the value that lies in the center of the data set. To calculate the median, we first sort the data in ascending order and then choose the middle value. If the number of entries is odd, then the median is simply the center value. If this number is even, we take the average of the two central values to get the median. 

## Example
For the data given above, if we wish to find the median, first we sort the data as follows: 
<pre>
3, 5, 6, 9
</pre>
Then, since the number of entries is four, an even number with no single middle value, we take the average of the two central values: 
<pre>
( 5 + 6 ) / 2 = 5.5
</pre>
The benefit of the median is that it is barely affected by outliers, so it gives a reliable center of the data even when a few values are extreme. 
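As a minimal sketch of this behavior, the snippet below computes the median of the example data with and without an extreme value of `1200`, using the standard `statistics` module. The variable names are illustrative assumptions.

```python
from statistics import median

data = [5, 9, 3, 6]
with_outlier = data + [1200]

# sorted: 3, 5, 6, 9 -> average of the two central values
median_clean = median(data)
# sorted: 3, 5, 6, 9, 1200 -> the single middle value
median_outlier = median(with_outlier)

print(median_clean)    # 5.5
print(median_outlier)  # 6
```

Adding the outlier moves the median only from 5.5 to 6.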

# 3. Mode
The **mode** is the most frequent value in a data set. If no value is repeated, there is no mode; if several values tie for the highest frequency, the data set has more than one mode. 

## Example
For example, in the data above, there is no mode. However, if we have the following data: 
<pre>
5, 5, 7, 8, 9, 1, 2, 5, 8
</pre>
Then the mode would be 5, since it appears three times, more often than any other value. 
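The definition above, including the "no mode" case, can be sketched as follows. The helper name `modes` is hypothetical; it counts occurrences with `collections.Counter` and returns an empty list when nothing repeats.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []  # no data, so no mode
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([5, 9, 3, 6]))                 # []
print(modes([5, 5, 7, 8, 9, 1, 2, 5, 8]))  # [5]
```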


# 4. Standard deviation
The next important measure is the **standard deviation**, which describes the spread of data around the mean. A greater standard deviation means the values are more spread out around the mean; a smaller one means they cluster close to it. 

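The formula to calculate standard deviation is given below:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

Here, $x_i$ are the values, $\mu$ is their mean, and $N$ is the total number of entries: we sum the squares of the differences of each value from the mean, divide the sum by the total number of entries, and take the square root. 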

## Example


<pre>
Data:
5, 6, 3, 2, 9, 10

Mean = (5+6+3+2+9+10) / 6 = 5.83 

SD = sqrt( ( ( 5 - 5.83)^2 + ( 6 - 5.83 )^2 + .... + ( 10 - 5.83)^2 ) / 6 )

= 2.9107 
</pre>
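The same calculation can be reproduced in a few lines. This sketch follows the steps step by step and then compares the result with `statistics.pstdev`, the standard library's population standard deviation, which divides by the number of entries as in the example above.

```python
from math import sqrt
from statistics import pstdev

data = [5, 6, 3, 2, 9, 10]

# Step by step: mean, squared differences, average, square root
mean_value = sum(data) / len(data)
squared_diffs = [(x - mean_value) ** 2 for x in data]
sd_manual = sqrt(sum(squared_diffs) / len(data))

# Same quantity from the standard library (population standard deviation)
sd_library = pstdev(data)

print(round(sd_manual, 4))   # 2.9107
print(round(sd_library, 4))  # 2.9107
```

Note that `statistics.stdev` computes the sample standard deviation, which divides by one less than the number of entries and therefore gives a slightly larger value for the same data.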

# 5. Range 
The **range** is the difference between the largest and smallest values in the data. It gives us a rough idea of how widely the data is spread, but since it depends only on the two extreme values, it is sensitive to outliers. 
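As a small illustration, the range of the data used in the standard deviation example can be computed like this (the variable names are illustrative):

```python
data = [5, 6, 3, 2, 9, 10]

# Range = largest value minus smallest value
data_range = max(data) - min(data)

print(data_range)  # 8
```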

# 6. Percentiles
A **percentile** is a value or score below which a given percentage of the data falls. For example, if you have 10 mangoes and the second heaviest mango weighs 150 g, then 8 of the 10 mangoes (80%) weigh less, so 150 g is the 80th percentile weight. 

## Formula
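To get this "80," we use the following equation:\
`( ( 10 - 2 ) / 10 ) * 100 = 80`

Here, 10 is the total number of mangoes and 2 is the number of mangoes that do not weigh less than the second heaviest one (that mango itself and the heaviest one).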
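The mango example can be sketched in code. The helper `percentile_rank` and the individual weights are assumptions made up for illustration; only the count of 10 mangoes and the second heaviest weight of 150 come from the text. The function uses the "percentage of values strictly below the score" definition given above.

```python
def percentile_rank(values, score):
    """Percentage of values that are strictly less than `score`."""
    below = sum(1 for v in values if v < score)
    return below * 100 / len(values)

# Hypothetical weights in grams; the second heaviest mango weighs 150
mango_weights = [90, 100, 105, 110, 120, 125, 130, 140, 150, 160]

print(percentile_rank(mango_weights, 150))  # 80.0
```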

# 7. Regression 
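**Regression** models the relationship between a dependent variable and an independent variable. It describes how changes in the independent variable are associated with changes in the dependent one. 

## Formula

For simple linear regression with one independent variable, the fitted line is commonly written as:

$$y = a + bx$$

where $x$ is the independent variable, $y$ is the dependent variable, $a$ is the intercept, and $b$ is the slope.

For example, if we relate rainfall to umbrella sales, rainfall is the *independent variable*, and umbrellas sold are the *dependent variable*. 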

Regression can help us find out whether the variables have a strong or weak relationship. It can also help us to forecast values in the future. 
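Both uses can be illustrated with a minimal sketch, assuming a straight-line (simple linear) model fitted by least squares. The helper `fit_line` and the rainfall and umbrella numbers are made up for illustration. The correlation coefficient `r` indicates how strong the linear relationship is (values near 1 or -1 are strong, values near 0 are weak), and the fitted line is used to forecast sales for a rainfall value not in the data.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Made-up numbers for illustration only
rainfall_mm = [10, 20, 30, 40, 50]
umbrellas_sold = [12, 25, 31, 48, 55]

slope, intercept, r = fit_line(rainfall_mm, umbrellas_sold)
forecast = intercept + slope * 60  # predicted sales at 60 mm of rainfall

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
print(f"forecast at 60 mm: {forecast:.1f}")
```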

What are different statistical measures for analysis?

Mean, median, mode, standard deviation, range, percentiles, and regression are key statistical measures for data analysis.
