What is skewed data?

Skewed data is data whose distribution is asymmetric: the values are not spread evenly around the center. Most of the values cluster on one side of the distribution, while a long tail stretches out towards the other side.

When we deal with skewed data, the following terms help us understand the data distribution:

  • Mean: It is the average value of our distribution. It is calculated by dividing the sum of all values by the total number of values.

  • Median: It is the middlemost value of our data when sorted in increasing order.

  • Mode: It is the value that occurs most frequently in our data.

  • Standard deviation: It tells us how far, on average, the data points are spread out from the mean.
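As a minimal sketch of these four measures, the snippet below uses Python's standard `statistics` module. The `values` list is a hypothetical sample chosen for illustration; it is not data from this article.

```python
import statistics

# Hypothetical sample, used only for illustration
values = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(values)        # sum of all values / number of values
median = statistics.median(values)    # middle value of the sorted data
mode = statistics.mode(values)        # most frequently occurring value
std_dev = statistics.pstdev(values)   # population standard deviation

print(f"mean={mean}, median={median}, mode={mode}, std_dev={std_dev:.2f}")
```

Here `pstdev` treats the list as the whole population; `statistics.stdev` would be the choice if the list were only a sample drawn from a larger population.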

Types of skewed data

There are two main types of skewness: positive skewness (right skewness) and negative skewness (left skewness).

Right-skewed data

In right-skewed data, the tail of the distribution extends towards the right side of the graph, in the direction of the positive x-axis. That is why it is also called a positively skewed distribution.

Note: For right-skewed data, we typically have: mean > median > mode

Example

In the following example, the x-axis represents salary amounts, while the y-axis represents the count of employees who have those salaries.

Salary    Total employees with the salary
$200      12
$400      18
$600      14
$800      10
$1000     7
$1200     5
$1400     4
$1600     3
$1800     2
$2000     1

The bar graph of our data is as follows:

[Figure: Right-skewed distribution]

Explanation

The height of each bar indicates how many employees earn that particular salary. Most of the data points sit on the left side of the graph, at the smaller values of the x-axis. As we move towards larger values on the x-axis, the number of data points steadily decreases, resulting in a tail that extends towards the right side of the graph. So we conclude that far fewer employees earn large salaries.
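To check the mean > median > mode ordering on this example, the following sketch expands the salary table into one entry per employee and computes the three measures with Python's standard `statistics` module. The variable names (`salary_counts`, `salaries`) are illustrative choices.

```python
import statistics

# Salary -> number of employees, taken from the right-skewed example
salary_counts = {
    200: 12, 400: 18, 600: 14, 800: 10, 1000: 7,
    1200: 5, 1400: 4, 1600: 3, 1800: 2, 2000: 1,
}

# Expand the frequency table into one list entry per employee
salaries = [
    salary
    for salary, count in salary_counts.items()
    for _ in range(count)
]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print("mean > median > mode:", mean > median > mode)
```

For this table the mean is about 723.68, the median is 600, and the mode is 400, so the ordering holds.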

Left-skewed data

In left-skewed data, the tail of the distribution extends towards the left side of the graph, in the direction of the negative x-axis. That is why it is also called a negatively skewed distribution.

Note: For left-skewed data, we typically have: mean < median < mode

Example

In the following example, the x-axis represents salary amounts, while the y-axis represents the count of employees who have those salaries.

Salary    Total employees with the salary
$200      1
$400      2
$600      3
$800      4
$1000     7
$1200     8
$1400     8
$1600     10
$1800     14

The bar graph of our data is as follows:

[Figure: Left-skewed distribution]

Explanation

The height of each bar indicates how many employees earn that particular salary. Only a few data points sit on the left side of the graph, at the smaller values of the x-axis. As we move towards larger values on the x-axis, the number of data points increases. Because so few data points lie on the left side, the distribution has a tail that extends towards the left. So we conclude that far fewer employees earn small salaries.
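The same check can be sketched for the left-skewed example. In addition to the mean < median < mode ordering, the snippet computes a moment coefficient of skewness (the average cubed deviation from the mean, divided by the cube of the standard deviation). This coefficient is not discussed in the article; it is included here as one common way to put a sign on the direction of the skew, and the variable names are illustrative.

```python
import statistics

# Salary -> number of employees, taken from the left-skewed example
salary_counts = {
    200: 1, 400: 2, 600: 3, 800: 4, 1000: 7,
    1200: 8, 1400: 8, 1600: 10, 1800: 14,
}

# Expand the frequency table into one list entry per employee
salaries = [
    salary
    for salary, count in salary_counts.items()
    for _ in range(count)
]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

# Moment coefficient of skewness: mean cubed deviation / std_dev^3
std_dev = statistics.pstdev(salaries)
skewness = sum((s - mean) ** 3 for s in salaries) / (len(salaries) * std_dev ** 3)

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print("mean < median < mode:", mean < median < mode)
print("skewness is negative:", skewness < 0)
```

A negative coefficient corresponds to a left (negative) skew, and a positive coefficient to a right (positive) skew.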

Causes of skewed data

Skewed data can be caused by various factors, including:

  • Outliers: Outliers are extreme values that differ from most data points. When outliers lie mostly on one side of the data, they stretch the tail in that direction and skew the distribution.

  • Measurement errors: Inaccuracies or errors in the measurement process can introduce skewness to our data.

  • Natural variability: Some real-world quantities naturally have more data points on one side than the other, which results in skewness. Salaries are one example: most people earn modest amounts, while a few earn far more.

  • Transformation or data manipulation: Skewness can also be introduced when we perform certain data transformations or manipulations. For example, squaring positive values stretches the larger values and produces a right skew.
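As a small illustration of the first cause, the sketch below starts from a symmetric, made-up data set and adds a single large outlier. The data and the helper name `summarize` are hypothetical; the point is only that the outlier pulls the mean far more than the median, which is the signature of a right skew.

```python
import statistics


def summarize(values):
    """Return the (mean, median) pair for a list of numbers."""
    return statistics.mean(values), statistics.median(values)


# Hypothetical symmetric data: the integers 1 through 9
symmetric = list(range(1, 10))

# The same data with one extreme value added on the right
with_outlier = symmetric + [100]

mean_before, median_before = summarize(symmetric)
mean_after, median_after = summarize(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after}, median={median_after}")
```

Without the outlier, the mean and median are both 5. With it, the median moves only to 5.5 while the mean jumps to 14.5.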

Implication of skewness

Skewness can point us towards outliers in our data. It also describes how the data behaves, which is useful in several fields. For instance, it is widely used in accounting and finance to assess risks and identify potential areas of loss. In machine learning and artificial intelligence, skewed data can lead to biased predictions, poor generalization, and challenges in evaluation. Accounting for skewness is therefore important for achieving fair and accurate predictions from AI/ML models.

Conclusion

Skewed data refers to a situation where the distribution of values is not balanced, which can make it harder to analyze the data and draw reliable conclusions from it. Understanding skewed data is therefore important for accurate data analysis and interpretation.

Copyright ©2025 Educative, Inc. All rights reserved