What is skewed data?

Skewed data has an asymmetric distribution: the values are not spread evenly around the center. Most of the data is concentrated on one side (left or right), and a long tail stretches out toward the other side.

When we deal with skewed data, the following terms help us understand the data distribution:

Mean: The average value of the distribution, calculated by dividing the sum of all values by the total number of values.
Median: The middle value of the data when it is sorted in increasing order. If there is an even number of values, the median is the average of the two middle values.
Mode: The value that occurs most frequently in the data.
Standard deviation: A measure of how far the values typically spread from the mean. It is the square root of the average squared difference between each value and the mean.
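The four measures above can be computed with Python's standard `statistics` module. The dataset below is a small made-up example chosen only for illustration; it is not taken from this article's salary data.

```python
import statistics

# Hypothetical sample values, used only to illustrate the definitions
data = [1, 2, 2, 3, 4, 7, 11]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value after sorting
mode = statistics.mode(data)        # most frequent value
std_dev = statistics.pstdev(data)   # population standard deviation around the mean

print(f"mean={mean:.2f}, median={median}, mode={mode}, std_dev={std_dev:.2f}")
```

`pstdev` treats the list as the whole population, matching the definition above; `statistics.stdev` would instead return the sample standard deviation, which divides by n - 1.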

Types of skewed data

There are two main types of skewness: positive skewness (right skewness) and negative skewness (left skewness).
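The direction of skewness can also be measured with a single number. One common choice, which this article does not define itself, is the moment coefficient of skewness: the average cubed deviation from the mean divided by the cubed (population) standard deviation. A positive value indicates right skew and a negative value indicates left skew. The sketch below applies it to the two salary tables listed with this article's examples (using only the rows shown), expanding each salary by its employee count. The helper names `skewness` and `expand` are our own.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

def expand(counts):
    """Turn {salary: number_of_employees} into one list entry per employee."""
    return [salary for salary, count in counts.items() for _ in range(count)]

# Salary -> number of employees, from the article's two data tables
right_counts = {200: 12, 400: 18, 600: 14, 800: 10, 1000: 7,
                1200: 5, 1400: 4, 1600: 3, 1800: 2, 2000: 1}
left_counts = {200: 1, 400: 2, 600: 3, 800: 4, 1000: 7,
               1200: 8, 1400: 8, 1600: 10, 1800: 14}

right_skew = skewness(expand(right_counts))
left_skew = skewness(expand(left_counts))
print(f"right-skewed table: {right_skew:.3f}")
print(f"left-skewed table:  {left_skew:.3f}")
```

If SciPy is available, `scipy.stats.skew` computes the same biased estimator by default (`bias=True`).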

Right-skewed data

In right-skewed data, the tail of the distribution extends toward the right side of the graph, in the direction of the positive x-axis. That is why it is also called a positively skewed distribution.

Note: For right-skewed data, typically mean > median > mode.
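We can check this ordering on the salary counts from the right-skewed example's data table. The sketch below expands each salary by its number of employees and compares the three measures using Python's `statistics` module.

```python
import statistics

# Salary -> number of employees, from the right-skewed example's data table
counts = {200: 12, 400: 18, 600: 14, 800: 10, 1000: 7,
          1200: 5, 1400: 4, 1600: 3, 1800: 2, 2000: 1}

# One entry per employee
salaries = [salary for salary, n in counts.items() for _ in range(n)]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print("mean > median > mode:", mean > median > mode)
```

For this table the mean is about 723.68, the median is 600, and the mode is 400: the few high salaries in the right tail pull the mean above the median.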

Example

In the following example, the x-axis represents salary amounts, while the y-axis represents the count of employees who have those salaries.

Explanation

The height of each bar indicates how many employees have that salary. Most of the data points sit on the left side of the graph, at the smaller salary values. As we move toward larger values on the x-axis, the number of data points drops sharply, producing a tail that extends toward the right. This tells us that far fewer employees have large salaries.
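Since the chart itself is not reproduced here, a rough text version can be drawn from the right-skewed example's data table. This sketch prints one row per salary with a bar whose length equals the number of employees; the function name `text_histogram` is hypothetical.

```python
def text_histogram(counts, symbol="#"):
    """Return one line per value, with a bar as long as its count."""
    lines = []
    for value, count in counts.items():
        # Right-align the salary so the bars line up
        lines.append(f"${value:>5} | {symbol * count} ({count})")
    return "\n".join(lines)

# Salary -> number of employees, from the right-skewed example's data table
right_counts = {200: 12, 400: 18, 600: 14, 800: 10, 1000: 7,
                1200: 5, 1400: 4, 1600: 3, 1800: 2, 2000: 1}

chart = text_histogram(right_counts)
print(chart)
```

Because the salaries run down the page instead of across it, the right-hand tail appears at the bottom: after the peak at $400, the bars shrink steadily toward $2000.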

Left-skewed data

In left-skewed data, the tail of the distribution extends toward the left side of the graph, in the direction of the negative x-axis. That is why it is also called a negatively skewed distribution.

Note: For left-skewed data, typically mean < median < mode.
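The same check can be run on the left-skewed example's data table. This sketch uses only the rows listed in that table, expands each salary by its number of employees, and compares the three measures.

```python
import statistics

# Salary -> number of employees, using the rows listed in the left-skewed example's data table
counts = {200: 1, 400: 2, 600: 3, 800: 4, 1000: 7,
          1200: 8, 1400: 8, 1600: 10, 1800: 14}

# One entry per employee
salaries = [salary for salary, n in counts.items() for _ in range(n)]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print("mean < median < mode:", mean < median < mode)
```

With these rows the mean is about 1315.79, the median is 1400, and the mode is 1800: the few low salaries in the left tail pull the mean below the median.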

Example

In the following example, the x-axis represents salary amounts, while the y-axis represents the count of employees who have those salaries.

Explanation

The height of each bar indicates how many employees have that salary. Only a few data points sit on the left side of the graph, at the smaller salary values, and the number of data points rises sharply as we move toward larger values on the x-axis. Because the left side is so sparse, the tail extends toward the left. This tells us that far fewer employees have small salaries.

Causes of skewed data

Skewed data can be caused by various factors, including:

Outliers: Outliers are extreme values that differ greatly from most data points. When outliers lie mostly on one side of the distribution, they pull the mean toward that side and can make the data skewed.
Measurement errors: Inaccuracies or errors in the measurement process can introduce skewness into our data.
Natural variability: Some real-world quantities, such as income, naturally have more data points on one side than the other, which results in skewness.
Transformation or data manipulation: Skewness can also be introduced when we perform certain data transformations or manipulations, such as exponentiating the values.
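To see how an outlier and a transformation can each introduce skewness, the sketch below starts from a small made-up symmetric dataset (not from this article) and measures skewness with the moment coefficient of skewness: the average cubed deviation from the mean divided by the cubed standard deviation, where a positive value means right skew. The helper name `skewness` is our own.

```python
import math
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical symmetric data: skewness is exactly 0
symmetric = [1, 2, 3, 4, 5]

# 1) Outlier: one extreme value on the right side
with_outlier = symmetric + [50]

# 2) Transformation: exponentiating stretches large values far more than small ones
transformed = [math.exp(x) for x in symmetric]

for name, values in [("symmetric", symmetric),
                     ("with outlier", with_outlier),
                     ("exp-transformed", transformed)]:
    print(f"{name:16} mean={statistics.mean(values):7.2f} "
          f"median={statistics.median(values):6.2f} skew={skewness(values):.3f}")
```

Adding the single value 50 moves the median only from 3 to 3.5, but pulls the mean up to about 10.83, which is the right skew an outlier creates.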

Implications of skewness

Skewness describes how data is distributed and can point to outliers, since a long tail often contains extreme values. This is useful in many fields. For instance, it is widely used in accounting and finance to assess risk and identify potential areas of loss. In machine learning and artificial intelligence, skewed data can lead to biased predictions, poor generalization, and misleading evaluation results, so recognizing skewness is important for building fair and accurate AI/ML models.

Conclusion

Skewed data has an unbalanced distribution of values, which can make it harder to analyze the data and draw reliable conclusions from it. Understanding skewness is therefore important for accurate data analysis and interpretation.

Data for the right-skewed salary example

Salary	Number of employees
$200	12
$400	18
$600	14
$800	10
$1000	7
$1200	5
$1400	4
$1600	3
$1800	2
$2000	1

Data for the left-skewed salary example

Salary	Number of employees
$200	1
$400	2
$600	3
$800	4
$1000	7
$1200	8
$1400	8
$1600	10
$1800	14