How to find the mean, median, and mode in Python

Key takeaways:

  • Python's built-in functions, its standard statistics module, and third-party libraries like numpy simplify statistical computations such as the mean, median, and mode.

  • The mean is obtained by dividing the total sum of values by the count of values.

  • The median is the central value in an ordered dataset or the average of two middle values if the dataset size is even. Sorting is essential when manually computing the median.

  • The mode represents the value(s) that appear most frequently in a dataset.

  • Custom implementations require handling cases like empty datasets and multiple modes.

  • The statistics module includes mean(), median(), and mode() for efficient calculations.

In data analysis, understanding the central tendency of a dataset is crucial. The mean, median, and mode are three key metrics that provide insights into the dataset’s characteristics. Python, with its robust libraries, makes calculating these metrics straightforward. In this Answer, we’ll explore how to compute the mean, median, and mode in Python using built-in functions and libraries like statistics.

Understanding the metrics

Let’s first understand what the mean, median, and mode are. These metrics are usually computed on large datasets, but we’ll use small ones for demonstration purposes. Here are the definitions of the metrics we’ll cover in this Answer:

  • Mean: A dataset’s arithmetic average. It is computed by dividing the sum of values by the total number of values.

  • Median: The middle value in a sorted dataset. With an even number of values, the median is the mean of the two middle values.

  • Mode: The value or values that appear most frequently in the dataset.


Mean, median, and mode in Python

Why use Python to compute these statistics? Python offers straightforward functions for statistical operations, and modules like statistics and numpy provide ready-made implementations, so we don’t have to write the calculations ourselves. numpy in particular is designed for working with large numeric arrays.

Let’s examine how to use Python’s statistics module to compute these metrics and discuss the implementation from scratch.

1. Calculate the mean using the += and / operators

Here’s how we can compute the mean without using any modules:

def calculate_mean(data):
    if not data:
        return None  # Handle empty dataset
    total = 0
    for num in data:
        total += num  # Add each number to the total
    return total / len(data)  # Divide the sum by the length of the list
# Example
numbers = [1, 2, 3, 4, 5]
print("Mean:", calculate_mean(numbers))

In the code shown above:

  • Line 2: We add a condition to check if the dataset we got is empty. It won’t have a mean in this case, so we return None.

  • Line 7: We return the sum of all the elements in the dataset divided by the length of the dataset. This gives us the mean.
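The explicit loop can also be replaced with Python's built-in sum() function. The following is a minimal sketch that assumes the same convention of returning None for an empty dataset; the name mean_with_sum is illustrative and not part of the original example.

```python
def mean_with_sum(data):
    """Return the arithmetic mean of data, or None if data is empty."""
    if not data:
        return None  # Same empty-dataset convention as calculate_mean
    return sum(data) / len(data)  # sum() replaces the explicit loop


numbers = [1, 2, 3, 4, 5]
print("Mean:", mean_with_sum(numbers))  # Mean: 3.0
```

Both versions divide with /, so the result is a float even when the inputs are integers.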

2. Calculate the mean using the statistics module

Let’s calculate the mean using the mean() function of the statistics module:

import statistics
# Example dataset
numbers = [1, 2, 3, 4, 5]
# Using statistics module
mean_value = statistics.mean(numbers)
print("Mean:", mean_value)

As we can see above, the mean() function makes things much easier. We just pass it the list, and it computes the mean for us.
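Unlike the manual version, statistics.mean() does not return None for an empty dataset; it raises statistics.StatisticsError. The sketch below shows one possible way to keep the None convention. The wrapper name safe_mean is an assumption made for illustration.

```python
import statistics


def safe_mean(data):
    """Return statistics.mean(data), or None when data is empty."""
    try:
        return statistics.mean(data)
    except statistics.StatisticsError:
        # statistics.mean() raises StatisticsError for empty input
        return None


print(safe_mean([1, 2, 3, 4, 5]))
print(safe_mean([]))  # None
```

Whether to return None or let the exception propagate is a design choice; propagating the error makes an unexpectedly empty dataset harder to overlook.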

3. Calculate the median using len and sorted

Let’s calculate the median using the built-in len() and sorted() functions.

def calculate_median(data):
    if not data:
        return None  # Handle empty dataset
    sorted_data = sorted(data)
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 0:  # Even number of elements
        return (sorted_data[mid - 1] + sorted_data[mid]) / 2
    else:  # Odd number of elements
        return sorted_data[mid]
# Example
numbers = [1, 2, 3, 4, 5, 6]
print("Median:", calculate_median(numbers))

Let’s review the code above:

  • Line 2: This condition checks if the dataset is empty and returns None.

  • Line 4: We need to first sort the dataset using the sorted() method.

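  • Lines 5–6: We get the length of the sorted data and floor-divide it by 2 to get the middle index.

  • Lines 7–10: If the length is even, we return the average of the two middle values. Otherwise, we return the single middle value.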
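To see why the sorting step matters, the sketch below uses a deliberately flawed helper (middle_without_sorting, a hypothetical name) that takes the middle position without sorting, and compares it with the result on sorted data and with statistics.median().

```python
import statistics


def middle_without_sorting(data):
    """Deliberately flawed: picks the middle position without sorting first."""
    return data[len(data) // 2]


unsorted = [7, 1, 9, 3, 5]
print(middle_without_sorting(unsorted))          # 9, which is not the median
print(middle_without_sorting(sorted(unsorted)))  # 5
print(statistics.median(unsorted))               # 5
```

The middle position only corresponds to the median once the values are in order.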

4. Calculate the median using the statistics module

In addition to median(), the code below uses two variations of it:

  • median_low(): Returns the lower of the two middle numbers when the dataset size is even. If the dataset size is odd, it returns the middle number (same as median()).

  • median_high(): Returns the higher of the two middle numbers when the dataset size is even. If the dataset size is odd, it returns the middle number (same as median()).

import statistics
numbers = [1, 2, 3, 4, 5]
median_value = statistics.median(numbers)
print("Median:", median_value)
print("Median low:", statistics.median_low(numbers))
print("Median high:", statistics.median_high(numbers))
print("Even List")
numbers_even = [1, 2, 3, 4, 5, 6]
median_value = statistics.median(numbers_even)
print("Median:", median_value)
print("Median low:", statistics.median_low(numbers_even))
print("Median high:", statistics.median_high(numbers_even))

Again, calculating the median is just as easy. We pass the list to median(), and it computes the median for us without any manual sorting.

5. Calculate the mode with a for loop

Here’s how we can compute the mode without using any modules:

def calculate_mode(data):
    if not data:
        return None  # Handle empty dataset
    frequency = {}
    for num in data:
        frequency[num] = frequency.get(num, 0) + 1
    max_freq = max(frequency.values())
    modes = [key for key, val in frequency.items() if val == max_freq]
    return modes if len(modes) > 1 else modes[0]
# Example
numbers_with_mode = [1, 2, 2, 3, 4]
print("Mode:", calculate_mode(numbers_with_mode))

Let’s look at how the code works:

  • Line 2: This condition checks if the dataset is empty and returns None.

  • Line 4: We’ll keep a dictionary to keep track of how many times each element appears in data.

  • Lines 5–6: This loop iterates over data and updates the count of each number in frequency.

  • Line 7: Here, we extract the highest frequency of any value in the dictionary.

  • Line 8: We identify the elements in the dataset that appear most frequently and store them in a list. If multiple elements share the highest frequency, all of them are included in the list.

  • Line 9: Finally, we return the whole list if there are multiple modes, or the single mode by itself otherwise.
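The frequency dictionary can also be built with collections.Counter from the standard library. This is a sketch of an alternative, not the approach used above; the name mode_with_counter is illustrative, and this variant always returns a list so that callers handle one return type.

```python
from collections import Counter


def mode_with_counter(data):
    """Return a list of all most frequent values, or None if data is empty."""
    if not data:
        return None  # Handle empty dataset
    counts = Counter(data)           # Maps each value to its frequency
    max_freq = max(counts.values())  # Highest frequency in the dataset
    return [value for value, freq in counts.items() if freq == max_freq]


print(mode_with_counter([1, 2, 2, 3, 4]))  # [2]
print(mode_with_counter([1, 1, 2, 2, 3]))  # [1, 2]
```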

6. Mode and multimode with the statistics module

The code below uses two functions from the statistics module:

  • mode(): Returns the most frequently occurring value in a dataset, that is, the value that appears the highest number of times.

  • multimode(): A dataset is multimodal when multiple values appear with the highest frequency. The function statistics.multimode() returns a list of all such values.

import statistics
# Example dataset with repeated values
numbers_with_mode = [1, 2, 2, 3, 4]
# Using statistics module
mode_value = statistics.mode(numbers_with_mode)
print("Mode:", mode_value)
print("Multimode:", statistics.multimode(numbers_with_mode))

As expected, it’s as simple as calling the mode() function.
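The difference between the two functions shows up when values tie for the highest frequency. The following sketch assumes Python 3.8 or later, where multimode() is available and mode() returns the first of the tied values; the datasets are made up for illustration.

```python
import statistics

bimodal = [1, 1, 2, 2, 3]
print(statistics.mode(bimodal))       # 1 (first of the tied values, Python 3.8+)
print(statistics.multimode(bimodal))  # [1, 2]

all_unique = [4, 5, 6]
print(statistics.multimode(all_unique))  # [4, 5, 6]
```

When every value occurs once, multimode() returns all of them, since each one ties for the highest frequency.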


Conclusion

Understanding the mean, median, and mode is essential for analyzing data effectively. Python provides multiple ways to calculate these metrics, from manual implementations using loops and operators to built-in functions in the statistics module. The simplicity and efficiency of Python’s libraries make it an excellent choice for statistical computations. Whether working with small or large datasets, these methods help extract meaningful insights effortlessly.

Frequently asked questions



What if there is no mode?

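If no value repeats, no value is more frequent than the others, so a custom function can return None or another indication of no repetition. Note that in Python 3.8 and later, statistics.mode() returns the first value in such a dataset, and statistics.multimode() returns all of the values.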
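One possible way to report "no mode" explicitly is sketched below. The helper name mode_or_none and the rule that a dataset without repeated values has no mode are assumptions chosen for this illustration.

```python
from collections import Counter


def mode_or_none(data):
    """Return the most frequent values, or None when nothing repeats."""
    counts = Counter(data)
    if not counts or max(counts.values()) == 1:
        return None  # Empty data, or every value occurs exactly once
    max_freq = max(counts.values())
    return [value for value, freq in counts.items() if freq == max_freq]


print(mode_or_none([1, 2, 3, 4]))  # None
print(mode_or_none([1, 2, 2, 3]))  # [2]
```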


What is the formula for the median?

The median formula is:

  • Sort the dataset, and for an odd count, it’s the middle value.
  • For an even count, it’s the average of the two middle values.

What are some other statistics?

Other statistics include range, variance, standard deviation, quartiles, percentiles, and interquartile range.
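Several of these can be computed with the same statistics module. The sketch below uses a made-up dataset and the population versions of variance and standard deviation; statistics.quantiles() requires Python 3.8 or later.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Range:", max(data) - min(data))  # Range: 7
print("Population variance:", statistics.pvariance(data))
print("Population standard deviation:", statistics.pstdev(data))

# quantiles() with n=4 returns the three quartile cut points
q1, q2, q3 = statistics.quantiles(data, n=4)
print("Quartiles:", q1, q2, q3)
print("Interquartile range:", q3 - q1)
```

For this dataset, the population variance is 4 and the population standard deviation is 2. For a sample rather than a whole population, statistics.variance() and statistics.stdev() are the corresponding functions.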


What is the median of 21, 62, 66, 66, 79, 28, 63, 48, 59, 94, and 19?

To find the median, first, sort the numbers in ascending order:
19, 21, 28, 48, 59, 62, 63, 66, 66, 79, and 94

As there are 11 numbers (odd count), the median is the middle value, which is the 6th number:
Median = 62
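The same result can be checked with the statistics module. A short sketch (the variable name values is illustrative):

```python
import statistics

values = [21, 62, 66, 66, 79, 28, 63, 48, 59, 94, 19]
print(sorted(values))             # [19, 21, 28, 48, 59, 62, 63, 66, 66, 79, 94]
print(statistics.median(values))  # 62
```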


What is mode() in Python?

The mode() function in Python finds the most frequently occurring value in a dataset. It is available in the statistics module. For example:

from statistics import mode

numbers = [1, 2, 2, 3, 4]
print(mode(numbers))  # Output: 2

In Python 3.8 and later, if multiple values share the highest frequency, mode() returns the first one encountered in the data. Earlier versions raise StatisticsError in that case.

