Color extraction from image

Color palette extraction is a useful tool in fields such as graphic design, data visualization, and image analysis. In this Answer, we reduce the colors of any picture to five representative colors, which together form a basic color palette for that image, using the Pillow library in Python together with k-means clustering.

What is Pillow?

Pillow is a fork of the Python Imaging Library (PIL) that provides image processing capabilities. (A fork is a copy of a software project, often created to continue development independently.) It supports opening, manipulating, and saving image files in various formats, and offers functionality such as resizing, filtering, and color manipulation. With its user-friendly interface, it's widely used in computer vision, web development, and data analysis applications. Image processing of this kind typically comes under the umbrella of computer vision.

Computer vision is a major area of artificial intelligence that lets computers interpret visual input, which gives us another medium for communicating with them. It is the foundational concept behind real-life applications such as autonomous cars, improved medical diagnosis, and facial recognition.
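To give a feel for the Pillow operations mentioned above (opening, resizing, filtering, converting, and saving), here is a minimal sketch. It builds a solid-color image in memory rather than reading a file, so the size, color, and blur radius are arbitrary values chosen for illustration.

```python
from io import BytesIO

from PIL import Image, ImageFilter

# Build a small solid-color image in memory instead of reading a file.
image = Image.new("RGB", (64, 48), color=(200, 30, 30))

# Resize, blur, and convert to grayscale ("L" mode).
resized = image.resize((32, 24))
blurred = resized.filter(ImageFilter.GaussianBlur(radius=2))
gray = blurred.convert("L")

# Save as PNG into an in-memory buffer, then reopen it.
buffer = BytesIO()
gray.save(buffer, format="PNG")
buffer.seek(0)
reopened = Image.open(buffer)

print(resized.size, gray.mode, reopened.format)  # (32, 24) L PNG
```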

Now let's focus on implementing our program using the Pillow library.

Program overview

Before getting into the details of the program, here is a brief step-by-step overview of what it does:

  • Open the image, resize it, and convert it to RGB mode.

  • Apply k-means clustering to group similar colors together.

  • Calculate color occurrences in each cluster.

  • Sort colors based on occurrences to identify dominant colors.

  • Visualize the dominant colors in a color palette.

Libraries used

To implement our program, we will use the following libraries, each of which can be installed with pip:

  • Pillow

pip install pillow

  • NumPy

pip install numpy

  • OpenCV

pip install opencv-python

  • scikit-learn

pip install scikit-learn

Code implementation

The following program extracts the most dominant colors from an input image (pic4.jpg in this example).

import cv2
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

image_path = "pic4.jpg"
num_colors = 5
num_clusters = 5

def get_dominant_colors(image_path, num_colors=10, num_clusters=5):
    image = Image.open(image_path)
    image = image.resize((200, 200))
    image = image.convert('RGB')
    img_array = np.array(image)
    pixels = img_array.reshape(-1, 3)
    kmeans = KMeans(n_clusters=num_clusters, random_state=0)
    labels = kmeans.fit_predict(pixels)
    centers = kmeans.cluster_centers_
    color_counts = {}
    for label in np.unique(labels):
        color = tuple(centers[label].astype(int))
        color_counts[color] = np.count_nonzero(labels == label)
    sorted_colors = sorted(color_counts.items(), key=lambda x: x[1], reverse=True)
    dominant_colors = [color for color, count in sorted_colors[:num_colors]]
    color_occurrences = [count for color, count in sorted_colors[:num_colors]]
    dominant_colors_hex = ['#%02x%02x%02x' % color for color in dominant_colors]
    return dominant_colors_hex, color_occurrences

dominant_colors, color_occurrences = get_dominant_colors(image_path, num_colors, num_clusters)

print("Dominant Colors:")
print(dominant_colors)

palette_height = 100
palette_width = 100 * num_colors
palette = np.zeros((palette_height, palette_width, 3), dtype=np.uint8)

start_x = 0
for color_hex in dominant_colors:
    color_rgb = tuple(int(color_hex[i:i+2], 16) for i in (1, 3, 5))
    end_x = start_x + 100
    palette[:, start_x:end_x] = color_rgb
    start_x = end_x

palette_image = Image.fromarray(palette)
palette_bgr = cv2.cvtColor(np.array(palette_image), cv2.COLOR_RGB2BGR)

cv2.imshow("Palette", palette_bgr)
cv2.waitKey(0)
cv2.destroyAllWindows()

Code explanation

Lines 1–4: Import the required modules.

Line 6: Set the path to the input image using the variable image_path.

Line 7: Define the variable num_colors and set it to 5, representing the number of dominant colors to extract from the image.

Line 8: Define the variable num_clusters and set it to 5, representing the number of clusters to use in the k-means clustering.

Line 10: Define a function get_dominant_colors that takes image_path, num_colors, and num_clusters as input parameters.

Line 11: Inside the get_dominant_colors function, load the image using Pillow's Image.open function and store it in the variable image.

Lines 12–13: Resize the image to 200×200 pixels using the resize method, which keeps the clustering fast, and then convert it to RGB mode using the convert method.

Line 14: Convert the image to a NumPy array using the np.array function and store it in img_array.

Line 15: Flatten the NumPy array into a two-dimensional array of pixels, where each row is an RGB triplet. This is achieved using the reshape method with -1 as the first dimension, which tells NumPy to infer that dimension's size from the total number of elements. The flattened array is stored in pixels.
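The effect of reshape(-1, 3) is easiest to see on a tiny array. The sketch below uses a made-up 2×2 image rather than the real input:

```python
import numpy as np

# A tiny 2x2 RGB "image": shape (height, width, channels).
img_array = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

# -1 lets NumPy infer the row count: 2 * 2 = 4 pixels, 3 channels each.
pixels = img_array.reshape(-1, 3)

print(img_array.shape)     # (2, 2, 3)
print(pixels.shape)        # (4, 3)
print(pixels[1].tolist())  # [0, 255, 0]
```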

Line 16: Create a KMeans object kmeans with num_clusters as the number of clusters to create and random_state=0 for reproducibility.

Line 17: Perform k-means clustering on the flattened pixel array using the fit_predict method of the kmeans object. This assigns each pixel to one of the num_clusters clusters and returns the cluster labels, which are stored in the variable labels.

Line 18: Get the cluster centers, which represent the dominant colors found by the k-means clustering algorithm, and store them in the variable centers.
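To see what fit_predict and cluster_centers_ return, here is a minimal sketch on synthetic pixels with only two distinct colors. The pixel counts are made up, and n_init=10 is an assumption added here so that the behavior doesn't depend on the default of the installed scikit-learn version; the program above doesn't set it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic pixel list: 30 pure-red pixels and 10 pure-blue pixels.
pixels = np.array([[255, 0, 0]] * 30 + [[0, 0, 255]] * 10, dtype=np.uint8)

# n_init is set explicitly (an assumption for this sketch).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(pixels)  # one cluster index per pixel
centers = kmeans.cluster_centers_.round().astype(int)

print(labels.shape)  # (40,)
print(sorted(map(tuple, centers.tolist())))  # [(0, 0, 255), (255, 0, 0)]
```

Which color gets label 0 and which gets label 1 is arbitrary, so the sketch sorts the centers before printing them.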

Line 19: Create an empty dictionary color_counts to store the count of each color in the clusters.

Lines 20–22: Loop through the unique labels from the clustering result. For each label, convert the cluster center to a tuple of integers, count the pixels assigned to that cluster, and store the color and its count in the color_counts dictionary.
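As a possible alternative to the explicit loop, the per-cluster counts can also be computed in one call with np.bincount. This is a sketch with hypothetical labels, not the approach used in the program above:

```python
import numpy as np

# Hypothetical cluster labels for eight pixels and three clusters.
labels = np.array([0, 2, 2, 1, 2, 0, 2, 1])

# bincount returns how many times each label 0..n-1 occurs.
counts = np.bincount(labels, minlength=3)

# Cluster indices ordered from most to least frequent.
order = np.argsort(counts)[::-1]

print(counts.tolist())  # [2, 2, 4]
print(int(order[0]))    # 2
```

Keeping the counts indexed by cluster label rather than by color tuple also avoids losing a count in the rare case where two cluster centers truncate to the same integer color.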

Line 23: Sort the colors based on their occurrences in descending order using the sorted function.

Lines 24–26: Extract the dominant colors and their occurrences from the sorted list, taking only the first num_colors items. The dominant colors are stored in the dominant_colors list, and their corresponding occurrences are stored in the color_occurrences list. The dominant colors are also converted to hexadecimal format (for example, #c81e1e) and stored in the dominant_colors_hex list.
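The sorting and hexadecimal conversion can be tried in isolation. In the sketch below, the color_counts values are invented, and rgb_to_hex is a hypothetical helper that does the same job as the '#%02x%02x%02x' format string in the program:

```python
def rgb_to_hex(color):
    """Format an (R, G, B) tuple of 0-255 integers as '#rrggbb'."""
    r, g, b = (int(c) for c in color)
    return f"#{r:02x}{g:02x}{b:02x}"


# Invented counts: color tuple -> number of pixels in that cluster.
color_counts = {(200, 30, 30): 120, (10, 10, 10): 300, (255, 255, 255): 80}

# Most frequent color first.
sorted_colors = sorted(color_counts.items(), key=lambda item: item[1], reverse=True)

# Keep the top two colors and convert them to hex strings.
dominant = [rgb_to_hex(color) for color, _ in sorted_colors[:2]]

print(dominant)  # ['#0a0a0a', '#c81e1e']
```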

Line 29: Call the get_dominant_colors function with the given input image path, num_colors, and num_clusters, and store the results in dominant_colors and color_occurrences.

Line 32: Print the list of dominant colors using the print function.

Line 34: Set the height of the color palette image to 100 pixels using the variable palette_height.

Line 35: Calculate the width of the color palette image based on the number of colors and set it to palette_width. Each color block will have a width of 100 pixels.

Line 36: Create a zero-filled (black) NumPy array palette with dimensions (palette_height, palette_width, 3), where the last dimension holds the three RGB channels.

Line 38: Set the variable start_x to 0, which will be used to position the colored blocks in the palette array.

Lines 39–43: Loop through the dominant colors, convert them from hexadecimal to RGB format, and fill the palette array with a colored block for each dominant color. Each colored block has a size of 100×100 pixels.
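The block-filling logic can be checked on a much smaller palette. The sketch below uses three made-up colors and 4-pixel-wide blocks instead of 100; hex_to_rgb is a hypothetical helper wrapping the same slicing expression the program uses:

```python
import numpy as np


def hex_to_rgb(color_hex):
    """Convert '#rrggbb' to an (R, G, B) tuple of integers."""
    return tuple(int(color_hex[i:i + 2], 16) for i in (1, 3, 5))


colors = ["#ff0000", "#00ff00", "#0000ff"]
block = 4  # block width in pixels (100 in the program above)
palette = np.zeros((2, block * len(colors), 3), dtype=np.uint8)

for index, color_hex in enumerate(colors):
    start_x = index * block
    # The RGB tuple is broadcast across every pixel in the slice.
    palette[:, start_x:start_x + block] = hex_to_rgb(color_hex)

print(palette.shape)            # (2, 12, 3)
print(palette[0, 0].tolist())   # [255, 0, 0]
print(palette[0, 11].tolist())  # [0, 0, 255]
```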

Line 45: Convert the NumPy array palette back to a Pillow image using the Image.fromarray function, creating the palette_image.

Line 46: Convert the palette_image from RGB format to BGR format using the cv2.cvtColor function. This is needed because OpenCV uses BGR channel order instead of RGB.
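For a three-channel image, converting RGB to BGR only swaps the first and last channels. The sketch below shows the same reordering with plain NumPy on a single made-up pixel, as an illustration of what the conversion does rather than a replacement for cv2.cvtColor:

```python
import numpy as np

# One orange pixel in RGB order: shape (1, 1, 3).
rgb = np.array([[[255, 128, 0]]], dtype=np.uint8)

# Reverse the channel axis; copy so the result is contiguous in memory.
bgr = np.ascontiguousarray(rgb[..., ::-1])

print(bgr[0, 0].tolist())  # [0, 128, 255]
```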

Lines 48–50: Show the color palette using OpenCV's cv2.imshow function. The cv2.waitKey(0) call keeps the window open until any key is pressed, after which cv2.destroyAllWindows closes it.
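cv2.imshow needs a graphical display, so in an environment without one (a remote server, for example) it may be more practical to save the palette to a file. Here is a minimal sketch using Pillow, assuming a two-color palette with made-up colors and a hypothetical output file named palette.png in the system's temporary directory:

```python
import os
import tempfile

import numpy as np
from PIL import Image

# A made-up two-color palette: two 100x100 blocks side by side.
palette = np.zeros((100, 200, 3), dtype=np.uint8)
palette[:, :100] = (200, 30, 30)
palette[:, 100:] = (10, 10, 10)

# Pillow expects RGB order, so no BGR conversion is needed here.
output_path = os.path.join(tempfile.gettempdir(), "palette.png")
Image.fromarray(palette).save(output_path)

saved = Image.open(output_path)
print(saved.size, saved.mode)  # (200, 100) RGB
```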

Expected output of the above program.

