What is RANSAC?

Random sample consensus (RANSAC) is an iterative parameter estimation method used to fit models to data that contains outliers (data points whose distance from the model exceeds a specified threshold). Outliers can significantly shift the mean and the regression coefficients and distort the distribution of the data. RANSAC estimates the model parameters from the inliers only, so it can produce an accurate fit even when the dataset contains a large number of outliers.
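To see how much a single outlier can distort a regression, here is a small sketch with made-up points (not from the original article). It fits an ordinary least-squares line with np.polyfit before and after adding one outlier.

```python
import numpy as np

# Five illustrative points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# Least squares on the clean data recovers slope 2 and intercept 1
clean_slope, clean_intercept = np.polyfit(x, y, 1)

# Add one outlier far above the line (the line would predict 11 at x = 5)
x_out = np.append(x, 5.0)
y_out = np.append(y, 40.0)

# The same least-squares fit is now pulled strongly toward the outlier
noisy_slope, noisy_intercept = np.polyfit(x_out, y_out, 1)

print(f"clean fit:        slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"with one outlier: slope={noisy_slope:.2f}, intercept={noisy_intercept:.2f}")
```

One bad point out of six roughly triples the slope (from 2 to about 6.14). RANSAC avoids this by scoring candidate models on how many points agree with them, rather than on the sum of squared errors over all points.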

Real-world data recorded from sensors, for example, is often noisy and can contain outliers (and even missing values) that are not useful for drawing meaningful conclusions. This is where the RANSAC algorithm comes into play.

How does it work?

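Unlike methods such as ordinary least squares, which estimate a solution from as much of the available data as possible, RANSAC starts from the smallest sample that can define the model (two points for a straight line). It fits a candidate model to that sample and collects every data point consistent with it into a set of inliers (data points whose distance from the model is within the defined threshold). It then repeats this with new random samples and keeps the model with the largest set of inliers. The resulting inliers can then be used for further processing.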
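The loop described above does not depend on the model being a line. The following is a minimal, model-agnostic sketch of it. The names ransac_generic, fit_line_from_pair and line_distance are hypothetical helpers for illustration only. They are not part of the article's code, and fit_line_from_pair uses the two-point slope formula instead of np.polyfit.

```python
import random

def ransac_generic(data, fit, residual, sample_size, threshold, iterations, seed=None):
    """Minimal RANSAC loop (illustrative sketch).

    fit(sample) returns a model, or None for a degenerate sample;
    residual(model, point) returns the point's distance from the model.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # 1. Draw the smallest sample that determines a model
        sample = rng.sample(data, sample_size)
        model = fit(sample)
        if model is None:
            continue
        # 2. Consensus set: every point within the threshold of the model
        inliers = [p for p in data if residual(model, p) <= threshold]
        # 3. Keep the model with the largest consensus set
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

def fit_line_from_pair(sample):
    """Slope and intercept of the line through two points (None if vertical)."""
    (x1, y1), (x2, y2) = sample
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def line_distance(model, point):
    """Perpendicular distance from point to y = slope * x + intercept."""
    slope, intercept = model
    x, y = point
    return abs(slope * x - y + intercept) / (slope ** 2 + 1) ** 0.5

# Ten illustrative points on y = 2x + 1 plus two gross outliers
points = [(x, 2 * x + 1) for x in range(10)] + [(3, 30), (7, -20)]
model, inliers = ransac_generic(points, fit_line_from_pair, line_distance,
                                sample_size=2, threshold=0.5, iterations=200, seed=1)
print("model:", model, "inliers:", len(inliers))
```

Passing fit and residual as functions keeps the sampling and consensus logic separate from the model. The same loop could fit a plane or a circle by supplying a different sample_size, fit and residual.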

The following Python code applies RANSAC to fit a straight line to a small 2D dataset. An explanation of each step follows the code.

import os
import random
import numpy as np
import matplotlib.pyplot as plt
# Line through two points: returns (slope, intercept)
def fit_line(point_1, point_2):
    x = np.array([point_1[0], point_2[0]])
    y = np.array([point_1[1], point_2[1]])
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept
# Perpendicular distance from a point to the line y = slope * x + intercept
def calculate_distance(point, model):
    x, y = point
    slope, intercept = model
    distance = np.abs(slope * x - y + intercept) / np.sqrt(slope ** 2 + 1)
    return distance
# Algorithm to determine inliers
def ransac(data, iterations, threshold):
    best_model = None
    best_inliers_list = []
    for _ in range(iterations):
        # Minimal sample that defines a line: two random points
        point_1, point_2 = random.sample(data, 2)
        # Skip degenerate samples: equal x values give a vertical line
        if point_1[0] == point_2[0]:
            continue
        current_model = fit_line(point_1, point_2)
        # Fresh inlier list for every candidate line
        inliers = []
        for value in data:
            if calculate_distance(value, current_model) <= threshold:
                inliers.append(value)
        # Keep the candidate line with the most inliers so far
        if len(inliers) > len(best_inliers_list):
            best_model = current_model
            best_inliers_list = inliers
        # Stop early once at least 80% of the points are inliers
        if len(best_inliers_list) >= len(data) * 0.8:
            break
    return best_model, best_inliers_list
# dataset
data = [(9, 1), (5, 6), (3, 9), (7, 5), (2, 10), (4, 7), (8, 3), (6, 2), (1, 2), (5, 10)]
iterations = 100
threshold = 2.0
# Output processing
model, inliers = ransac(data, iterations, threshold)
print(f"Equation of line of best fit: y = {model[0]:.2f}x + {model[1]:.2f}")
print("Inliers: ", inliers)
outliers = [point for point in data if point not in inliers]
print("Outliers: ", outliers)
x = np.array([point[0] for point in data])
y = np.array([point[1] for point in data])
# Plot to show best fit and inliers
plt.scatter(x, y, color="blue")
x_line = np.array([x.min(), x.max()])
plt.plot(x_line, model[0] * x_line + model[1], color="red")
plt.scatter([point[0] for point in inliers], [point[1] for point in inliers], color="green")
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Line of best fit with inliers")
os.makedirs("./output", exist_ok=True)
plt.savefig("./output/plot.png")
plt.show()
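calculate_distance() measures the perpendicular distance from a point to the candidate line, not the vertical distance. Write the line y = mx + c (slope m, intercept c) in the general form mx - y + c = 0 and apply the standard point-to-line distance formula. A point counts as an inlier when this distance is at most the threshold t:

```latex
\[
d = \frac{\lvert A x_0 + B y_0 + C \rvert}{\sqrt{A^2 + B^2}},
\quad A = m,\; B = -1,\; C = c
\;\;\Longrightarrow\;\;
d = \frac{\lvert m x_0 - y_0 + c \rvert}{\sqrt{m^2 + 1}},
\qquad (x_0, y_0)\ \text{is an inlier} \iff d \le t
\]
```

As a worked example, take an illustrative line with m = -1 and c = 11 (not necessarily the line the code returns). For the point (2, 10), d = |(-1)(2) - 10 + 11| / sqrt(2) = 1/sqrt(2), which is about 0.71. With threshold = 2.0, that point is an inlier.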

ransac() is a user-defined method that takes three parameters.

Parameters

data: the original dataset, a list of (x, y) points.

iterations: the maximum number of iterations the RANSAC algorithm performs.

threshold: the maximum distance from a candidate line at which a point still counts as an inlier. Points farther away are outliers.
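The code above fixes iterations at 100. The article's code does not use the following rule, but it is a widely used way to choose this number: N = ceil(log(1 - p) / log(1 - w^s)). Here w is the fraction of inliers, s is the sample size (2 for a line), and p is the desired probability that at least one sample contains only inliers. The inlier fraction w has to be estimated or assumed. The helper name ransac_iterations is hypothetical.

```python
import math

def ransac_iterations(inlier_ratio, sample_size=2, success_prob=0.99):
    """Iterations needed so that, with probability success_prob, at least one
    random sample of sample_size points contains only inliers.

    Implements N = ceil(log(1 - p) / log(1 - w**s)); inlier_ratio (w) is assumed known.
    """
    if not 0 < inlier_ratio <= 1:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if inlier_ratio == 1:
        return 1  # every sample is outlier-free
    # Probability that one sample of sample_size points contains only inliers
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1 - success_prob) / math.log(1 - p_good_sample))

print(ransac_iterations(0.5))  # 17
print(ransac_iterations(0.7))  # 7
```

If we assume roughly 70% of the ten points are inliers, the rule suggests about 7 iterations for 99% confidence. That makes iterations = 100 a conservative choice for this dataset.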

Explanation

ransac() definition: We define the ransac() method, which takes three parameters and returns the best-fitting line (best_model) and its inliers (best_inliers_list).
Sampling: In each iteration, random.sample(data, 2) chooses two distinct data points at random, and fit_line() returns the slope and y-intercept of the line through them.
Inlier test: The for loop iterates over every data point and computes its distance from the candidate line with the calculate_distance() method. It appends the point to the list of inliers if the distance is less than or equal to the specified threshold.
Model selection: The algorithm compares the length of the inliers list with the length of best_inliers_list. If the current candidate has more inliers than the best one found so far, current_model becomes best_model and inliers becomes best_inliers_list.
Early stopping: If best_inliers_list contains at least 80% of all data points, the loop breaks, because the current model is considered good enough.
Calling ransac(): We pass the dataset, the number of iterations, and the threshold to ransac() to determine the line of best fit and its inliers.
Output: We print the line of best fit, the inliers determined by ransac(), and the outliers (every data point that is not an inlier).
Plotting: We draw all data points in blue, the fitted line in red, and the inliers in green on top. We then save the figure to ./output/plot.png and display it with plt.show(). The points shown in green are inliers, and the points that remain blue are outliers.
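The line returned by ransac() passes through only two sampled points. A common final step in RANSAC, which the code above does not perform, is to re-estimate the model with ordinary least squares over all inliers. Here is a minimal sketch, assuming the inliers are a list of (x, y) tuples like the one ransac() returns. refit_on_inliers is a hypothetical helper, and the example subset below is taken from the dataset for illustration. It is not necessarily the set ransac() returns.

```python
import numpy as np

def refit_on_inliers(inliers):
    """Least-squares line through all inlier points; returns (slope, intercept)."""
    points = np.asarray(inliers, dtype=float)
    slope, intercept = np.polyfit(points[:, 0], points[:, 1], 1)
    return slope, intercept

# Illustrative subset of the dataset that lies close to a falling line
example_inliers = [(2, 10), (3, 9), (4, 7), (5, 6), (7, 5), (8, 3), (9, 1)]
slope, intercept = refit_on_inliers(example_inliers)
print(f"refined line: y = {slope:.2f}x + {intercept:.2f}")
```

RANSAC decides which points to trust, and least squares then uses all of them to reduce the noise that comes from fitting through just two points.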
