Random sample consensus (RANSAC) is an iterative parameter estimation approach used to fit models to data that contains outliers.
Examples include real-world data recorded from sensors, which often has missing values and outliers that would distort any conclusions drawn from it. This is where the RANSAC algorithm comes into play.
Unlike other algorithms, which estimate the initial solution using as much of the available data as possible, RANSAC works with the smallest possible sample from the available data. It then keeps adding data points that follow a consistent pattern and outputs an array of inliers.
A detailed explanation of how the RANSAC algorithm works is given below.
import random
import numpy as np
import matplotlib.pyplot as plt

# Line of best fit
def fit_line(point_1, point_2):
    x = np.array([point_1[0], point_2[0]])
    y = np.array([point_1[1], point_2[1]])

    slope, intercept = np.polyfit(x, y, 1)

    return slope, intercept

# Distance calculation from the line of best fit
def calculate_distance(point, model):
    x, y = point
    slope, intercept = model
    distance = np.abs(slope * x - y + intercept) / np.sqrt(slope ** 2 + 1)

    return distance

# Algorithm to determine inliers
def ransac(data, iterations, threshold):
    best_model = None
    best_inliers_list = []

    # Run at most `iterations` rounds
    for _ in range(iterations):
        point_1, point_2 = random.sample(data, 2)
        current_model = fit_line(point_1, point_2)
        inliers = []  # start a fresh inlier list for this model
        for value in data:
            if calculate_distance(value, current_model) <= threshold:
                if value not in inliers:
                    inliers.append(value)

        if len(inliers) > len(best_inliers_list):
            best_model = current_model
            best_inliers_list = inliers

        if len(best_inliers_list) >= len(data) * 0.8:
            break

    return best_model, best_inliers_list

# dataset
data = [(9, 1), (5, 6), (3, 9), (7, 5), (2, 10),
        (4, 7), (8, 3), (6, 2), (1, 2), (5, 10)]

# RANSAC parameters
iterations = 100
threshold = 2.0

# Output processing
model, inliers = ransac(data, iterations, threshold)
print("Equation of line of best fit: ", model)
print("Inliers: ", inliers)
outliers = []
for point in data:
    if point not in inliers:
        outliers.append(point)
print("Outliers: ", outliers)

x = []
y = []

for point in data:
    x.append(point[0])
    y.append(point[1])

x = np.array(x)
y = np.array(y)

# Plot to show best fit and inliers
plt.scatter(x, y, color = "blue")
plt.plot(x, model[0]*x + model[1], color='red')
plt.scatter([point[0] for point in inliers], [point[1] for point in inliers], color='green')
plt.xlabel('X')
plt.ylabel('Y')
plt.title("Line of best fit with inliers")
plt.savefig('./output/plot.png')
plt.show()
ransac() is a user-defined method that takes three parameters:
data: the original dataset.
iterations: the number of iterations to be performed in the RANSAC algorithm.
threshold: specifies a distance limit to determine if a point is an inlier or outlier.
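A reasonable value for iterations depends on how contaminated the data is. The sketch below uses the commonly quoted estimate N = log(1 - p) / log(1 - w^s), where w is the assumed fraction of inliers, s is the sample size (2 for a line), and p is the desired probability of drawing at least one outlier-free sample. The function name estimate_iterations and its defaults are assumptions made for illustration; they are not part of the code in this article.

```python
import math

def estimate_iterations(inlier_ratio, sample_size=2, confidence=0.99):
    """Rough number of RANSAC rounds needed (hypothetical helper).

    Returns the smallest N such that, with probability `confidence`,
    at least one of N random samples contains only inliers.
    """
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")

    # Probability that one sample of `sample_size` points is outlier-free
    p_clean_sample = inlier_ratio ** sample_size
    if p_clean_sample >= 1.0:
        return 1  # no outliers assumed, a single sample is enough

    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean_sample))

# Example: assume only half of the points are inliers
print(estimate_iterations(0.5))
```

The estimate grows quickly as the assumed inlier ratio drops, which is why a fixed value such as 100 can be generous for clean data and too small for heavily contaminated data.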
Line 23: Here, we define the ransac() method, which takes three parameters as input. It returns the best-fit line and its inliers, which it tracks in the variables named best_model and best_inliers_list.
Line 29–30: random.sample(data, 2) chooses two different data points at random from the dataset, and fit_line() returns the slope and y-intercept of the line fitted through them.
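With exactly two points, a degree-1 fit has nothing to average: it returns the line that passes through both of them. The sketch below writes that line in closed form and compares it with np.polyfit. The helper name line_through is hypothetical and not part of the article's code. It also makes one edge case visible: two points that share an x-coordinate, such as (5, 6) and (5, 10) in the dataset, define a vertical line, which the slope-intercept form cannot represent.

```python
import numpy as np

def line_through(point_1, point_2):
    """Slope and intercept of the line through two points (hypothetical helper)."""
    (x1, y1), (x2, y2) = point_1, point_2
    if x1 == x2:
        # Vertical line: the slope is undefined in y = slope * x + intercept form
        raise ValueError("points share an x-coordinate; slope is undefined")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

slope, intercept = line_through((2, 10), (8, 3))

# np.polyfit on the same two points should agree up to floating-point error
poly_slope, poly_intercept = np.polyfit([2, 8], [10, 3], 1)
print(slope, intercept)
print(poly_slope, poly_intercept)
```

Skipping or resampling such a degenerate pair is one possible way to handle it; which option is appropriate depends on the data.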
Line 32–35: The for loop iterates over each data point in the dataset and calculates its distance from the fitted line using the calculate_distance() method. It appends the data point to the list of inliers if the distance is less than or equal to the specified threshold.
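The same distance test can be applied to all points at once with NumPy. The following is a minimal sketch that uses the same perpendicular-distance formula; the function name inlier_mask and the hand-picked candidate line y = -x + 11 are assumptions made for illustration only.

```python
import numpy as np

def inlier_mask(points, slope, intercept, threshold):
    """Boolean mask of points within `threshold` of y = slope * x + intercept."""
    pts = np.asarray(points, dtype=float)
    # Perpendicular distance from (x, y) to the line slope*x - y + intercept = 0
    distances = np.abs(slope * pts[:, 0] - pts[:, 1] + intercept) / np.sqrt(slope ** 2 + 1)
    return distances <= threshold

data = [(9, 1), (5, 6), (3, 9), (7, 5), (2, 10), (4, 7), (8, 3), (6, 2), (1, 2), (5, 10)]

# Candidate line chosen by hand, not produced by RANSAC
mask = inlier_mask(data, slope=-1.0, intercept=11.0, threshold=2.0)
inliers = [p for p, keep in zip(data, mask) if keep]
outliers = [p for p, keep in zip(data, mask) if not keep]
print("Inliers: ", inliers)
print("Outliers: ", outliers)
```

For this candidate line, the points (6, 2), (1, 2), and (5, 10) lie farther than 2.0 from the line and are reported as outliers.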
Line 37–39: These lines check whether the current model has more inliers than the best model found so far by comparing the length of the inliers list with the length of best_inliers_list. If it does, current_model becomes the best_model and inliers becomes the best_inliers_list.
Line 41–42: This check serves as a stopping point for the algorithm. The condition tests whether the number of data points in the best_inliers_list is greater than or equal to 80% of the total data points. If it is, the model is considered good enough and the loop ends early.
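The loop described above can also be sketched in vectorized form. This variant is an illustration under a few assumptions rather than the article's implementation: it stores the line as a*x + b*y + c = 0 (so vertical lines need no special case), builds the inlier set from scratch for every candidate line, bounds the loop with range(iterations), and takes an optional seed for repeatable runs. The names ransac_line and stop_ratio are hypothetical.

```python
import numpy as np

def ransac_line(points, iterations, threshold, stop_ratio=0.8, seed=None):
    """Sketch of RANSAC line fitting; returns ((a, b, c), inlier_mask).

    The line is a*x + b*y + c = 0 with a**2 + b**2 == 1, so the
    point-to-line distance is simply |a*x + b*y + c|.
    """
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_line = None
    best_mask = np.zeros(len(pts), dtype=bool)

    for _ in range(iterations):
        # Minimal sample: two different rows of the dataset
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        a, b = y2 - y1, x1 - x2  # normal vector of the line through both points
        norm = np.hypot(a, b)
        if norm == 0.0:
            continue  # identical points do not define a line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)

        # Consensus set for this candidate line only
        mask = np.abs(pts @ np.array([a, b]) + c) <= threshold
        if mask.sum() > best_mask.sum():
            best_line, best_mask = (a, b, c), mask
        # Early exit once enough of the data agrees with the best line
        if best_mask.sum() >= stop_ratio * len(pts):
            break

    return best_line, best_mask

data = [(9, 1), (5, 6), (3, 9), (7, 5), (2, 10), (4, 7), (8, 3), (6, 2), (1, 2), (5, 10)]
line, mask = ransac_line(data, iterations=100, threshold=2.0, seed=0)
print("Line (a, b, c): ", line)
print("Inliers: ", [p for p, keep in zip(data, mask) if keep])
```

Because the sampling is random, different seeds can return different lines when no candidate reaches the 80% mark before the iterations run out.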
Line 55: Here, we pass the dataset as input and call the ransac() method to determine the inliers.
Line 56–62: Here, we output the line of best fit, the inliers determined by ransac(), and the outliers.
Line 75–82: Here, we use Matplotlib to plot a graph that shows the inliers and outliers, save it to a file, and display it with plt.show(). The data points shown in green are inliers and the data points shown in blue are outliers.