How to get a random node from linked list in a single pass

Problem statement

Given a singly linked list of unknown length, return a random node from the list by only one traversal. Each node must have the same probability of being chosen.

This problem is a popular interview question. The approach discussed below is a useful technique in data streaming applications when data is too large to fit in the memory.

Description of the algorithm

We will use the reservoir sampling technique with a reservoir size equal to 1.

The algorithm consists of the following steps:

  1. Initialize the result node with the head node of the list.
  2. Track number of the processed elements. After the first step i=1i = 1, the head node is processed.
  3. Iterate over the elements in the list:
    • Generate uniformly distributed random number r[0,i];r \in [0, i];
    • If the generated number rr equals00, then select the current node as a result node.
    • Increment the number of processed elements: i=i+1;i = i + 1;
    • Move to the next element of the list.

After these steps, the resulting node will be a random node chosen with equal probability from a given list. See the proof of correctness below:

Complexity analysis

Letnn be the length of the given linked list:

  • Time complexity is O(n)O(n)as we iterate over the whole list.
  • The Space complexity is O(1)O(1)as constant auxiliary space is used.

Code example

Let's look at the code below:

#include <random>
//get random number in a given range
int getRandomNumber(int left, int right)
{
//C++11's random number generation facilities
std::uniform_int_distribution<> distrib(left, right);
std::mt19937 randomGen;
return distrib(randomGen);
}
//get random node from a given linked list in a single pass
ListNode* getRandomNode(ListNode* head)
{
ListNode* resultNode = head;
int i = 1;
ListNode* currNode = head->next;
while(currNode != nullptr)
{
int randomInd = getRandom(0, i);
if(randomInd == 0)
resultNode = currNode;
++i;
currNode = currNode->next;
}
return resultNode;
}

Explanation

Let nn be the length of the given list, and prove that the probability of ii-th node (which has an index i1i-1) to be selected as a result node is 1/n1/n.

  • During iteration over the ii-th node, there are already i1i-1 processed elements. We get a random number from 00 to i1i - 1, so there are ii values to be chosen with equal probability.
  • The probability that generated random number equals 00 is 1/i1/i, so the ii-th node is selected with probability 1/i1/i.
  • At the end of the whole process, the ii-th node will remain selected if it is not replaced by further nodes during iteration, which means further nodes haven't been selected.
  • For (i+1)(i+1)-th node, the probability of not being selected is 11/(i+1)1 - 1/(i+1), for (i+2)(i+2)-th node, it is, (i+2)n(i+2) n- th node   11/n- \; 1 - 1/n . The probability that ii-th node remains selected after further iteration is a product of these probabilities:(11i+1)(11i+2)(11n)=in.(1 - \frac{1}{i+1})(1 - \frac{1}{i+2}) \dots (1 - \frac{1}{n}) = \frac{i}{n}.
  • So, the probability for ii-th node to be selected as a result node is:

P(i-th  is  selected)P(further  nodes  are  not  selected)=1iin=1n.P(i\text{-}th \; is \; selected)\cdot P(further \; nodes\;are \; not\; selected)= \frac{1}{i} \cdot \frac{i}{n}= \frac{1}{n}.

This proves that each node is returned with equal probability 1/n1/n.

Free Resources