How to get a random node from linked list in a single pass

Problem statement

Given a singly linked list of unknown length, return a random node from the list by only one traversal. Each node must have the same probability of being chosen.

This problem is a popular interview question. The approach discussed below is a useful technique in data streaming applications when data is too large to fit in the memory.

Description of the algorithm

We will use the reservoir sampling technique with a reservoir size equal to 1.

The algorithm consists of the following steps:

Initialize the result node with the head node of the list.
Track number of the processed elements. After the first step $i = 1$ , the head node is processed.
Iterate over the elements in the list:

Generate uniformly distributed random number $r \in [0, i];$
If the generated number $r$ equals $0$ , then select the current node as a result node.
Increment the number of processed elements: $i = i + 1;$
Move to the next element of the list.

#include <random>
//get random number in a given range
int getRandomNumber(int left, int right)
{
    //C++11's random number generation facilities 
    std::uniform_int_distribution<> distrib(left, right);
    std::mt19937 randomGen;
    return distrib(randomGen);
}
//get random node from a given linked list in a single pass
ListNode* getRandomNode(ListNode* head) 
{
    ListNode* resultNode = head;
    int i = 1;
    ListNode* currNode = head->next;
    while(currNode != nullptr)
    {
        int randomInd = getRandom(0, i);
        if(randomInd == 0) 
          resultNode = currNode;
        ++i;
        currNode = currNode->next;
    }
    return resultNode;
}

Explanation

Let $n$ be the length of the given list, and prove that the probability of $i$ -th node (which has an index $i-1$ ) to be selected as a result node is $1/n$ .

During iteration over the $i$ -th node, there are already $i-1$ processed elements. We get a random number from $0$ to $i - 1$ , so there are $i$ values to be chosen with equal probability.

The probability that generated random number equals $0$ is $1/i$ , so the $i$ -th node is selected with probability $1/i$ .

At the end of the whole process, the $i$ -th node will remain selected if it is not replaced by further nodes during iteration, which means further nodes haven't been selected.

For $(i+1)$ -th node, the probability of not being selected is $1 - 1/(i+1)$ , for $(i+2)$ -th node, it is, $(i+2) n$ - th node $- \; 1 - 1/n$ . The probability that $i$ -th node remains selected after further iteration is a product of these probabilities: $(1 - \frac{1}{i+1})(1 - \frac{1}{i+2}) \dots (1 - \frac{1}{n}) = \frac{i}{n}.$