How to choose randomly from a sequence of unknown length. The typical use is to choose a random line from a file.
Detailed description:
You are given a file (and you do not know how big the file is, nor how big the lines in the file are). Write an algorithm that will generate a random number (which will be used to index into the file) such that the output of lines will be roughly proportional? That is if the file contained 4 lines, and if I ran the program 1 million times I should get each line printed approximately 250K times.
Solution:
import random def random_element(seq): """Return an element chosen at random from `seq`.""" it = None for n, elem in enumerate(seq): if random.randint(0, n) == 0: it = elem return it
To see why this function works, consider it inductively. For the initial case, we'll always pick the first element as "it", since a random int between 0 and 0 will be 0. So after one element, we've chosen uniformly from the one element we've seen.
For the inductive case, imagine we've read N-1 elements already, and "it" is a random element chosen uniformly from those N-1 elements. The chance that the Nth element should be chosen instead is 1/N, because at this point, the chance that any particular line should be "it" is 1/N. So we choose a random number, and if it's the 1/N outcome, we take the Nth line as "it." In the case that we don't take the new line, which is (N-1)/N, the old line is kept. It was chosen randomly from the N-1 lines that preceded this one, so each line has a final chance of ((N-1)/N)/(N-1), or 1/N, or being chosen. Therefore the new line and each of the old lines is equally likely, so the distribution is uniform.
Since the initial case meets our uniform-selection criterion, and we've shown that if N-1 satisfies it then N satisfies it, then by induction, our selection will be uniformly random for any N.
Complexity:
time - O(n)Links and credits:
space - O(1)
http://nedbatchelder.com/blog/201208/selecting_randomly_from_an_unknown_sequence.html
No comments:
Post a Comment