Wayne's Github Page

A place to learn about statistics

Using random functions

Random functions have a unique role in statistical computing. We can use them to simulate, or draw samples from, a complex process and then study its behavior. Examples include checking whether the sample median obeys a central limit theorem, or approximating a statistic's finite-sample distribution instead of relying on asymptotic theory for inference. Classic random sampling, used when a population is too large to measure in full, appears in surveys, machine learning, and optimization methods.
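As a small illustration of the second use case, the sketch below approximates the sampling distribution of the median of 15 draws from an Exponential(1) distribution and compares it with the normal approximation. The distribution, sample size, seed, and number of repetitions are arbitrary choices made for this example, and `simulate_medians` is a hypothetical helper.

```python
import math
import random
import statistics


def simulate_medians(n, reps, seed=0):
    """Return the medians of `reps` samples of size `n` drawn from Exponential(1)."""
    rng = random.Random(seed)  # private, seeded generator so the run is reproducible
    medians = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        medians.append(statistics.median(sample))
    return medians


n = 15
meds = simulate_medians(n=n, reps=5000)

# Simulated (finite-sample) behaviour of the sample median
print("mean of medians:", statistics.mean(meds))
print("sd of medians:  ", statistics.stdev(meds))

# Asymptotic theory: median ~ Normal(log 2, 1 / (4 n f(log 2)^2)),
# and the Exponential(1) density at its median is f(log 2) = 1/2
print("true median:    ", math.log(2))
print("asymptotic sd:  ", 1 / math.sqrt(n))
```

For this particular setup the exact mean of the sample median is 1/8 + 1/9 + ... + 1/15 ≈ 0.725, so the simulated mean should land a little above log 2 ≈ 0.693. That small-sample bias is exactly the kind of detail the asymptotic approximation glosses over.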

Random functions are conceptually identical to random variables: repeated evaluations with the same input can return different values drawn from a distribution. For example, mathematically we might say \(Y \sim Normal(0, 1)\) with data \(Y_1, \dots, Y_n\); programmatically we could write the following (the code keeps the notation consistent with the math):

from random import gauss

n = 5
def Y():
    return gauss(0, 1)
Ys = [Y() for _ in range(n)]
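The function above takes no input, but the same idea holds when it does. In the sketch below, the model \(Y \mid x \sim Normal(2 + 3x, 1)\) is a hypothetical example chosen only for illustration: calling `Y(1.0)` repeatedly gives different values, and their average approaches the mean 2 + 3 * 1.0 = 5.

```python
from random import gauss
from statistics import mean


def Y(x):
    """Hypothetical random function: Y | x ~ Normal(2 + 3x, 1)."""
    return gauss(2 + 3 * x, 1)


# Same input on every call, yet the outputs differ
draws = [Y(1.0) for _ in range(10_000)]
print(draws[:3])
print(mean(draws))  # should be close to 5
```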

Common functionalities in random functions

There are a few basic functionalities one should be aware of when using random functions. Most modules that support random operations provide them.
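Python's built-in `random` module covers the functionalities I would consider basic: seeding, drawing from common distributions, picking elements, sampling with or without replacement, and shuffling. The sketch below only uses the standard library; other packages offer the same ideas under different names, so treat these calls as one example rather than a universal API.

```python
import random

random.seed(42)                      # fix the starting state for reproducibility

u = random.random()                  # float in [0.0, 1.0)
x = random.uniform(-1, 1)            # float between -1 and 1
z = random.gauss(0, 1)               # draw from Normal(0, 1)
k = random.randint(1, 6)             # integer from 1 to 6, both ends included

items = ['a', 'b', 'c', 'd', 'e']
one = random.choice(items)                  # a single element
with_repl = random.choices(items, k=3)      # sampling with replacement
without_repl = random.sample(items, k=3)    # sampling without replacement

deck = list(range(10))
random.shuffle(deck)                 # permutes the list in place and returns None
print(u, x, z, k, one, with_repl, without_repl, deck)
```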

Notable packages with random functions

Random vs hash functions

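A truly random function drawing from a continuous distribution has zero probability of returning the same value twice. (Pseudo-random floats come from a finite set of values, so repeats are possible in practice but extremely unlikely.)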

import random

# Get 3 different random values with the same input
rand_perc = [random.uniform(0, 1) for _ in range(3)]

Hash functions, although they map their inputs to seemingly random outputs, always return the same output for the same input. This is useful for ID generation or A/B testing assignment, when you want a returning user to be assigned to the same treatment/control group as before.

import hashlib

# avoid naming this `input`, which would shadow Python's built-in function
key = 'wayne lee - homepage v2'


def str2perc(key):
    output = hashlib.md5(key.encode())
    # a random-looking but deterministic hex value
    output_hex = output.hexdigest()
    # converting the hex value into a number between 0 and 1
    output_perc = int(output_hex, 16) / 16**len(output_hex)
    return output_perc


# Get 3 identical values with the same input
hash_perc = [str2perc(key) for _ in range(3)]
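The A/B testing use case can be sketched by reusing the same hex-to-percentage idea as `str2perc`. The helper `assign_group`, its parameters, and the `'experiment:user_id'` key format are hypothetical choices for illustration. Mixing the experiment name into the key means the same user can fall into different groups in different experiments, while staying in the same group within one experiment.

```python
import hashlib


def assign_group(user_id, experiment, treatment_share=0.5):
    """Hypothetical helper: deterministically assign a user to a group."""
    key = f'{experiment}:{user_id}'               # experiment name acts as a salt
    digest = hashlib.md5(key.encode()).hexdigest()
    perc = int(digest, 16) / 16**len(digest)      # number between 0 and 1
    return 'treatment' if perc < treatment_share else 'control'


# The same user always gets the same group for a given experiment
print([assign_group('user_123', 'new_homepage') for _ in range(3)])
```

Because the output only depends on the input string, no assignment table needs to be stored; the trade-off is that changing the key format or the split later reshuffles users.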

Repeating the same random numbers by fixing the seed and more

Sometimes we want our results to be reproducible. To achieve this, we fix the seed as well as the order in which we call the random functions.

import random

random.seed('hello world')
print([random.uniform(0, 1) for _ in range(3)])

random.seed('hello world')
print([random.uniform(0, 1) for _ in range(3)])

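random.seed('hello world')
# one extra draw before the list advances the generator's state
# (another call such as random.shuffle([1, 2, 3]) would also change it)
_ = random.uniform(0, 1)
print([random.uniform(0, 1) for _ in range(3)])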

In the example above, notice that the first 2 random sequences are identical, while the last one is shifted by one position: the extra draw consumes the first value, so the last sequence starts with the second and third values of the earlier ones. The key point is that, in addition to fixing the seed, one must control every other call to the random module in order to reproduce the same results. This is not suitable for production when multiple services act asynchronously, since the order of calls is then no longer fixed.
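One possible mitigation, sketched here under the assumption that each component of a program can own its own generator, is to create a dedicated `random.Random` instance instead of sharing the module-level one. Calls elsewhere then cannot shift its sequence, and `getstate()`/`setstate()` let you save and replay a generator mid-stream. The names `sim_rng` and `replay_rng` are illustrative only. This does not remove every concurrency issue: if several threads draw from one shared generator, the order of draws still depends on scheduling.

```python
import random

# A dedicated generator, seeded the same way as random.seed('hello world')
sim_rng = random.Random('hello world')

random.seed(0)
_ = random.uniform(0, 1)      # extra calls to the shared module-level generator...
random.shuffle([1, 2, 3])     # ...do not touch sim_rng's state

first = [sim_rng.uniform(0, 1) for _ in range(3)]

# Re-creating the generator from the same seed replays the same sequence
replay_rng = random.Random('hello world')
replay = [replay_rng.uniform(0, 1) for _ in range(3)]
print(first == replay)  # True

# getstate()/setstate() save and restore a generator mid-stream
state = sim_rng.getstate()
later = [sim_rng.uniform(0, 1) for _ in range(3)]
sim_rng.setstate(state)
again = [sim_rng.uniform(0, 1) for _ in range(3)]
print(later == again)  # True
```

Since `random.Random(seed)` seeds itself the same way as `random.seed(seed)`, `first` should match the first list printed by the earlier example.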