Wayne's Github Page

A place to learn about statistics

Homework 3 - random variables ARE functions

Context: modeling data with random variables

In your introductory statistics course, you likely came across probability distributions. These are useful mathematical tools that summarize complicated datasets efficiently using a few parameters. For example, if you claim some data is Normally distributed, then simply providing the mean and variance of the data (the parameters of the Normal distribution, and also common summary statistics for most distributions) gives people a strong sense of how the data is distributed. IQ scores, for instance, are modeled as Normal and are designed so that the mean is 100 with a standard deviation (SD) of about 15 (i.e. a variance of 15^2). These parameters, together with the Normal distribution, tell us that half of all people have an IQ above 100, about 68% of the population have an IQ between 85 and 115, and fewer than 2.5% of people have an IQ above 130.
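The three IQ facts above can be checked numerically with R's built-in `pnorm()`. This is a minimal sketch that assumes scores are exactly Normal(mean = 100, sd = 15); the variable names are illustrative only.

```r
# Assumption from the text: IQ ~ Normal(mean = 100, sd = 15)
iq_mean <- 100
iq_sd <- 15

# P(IQ > 100): upper tail starting at the mean
p_above_100 <- pnorm(100, mean = iq_mean, sd = iq_sd, lower.tail = FALSE)

# P(85 < IQ < 115): probability of falling within one SD of the mean
p_within_1sd <- pnorm(115, mean = iq_mean, sd = iq_sd) -
  pnorm(85, mean = iq_mean, sd = iq_sd)

# P(IQ > 130): upper tail two SDs above the mean
p_above_130 <- pnorm(130, mean = iq_mean, sd = iq_sd, lower.tail = FALSE)

print(round(c(above_100 = p_above_100,
              within_1sd = p_within_1sd,
              above_130 = p_above_130), 4))
```

The printed values should be close to 0.5, 0.68 and 0.023, matching the statements in the paragraph.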

R has several built-in functions that draw random samples from well-known distributions given parameters:

Q0 Knowing R’s built-in random functions

With the help of Wikipedia, you should be able to find how the parameters of the various distributions relate to summary statistics like the mean and variance of those distributions.
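As an illustration of how parameters relate to summary statistics, the sketch below draws large samples with `rexp()` and `rbinom()` and compares the sample mean and variance with the textbook formulas. The parameter values (rate 2, size 10, probability 0.3), the sample size, and the seed are arbitrary choices for this example, not part of the assignment.

```r
set.seed(1)   # arbitrary seed so the run is reproducible
n <- 1e5      # large sample so the estimates are close to theory

# Exponential(rate): mean = 1 / rate, variance = 1 / rate^2
rate <- 2
x_exp <- rexp(n, rate = rate)
exp_summary <- c(sample_mean = mean(x_exp), theory_mean = 1 / rate,
                 sample_var = var(x_exp), theory_var = 1 / rate^2)

# Binomial(size, prob): mean = size * prob, variance = size * prob * (1 - prob)
size <- 10
prob <- 0.3
x_bin <- rbinom(n, size = size, prob = prob)
bin_summary <- c(sample_mean = mean(x_bin), theory_mean = size * prob,
                 sample_var = var(x_bin),
                 theory_var = size * prob * (1 - prob))

print(round(exp_summary, 3))
print(round(bin_summary, 3))
```

The sample and theoretical values will not match exactly, but with this many draws they should agree to about two decimal places.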

Side Note:

Q1 Creating a multi-modal random variable

Datasets with multiple modes, or peaks, in their distribution are difficult to describe. Height across men and women is one such example. One popular probability distribution for such data is the Gaussian mixture distribution, which models the different peaks as different Normal distributions overlaid on one another. For example, the heights of men and of women each follow their own Normal distribution, with different means and SDs, which creates two different peaks when you put the data together.
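To make the "overlaid Normals" idea concrete, the sketch below evaluates the density of a two-component mixture as a weighted sum of `dnorm()` curves and locates its peaks. The helper `mix_density()` and the height parameters (means of 160 and 180 cm, SDs of 6 cm, equal weights) are hypothetical values chosen for illustration, not real height data. It covers only the density, not the sampling that Q1 asks for.

```r
# Hypothetical helper: density of a Gaussian mixture at the points x
mix_density <- function(x, weights, means, sds) {
  dens <- numeric(length(x))
  for (j in seq_along(weights)) {
    # Add the j-th Normal density, scaled by its mixture weight
    dens <- dens + weights[j] * dnorm(x, mean = means[j], sd = sds[j])
  }
  dens
}

# Made-up parameters for two height groups (in cm)
grid <- seq(130, 210, by = 0.1)
dens <- mix_density(grid, weights = c(0.5, 0.5),
                    means = c(160, 180), sds = c(6, 6))

# A peak is a grid point where the density stops rising and starts falling
d <- diff(dens)
peaks <- grid[which(d[-length(d)] > 0 & d[-1] <= 0) + 1]
print(peaks)
```

Calling `plot(grid, dens, type = "l")` afterwards draws the curve, which should show one peak near each component mean.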

Please write a function rgaussmix() that can generate samples from a Gaussian mixture distribution. The inputs (in order) should be:

To draw a single sample from the Mixture Gaussian Distribution with k modes.

The function should return a numeric vector of length n where each element is a realization from the Gaussian mixture specified.

Requirements for the function:

Please submit the code for this question.

Q2 Testing out the function

Please apply your function above with the following inputs and show the desired output:

Q3 Central Limit Theorem

Here we are going to simulate the central limit theorem, something even more miraculous than the Law of Large Numbers!

We want to demonstrate that the CLT works on complicated distributions, like a Gaussian mixture. Imagine we had a Gaussian mixture where the first and second components are Normal(10, sd=2) and Normal(18, sd=2) respectively, and the first component is 4 times more likely to appear than the second component. By "demonstrate", we mean showing that the sample averages follow a bell curve and that the width of the bell curve decreases as the sample size grows.
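Before simulating, it can help to know what the CLT predicts for this mixture. The sketch below computes the mixture's mean and variance from the component parameters, then the predicted SD of the sample average at a few sample sizes. It assumes "4 times more likely" means weights of 0.8 and 0.2. The sample sizes 10, 100 and 1000 are arbitrary illustrative choices.

```r
# Assumption: "4 times more likely" is read as weights 4/5 and 1/5
w <- c(4, 1) / 5
mu <- c(10, 18)   # component means from the text
s <- c(2, 2)      # component SDs from the text

# Mixture mean: weighted average of the component means
mix_mean <- sum(w * mu)

# Mixture variance: E[X^2] - (E[X])^2, where E[X^2] = sum of w * (s^2 + mu^2)
mix_var <- sum(w * (s^2 + mu^2)) - mix_mean^2

# CLT prediction: the average of n draws is approximately
# Normal(mix_mean, sd = sqrt(mix_var / n))
n <- c(10, 100, 1000)   # illustrative sample sizes
se <- sqrt(mix_var / n)

print(c(mean = mix_mean, variance = mix_var))
print(round(data.frame(n = n, predicted_sd = se), 3))
```

Under these assumptions the mixture has mean 11.6 and variance 14.24, so the predicted SD of the sample average shrinks by a factor of sqrt(10) each time n grows tenfold. Simulated sample averages can be compared against these numbers.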

You are expected to leverage your function from Q1.

Please demonstrate the CLT works by showing the following:

Please show all code and all graphs.

Side note: notice how the distribution of the sample average should be “bell-shaped” but no longer bimodal!!!