Wayne's Github Page

A place to learn about statistics

Homework 3: Simple Linear Regression

Q1 - interpreting the formulas

Please answer the following TRUE/FALSE statements based on the formulas we have shown for simple linear regression:

Q2 - the importance of the intercept in regression

Please generate some data as below:

n <- 50
x <- runif(n, 1, 5)
y <- 1.2 * x + rnorm(n, sd=0.4)
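As a minimal sketch of one comparison that fits this question's title, the code below fits the generated data with and without an intercept, then repeats both fits after adding a constant to \(y\). The seed, the shift of 3, and all object names (`fit_with`, `fit_without`, `y_shift`, and so on) are assumptions made for illustration and are not part of the assignment.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
n <- 50
x <- runif(n, 1, 5)
y <- 1.2 * x + rnorm(n, sd = 0.4)

fit_with    <- lm(y ~ x)      # intercept is estimated
fit_without <- lm(y ~ x - 1)  # intercept is forced to 0

coef(fit_with)
coef(fit_without)

# Shift every y by a constant (hypothetical value 3) and refit both models.
y_shift <- y + 3
fit_shift_with    <- lm(y_shift ~ x)
fit_shift_without <- lm(y_shift ~ x - 1)

coef(fit_shift_with)     # slope unchanged, intercept absorbs the shift
coef(fit_shift_without)  # slope has to change to compensate
```

With an intercept, the shift moves only the intercept estimate. Without one, the slope is the only free parameter, so it absorbs the shift.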

Q3 - directionality in regression

Imagine that, in the data from Q2, \(x\) comes from a well-calibrated machine (essentially 0 error) and \(y\) is the output of an uncalibrated machine measuring the same object (noisy and potentially biased). If you were asked to use statistics to “de-bias” the machine that produced \(y\), should you fit a regression with \(y\) or \(x\) as the dependent variable (please explain!)? De-bias here means that they have given up on calibrating the machine that generates \(y\) and want your regression model to act as a second-stage process that corrects any systematic bias in the data. In other words, they will use the uncalibrated machine to obtain biased measurements and then want to convert them into values that look calibrated.

Hint: what is the objective of the regression?
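To make the hint concrete, here is a minimal sketch, assuming the Q2 data with an arbitrary seed, showing that the two possible regressions are different fits rather than inverses of each other. The object names (`fit_y_on_x`, `fit_x_on_y`, `b_yx`, `b_xy`) are hypothetical and chosen only for this illustration.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
n <- 50
x <- runif(n, 1, 5)
y <- 1.2 * x + rnorm(n, sd = 0.4)

fit_y_on_x <- lm(y ~ x)  # minimizes squared errors in y
fit_x_on_y <- lm(x ~ y)  # minimizes squared errors in x

b_yx <- unname(coef(fit_y_on_x)["x"])
b_xy <- unname(coef(fit_x_on_y)["y"])

# The product of the two slopes equals the squared correlation, so the
# slopes are reciprocals only when x and y are perfectly correlated.
c(product = b_yx * b_xy, r_squared = cor(x, y)^2)
```

Because each fit minimizes error in a different variable, the choice of dependent variable changes the fitted line, which is why the objective matters.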

Q4 - violating the regression assumption

Please generate data as below:

n <- 200
x <- runif(n, 1, 5)
y <- 0.1 + 1.2 * x + rnorm(n, sd=x)
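One way to see what this data-generating process does is to fit a line and look at how the residual spread changes with \(x\). The sketch below is only an illustration: the seed, the split point at \(x = 3\), and the names `fit`, `res`, and `low` are assumptions, not part of the assignment.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
n <- 200
x <- runif(n, 1, 5)
y <- 0.1 + 1.2 * x + rnorm(n, sd = x)  # noise sd grows with x

fit <- lm(y ~ x)
res <- resid(fit)

# Residuals against x: the vertical spread should widen to the right.
plot(x, res, xlab = "x", ylab = "residual")
abline(h = 0, lty = 2)

# Rough numeric check: residual sd for small vs. large x
# (the cut at x = 3 is an arbitrary choice).
low <- x < 3
c(sd_low = sd(res[low]), sd_high = sd(res[!low]))
```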

Q5 - Evaluating residual plots

For each of the following residual plots, please comment on which of the statements cannot be rejected and explain in at most 2 sentences. Please assume the residuals come from fitting a straight line that may not be the regression line to the data (i.e. no curves were used to obtain the residuals) and that there is only one independent variable \(x\).

[Figure: bad residuals]
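The distinction between "a straight line" and "the regression line" can be illustrated with a short sketch. It assumes the Q2 data with an arbitrary seed and a hypothetical non-regression line with intercept `a = 0.5` and slope `b = 1.0`. None of these values come from the assignment.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
n <- 50
x <- runif(n, 1, 5)
y <- 1.2 * x + rnorm(n, sd = 0.4)

# Residuals from the least-squares regression line.
res_ols <- resid(lm(y ~ x))

# Residuals from an arbitrary straight line (hypothetical coefficients).
a <- 0.5
b <- 1.0
res_line <- y - (a + b * x)

# Least-squares residuals average to zero and are uncorrelated with x by
# construction; residuals from an arbitrary line carry no such guarantee.
c(mean = mean(res_ols),  cor_with_x = cor(res_ols, x))
c(mean = mean(res_line), cor_with_x = cor(res_line, x))
```

A residual plot with a non-zero mean or a linear trend in \(x\) is therefore possible for a straight line that is not the regression line, even though the regression line itself can never produce one.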

Q6 - Translating assumptions

From the paper Global Evidence on Economic Preferences:

Q7 - Counterexamples