** Quiz 2 - Answers:
* Question 1
* We have a model that explains the price of a house by its distance
* from a garbage incinerator and its "quality".
* price = b0 + b1*distance + b2*quality + u.
* Quality is hard to measure, so we consider a simple regression
* of price on distance. This leaves quality in the error term.
* This simple regression suffers from omitted variable bias if both of the
* following conditions hold:
* 1) Quality belongs in the theoretical model (b2 is not zero).
* 2) Quality is correlated with an included variable (in this case, distance).
* If the location of the incinerator were determined randomly,
* there would be no correlation between distance and quality.
* We would have Cov(distance,u)=0 and our estimate of b1
* would be consistent.
* In that case we would not have an omitted variable problem.
* However, it is possible that the incinerator was built in an area
* where housing quality was already low. That would create a correlation
* between distance and quality and make our estimate of b1 inconsistent.
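* Illustration (a sketch on simulated data; the variable names, sample size,
* and parameter values below are assumptions, not part of the quiz):
* with b1 = 2, b2 = 3 and distance = 0.5*quality + noise, the slope from the
* simple regression converges to
*   b1 + b2*Cov(distance,quality)/Var(distance) = 2 + 3*(0.5/1.25) = 3.2.
clear
set seed 12345
set obs 5000
generate quality  = rnormal()
generate distance = 0.5*quality + rnormal()
generate price    = 1 + 2*distance + 3*quality + rnormal()
* Long regression: quality is included, so the coefficient on distance
* should be close to 2.
regress price distance quality
* Short regression: quality is left in the error term, so the coefficient
* on distance is pushed away from 2.
regress price distance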
* Question 2
use "C:\Users\Seren\Downloads\mus03data.dta", clear
* To suppress the output, prefix the command with quietly, as in:
*quietly reg ltotexp suppins phylim actlim totchr age female income in 1/500
reg ltotexp suppins phylim actlim totchr age female income in 1/500
estimates store reg1
reg ltotexp suppins phylim actlim totchr age female income in 1/500, robust
estimates store reg2
reg ltotexp suppins phylim actlim totchr age female income in 1/500, vce(cluster totchr)
estimates store reg3
* how do the t-statistics change across the three estimates of standard errors?
estimates table reg1 reg2 reg3, b(%9.4f) se stats(N r2 F)
corr ltotexp suppins phylim actlim totchr age female income in 1/500
* Restricting the sample to the first 500 observations gives a selected sample:
* you keep only the observations with the lowest ltotexp.
* In the restricted sample, the correlation between ltotexp and totchr is quite high.
* Clustering the errors by totchr therefore changes the statistical
* significance of totchr: it is lower than under the default standard errors,
* which assume homoskedasticity.
* In the entire sample, the correlation is even higher, so the change in the
* t-statistic relative to the default case is even larger when errors are clustered.
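* A sketch for checking the sample claims above (the stored-estimate names
* full1 and full3 are hypothetical, chosen only for this illustration).
* Compare ltotexp in the first 500 observations with the entire sample:
summarize ltotexp in 1/500
summarize ltotexp
* Compare the correlation between ltotexp and totchr in the two samples:
corr ltotexp totchr in 1/500
corr ltotexp totchr
* Rerun the default and clustered regressions on the entire sample and
* report t-statistics alongside the standard errors:
reg ltotexp suppins phylim actlim totchr age female income
estimates store full1
reg ltotexp suppins phylim actlim totchr age female income, vce(cluster totchr)
estimates store full3
estimates table full1 full3, b(%9.4f) se t stats(N r2 F)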
* See HW#2.