** QUIZ 6
* Question 1
*(a) The error term u contains, among other factors, family income, which has a positive
* effect on GPA and is very likely correlated with PC ownership. PC ownership is
* therefore an endogenous explanatory variable.
*(b) Families with higher incomes are more likely to be able to buy computers for their children.
* For this reason, family income certainly satisfies the requirement for an instrumental
* variable to be correlated with the endogenous explanatory variable.
* But, as mentioned in part (a), faminc has a positive effect on GPA, so faminc fails the
* other requirement for a good IV: that it be uncorrelated with the error term.
* If we observed faminc we would include it as an explanatory variable in the equation.
* If it is the only important omitted variable correlated with PC, we can then estimate
* the expanded equation by OLS.
*(c) This is a natural experiment that affects whether or not some students own computers.
* Some students who buy computers with the grant would not have a computer without the grant.
* Students who did not receive the grant might still own computers.
* Define a dummy variable, grant, equal to one if the student received a grant, and zero otherwise.
* Then, if grant was randomly assigned, it is uncorrelated with u.
* In particular, it is uncorrelated with family income and other socioeconomic factors in u.
* Further, grant should be correlated with PC: the probability of owning a PC should be
* significantly higher for a student who receives the grant.
* However, if the university gave priority for grants to low-income students, grant would be
* negatively correlated with family income, and hence with u, and IV would be inconsistent.
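* A minimal sketch of the estimation described in (c). The variable names GPA, PC, and
* grant are hypothetical: no data set for Question 1 accompanies this quiz.
* The first command checks that grant is correlated with PC; the second uses grant
* as an instrument for PC.
. reg PC grant
. ivreg GPA (PC = grant)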
* Question 2
. use http://fmwww.bc.edu/ec-p/data/wooldridge/fertil2
. tab children
   children |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,132       25.96       25.96
          1 |        907       20.80       46.76
          2 |        696       15.96       62.71
          3 |        528       12.11       74.82
          4 |        392        8.99       83.81
          5 |        255        5.85       89.66
          6 |        197        4.52       94.18
          7 |        134        3.07       97.25
          8 |         68        1.56       98.81
          9 |         32        0.73       99.54
         10 |         13        0.30       99.84
         11 |          3        0.07       99.91
         12 |          3        0.07       99.98
         13 |          1        0.02      100.00
------------+-----------------------------------
      Total |      4,361      100.00
* (a)
. reg children educ age agesq
      Source |       SS       df       MS              Number of obs =    4361
-------------+------------------------------           F(  3,  4357) = 1915.20
       Model |  12243.0295     3  4081.00985           Prob > F      =  0.0000
    Residual |  9284.14679  4357  2.13085765           R-squared     =  0.5687
-------------+------------------------------           Adj R-squared =  0.5684
       Total |  21527.1763  4360  4.93742577           Root MSE      =  1.4597
------------------------------------------------------------------------------
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0905755   .0059207   -15.30   0.000     -.102183   -.0789679
         age |   .3324486   .0165495    20.09   0.000     .3000032     .364894
       agesq |  -.0026308   .0002726    -9.65   0.000    -.0031652   -.0020964
       _cons |  -4.138307   .2405942   -17.20   0.000    -4.609994    -3.66662
------------------------------------------------------------------------------
* Holding age fixed, another year of education is associated with about .091 fewer children.
* In other words, if each woman in a group of 100 gets another year of education,
* together they are predicted to have about nine fewer children.
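* The arithmetic can be checked right after the regression in (a), while its
* coefficients are still in memory (100 women times the coefficient on educ):
. display 100*_b[educ]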
* (b)
* To check if frsthalf is partially correlated with educ, regress educ on frsthalf and all
* exogenous variables in the structural equation.
. reg educ age agesq frsthalf
      Source |       SS       df       MS              Number of obs =    4361
-------------+------------------------------           F(  3,  4357) =  175.21
       Model |  7238.42472     3  2412.80824           Prob > F      =  0.0000
    Residual |   60001.141  4357  13.7712052           R-squared     =  0.1077
-------------+------------------------------           Adj R-squared =  0.1070
       Total |  67239.5657  4360  15.4219187           Root MSE      =   3.711
------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1079504   .0420402    -2.57   0.010    -.1903706   -.0255302
       agesq |  -.0005056   .0006929    -0.73   0.466    -.0018641    .0008529
    frsthalf |  -.8522854   .1128296    -7.55   0.000    -1.073489   -.6310821
       _cons |   9.692864   .5980686    16.21   0.000     8.520346    10.86538
------------------------------------------------------------------------------
* Women born in the first half of the year are predicted to have about .85 fewer years of
* education, holding age fixed. The t-statistic on frsthalf is about -7.55,
* so frsthalf is partially correlated with educ and the identification condition holds.
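* A sketch of the same check written as an F test. With a single instrument the F statistic
* is the square of the t-statistic (about 57 here), well above the common rule-of-thumb
* value of 10 for instrument strength. The rule of thumb is an added reference point,
* not part of the original solution.
. qui reg educ age agesq frsthalf
. test frsthalf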
* (c)
* Estimate the structural equation using frsthalf as an IV for educ.
. ivreg children age agesq (educ= frsthalf)
Instrumental variables (2SLS) regression
      Source |       SS       df       MS              Number of obs =    4361
-------------+------------------------------           F(  3,  4357) = 1765.12
       Model |    11844.96     3  3948.32001           Prob > F      =  0.0000
    Residual |   9682.2163  4357  2.22222086           R-squared     =  0.5502
-------------+------------------------------           Adj R-squared =  0.5499
       Total |  21527.1763  4360  4.93742577           Root MSE      =  1.4907
------------------------------------------------------------------------------
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1714989   .0531796    -3.22   0.001    -.2757581   -.0672398
         age |   .3236052   .0178596    18.12   0.000     .2885913    .3586191
       agesq |  -.0026723   .0002797    -9.55   0.000    -.0032206   -.0021239
       _cons |  -3.387805   .5481502    -6.18   0.000    -4.462459   -2.313152
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   age agesq frsthalf
------------------------------------------------------------------------------
* The IV estimate of the effect of education on fertility (-.171) is much larger in
* magnitude than the OLS estimate (-.091).
* As expected, the standard error of the IV estimate is also bigger, about nine times
* the OLS standard error. This gives a fairly wide 95% CI, roughly -.276 to -.067.
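* The ratio of the two standard errors on educ, with the values copied from the
* IV and OLS outputs above:
. display .0531796/.0059207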
* (d)
. reg children educ age agesq electric tv bicycle
      Source |       SS       df       MS              Number of obs =    4356
-------------+------------------------------           F(  6,  4349) =  984.92
       Model |  12387.1794     6   2064.5299           Prob > F      =  0.0000
    Residual |  9116.10133  4349  2.09613735           R-squared     =  0.5761
-------------+------------------------------           Adj R-squared =  0.5755
       Total |  21503.2808  4355  4.93760752           Root MSE      =  1.4478
------------------------------------------------------------------------------
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.0767093   .0063526   -12.08   0.000    -.0891636    -.064255
         age |   .3402038   .0164417    20.69   0.000     .3079697    .3724379
       agesq |  -.0027081   .0002706   -10.01   0.000    -.0032385   -.0021777
    electric |  -.3027293   .0761869    -3.97   0.000    -.4520944   -.1533641
          tv |  -.2531443   .0914374    -2.77   0.006    -.4324082   -.0738803
     bicycle |    .317895   .0493661     6.44   0.000     .2211123    .4146778
       _cons |  -4.389784   .2403173   -18.27   0.000    -4.860928   -3.918639
------------------------------------------------------------------------------
. ivreg children age agesq electric tv bicycle (educ= frsthalf)
Instrumental variables (2SLS) regression
      Source |       SS       df       MS              Number of obs =    4356
-------------+------------------------------           F(  6,  4349) =  921.71
       Model |  11991.5668     6  1998.59447           Prob > F      =  0.0000
    Residual |  9511.71394  4349  2.18710369           R-squared     =  0.5577
-------------+------------------------------           Adj R-squared =  0.5571
       Total |  21503.2808  4355  4.93760752           Root MSE      =  1.4789
------------------------------------------------------------------------------
    children |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1639814   .0655269    -2.50   0.012    -.2924476   -.0355153
         age |   .3281451   .0190587    17.22   0.000     .2907803    .3655098
       agesq |  -.0027222   .0002766    -9.84   0.000    -.0032644     -.00218
    electric |  -.1065314    .165965    -0.64   0.521    -.4319073    .2188445
          tv |   -.002555   .2092301    -0.01   0.990    -.4127527    .4076427
     bicycle |   .3320724   .0515264     6.44   0.000     .2310543    .4330904
       _cons |  -3.591332   .6450889    -5.57   0.000    -4.856035   -2.326629
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   age agesq electric tv bicycle frsthalf
------------------------------------------------------------------------------
. qui reg children educ age agesq
. estimates store REG_OLS
. qui ivreg children age agesq (educ= frsthalf)
. estimates store REG_IV
. qui reg children educ age agesq electric tv bicycle
. estimates store REG_OLSexpanded
. qui ivreg children age agesq electric tv bicycle (educ= frsthalf)
. estimates store REG_IVexpanded
. estimates table REG_OLS REG_IV REG_OLSexpanded REG_IVexpanded, b(%9.4f) se stats(N r2)
--------------------------------------------------------------
    Variable |     REG_OLS      REG_IV   REG_OLS~d   REG_IVe~d
-------------+------------------------------------------------
        educ |     -0.0906     -0.1715     -0.0767     -0.1640
             |      0.0059      0.0532      0.0064      0.0655
         age |      0.3324      0.3236      0.3402      0.3281
             |      0.0165      0.0179      0.0164      0.0191
       agesq |     -0.0026     -0.0027     -0.0027     -0.0027
             |      0.0003      0.0003      0.0003      0.0003
    electric |                             -0.3027     -0.1065
             |                              0.0762      0.1660
          tv |                             -0.2531     -0.0026
             |                              0.0914      0.2092
     bicycle |                              0.3179      0.3321
             |                              0.0494      0.0515
       _cons |     -4.1383     -3.3878     -4.3898     -3.5913
             |      0.2406      0.5482      0.2403      0.6451
-------------+------------------------------------------------
           N |        4361        4361        4356        4356
          r2 |      0.5687      0.5502      0.5761      0.5577
--------------------------------------------------------------
                                                  legend: b/se
* Adding electric, tv, and bicycle to the model reduces the magnitude of the estimated
* effect of educ in both cases, but not by much (OLS: -.091 to -.077; IV: -.171 to -.164).
* In the equation estimated by OLS, the coefficient on tv implies that,
* other factors fixed, four families that own a television will have about
* one fewer child than four families without a TV.
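* The "four families" figure is four times the OLS coefficient on tv. A quick check,
* using the estimates stored above as REG_OLSexpanded:
. estimates restore REG_OLSexpanded
. display 4*_b[tv]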
* Television ownership can be a proxy for different things,
* including income and perhaps geographic location.
* Can you come up with a causal interpretation?
* Interestingly, the effect of TV ownership is practically and statistically
* insignificant in the equation estimated by IV (coefficient -.003, t = -0.01),
* even though we are not using an IV for tv.
* The coefficient on electric is also greatly reduced in magnitude in the IV estimation
* (from -.303 to -.107) and is no longer statistically significant.
* The substantial drops in the magnitudes of these coefficients suggest that
* a linear model might not be the appropriate functional form,
* which would not be surprising since children is a count variable.
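* One possible follow-up, offered as an assumption and not as part of the quiz solution:
* a Poisson regression treats children as a count variable. Its coefficients are
* approximate proportional effects, so they are not directly comparable to the
* linear estimates above.
. poisson children educ age agesq electric tv bicycle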