use "C:\data\03728-0001-data.dta", clear
keep if sex==2
* women
keep if age >= 40
* completed fertility
keep if inlist(year, 1974, 1978, 1982, 1986, 1990, 1994, 1998, 2002)
ren childs kids
drop if kids ==.
drop if age ==.
drop if sibs ==.
drop if educ ==.
gen afb = agekdbrn
gen city16=(res16>=4)&(res16<=6)
gen lowinc16 = (incom16==1)|(incom16==2)
gen immig = (born ==2)|(parborn==8)
replace race = racecen1 if year == 2002
gen white = race == 1
label var afb "woman's age when 1st child born"
label var white "=1 if r's race is white"
label var immig "=1 if respondent or both r's parents born abroad"
label var lowinc16 "=1 if income is below average income at age 16"
label var city16 "=1 if respondent lived in a city (pop>50000) at age 16"
keep year sibs kids age afb educ white city16 lowinc16 immig
order kids age educ year sibs afb white city16 lowinc16 immig
gen trend = year - 1974
* Now replicate the results of Table 8.7 in the W&B textbook
* Poisson Regression Model
poisson kids educ trend white immig lowinc16 city16
* As discussed in the W&B textbook, one can get robust standard
* errors for the Poisson regression coefficient estimates by
* using the vce(robust) option. The point estimates are unchanged;
* only the standard errors differ. This is what is called
* Quasi-Maximum Likelihood Estimation (QMLE)
poisson kids educ trend white immig lowinc16 city16, vce(robust)
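* A rough, unconditional check of the Poisson equidispersion
* assumption (variance = mean), sketched here as an aside and not
* part of Table 8.7. It ignores the regressors, so treat it only as
* a first look; the scalar names below are illustrative.
quietly summarize kids
scalar kids_mean = r(mean)
scalar kids_var  = r(Var)
display "mean = " %6.3f kids_mean "   variance = " %6.3f kids_var ///
    "   variance/mean = " %6.3f (kids_var/kids_mean)
* A ratio well away from 1 suggests the Poisson variance assumption
* is doubtful, which is the case for robust (QMLE) standard errors.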
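* Negative Binomial Regression I (Negbin I): variance proportional
* to the mean. In Stata this is the constant-dispersion form of nbreg.
nbreg kids educ trend white immig lowinc16 city16, dispersion(constant)
* Negative Binomial Regression II (Negbin II): variance quadratic in
* the mean. This is nbreg's default, dispersion(mean). Note that gnbreg
* without an lnalpha() option fits this same model, so it does not
* give a separate specification.
nbreg kids educ trend white immig lowinc16 city16, dispersion(mean)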
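* One way to line the models up against Table 8.7 is to store each
* fit and print them side by side. This is a sketch: the stored
* names (pois, nb_const, nb_mean) are illustrative, and the models
* are re-estimated quietly so the block runs on its own.
quietly poisson kids educ trend white immig lowinc16 city16
estimates store pois
quietly nbreg kids educ trend white immig lowinc16 city16, dispersion(constant)
estimates store nb_const
quietly nbreg kids educ trend white immig lowinc16 city16, dispersion(mean)
estimates store nb_mean
estimates table pois nb_const nb_mean, b(%9.4f) se(%9.4f) stats(N ll)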
* Tabulate the kids variable.
* From this information we can see whether there is an excess
* number of zeroes. This can be done by calculating Prob(y = 0)
* using a Poisson model (with no explanatory variables) and the
* lambda value set equal to the sample mean of the counts. If the
* proportion of zeroes in the sample clearly exceeds the Poisson
* Prob(y = 0), then we can model the counts using a Hurdle
* model or an excess zeroes model. See the W&B book.
tabulate kids
summarize kids
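* The comparison described above can be computed directly instead of
* read off the output. A minimal sketch; the scalar names are
* illustrative, and kids has no missing values at this point because
* they were dropped earlier.
quietly summarize kids
scalar n_obs  = r(N)
scalar lambda = r(mean)
quietly count if kids == 0
scalar p0_obs  = r(N)/n_obs
scalar p0_pois = exp(-lambda)
display "share of zeroes in sample = " %6.4f p0_obs
display "Poisson Prob(y = 0)       = " %6.4f p0_pois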
* From the tabulation and summarize we see that the proportion
* of zeroes in the sample is 744/5150 = 0.1445. The mean of the
* counts is 2.59. Then Prob(y = 0) = exp(-2.59) = 0.075 is
* what we would expect of a Poisson distribution with a mean
* count of 2.59 (i.e. lambda = 2.59). We are then interested
* in whether or not there is a significant difference between
* the two proportions, 0.1445 and 0.075. A 95% confidence interval
* for the true proportion, say, p, is given by 0.1445 +- z(.025)*
* sqrt[(0.1445)*(1 - 0.1445)/5150] = 0.1445 +- 1.96*0.004899
* = 0.1445 +- 0.0096, i.e. (0.1349, 0.1541), which
* does not encompass 0.075. Given this result we
* probably should continue our analysis assuming an excess of
* zeroes by using Hurdle or zero-inflated negative binomial
* models.
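* The interval above can be reproduced from the data. A minimal
* sketch using the Wald formula; scalar names are illustrative.
quietly summarize kids
scalar n_obs = r(N)
quietly count if kids == 0
scalar p0 = r(N)/n_obs
scalar se_p0 = sqrt(p0*(1 - p0)/n_obs)
scalar z = invnormal(0.975)
display "95% CI for share of zeroes: " %6.4f (p0 - z*se_p0) ///
    " to " %6.4f (p0 + z*se_p0)
* Possible next steps, sketched under assumptions: using the same
* regressors in the inflate() equation and in the hurdle's binary
* part is an illustrative choice, not taken from the W&B text.
* Zero-inflated negative binomial:
zinb kids educ trend white immig lowinc16 city16, ///
    inflate(educ trend white immig lowinc16 city16)
* Hurdle model in two parts: a logit for having any children, then
* a zero-truncated negative binomial for the positive counts.
* anykids is a hypothetical helper variable; in older Stata
* releases the truncated model was fit with ztnb instead of tnbreg.
gen byte anykids = kids > 0
logit anykids educ trend white immig lowinc16 city16
tnbreg kids educ trend white immig lowinc16 city16 if kids > 0, ll(0)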