* See Keane.des for description of the data. Example taken from
* Econometric Analysis of Cross Section and Panel Data
* by Jeffrey M. Wooldridge, 2002, pp. 498-499.
* Original panel data from Keane and Wolpin, 1997
* A cross-section of men for the year 1987. The three possible
* outcomes are enrolled in school (status = 0), at home, i.e. neither
* in school nor working (status = 1), and working (status = 2). This
* 0/1/2 coding follows Wooldridge; the commands below assume that
* keane.dta codes status as 1, 2, 3 in the same order (see Keane.des).
* The explanatory variables are education, a quadratic in past work
* experience, and a black binary indicator. The base category is
* enrolled in school. Out of 1,717 observations, 99 are enrolled in
* school, 332 are at home, and 1,286 are working.
* Multinomial Logit Estimates of School and Labor Market Decisions
* Table 15.2 in Wooldridge, p. 499.
use c:\data\keane.dta
keep if (year == 87)
* Tabulate the dependent variable (status)
tabulate status
* Multinomial logit with enrolled in school as the base outcome
* (status = 0 in Wooldridge; baseoutcome(1) assumes it is coded 1 in the data)
mlogit status educ exper expersq black, baseoutcome(1) nolog
* Relative-risk ratios (exponentiated coefficients) from the same
* multinomial logit, again with enrolled in school as the base outcome
mlogit status educ exper expersq black, rrr baseoutcome(1) nolog
* Predict the probability of each status and compare with the actual frequencies
quietly mlogit status educ exper expersq black, baseoutcome(1)
predict pmlogit1 pmlogit2 pmlogit3, pr
summarize pmlogit* status, separator(3)
list pmlogit* in 1/10
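* A possible cross-check (a sketch, not part of the original example): with a
* constant in the model, the mean of each predicted probability should equal
* the sample share of the corresponding outcome. The indicator variables
* actual1-actual3 are hypothetical names created here by tabulate's
* generate() option, one per category of status in sorted order.
quietly tabulate status, generate(actual)
summarize pmlogit1 actual1 pmlogit2 actual2 pmlogit3 actual3, separator(2)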
* Create Classification Table and get accuracy rate
egen pred_max = rowmax(pmlogit*)
generate pred_choice = .
forvalues i = 1/3 {
    replace pred_choice = `i' if (pred_max == pmlogit`i')
}
local status_label: value label status
label values pred_choice `status_label'
tabulate pred_choice status
* Accuracy rate = (12 + 130 + 1224)/1717 = 0.796
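* The same rate can be computed instead of being read off the table. A minimal
* sketch; the variable name correct is hypothetical and not in the original.
generate byte correct = (pred_choice == status) if !missing(pred_choice, status)
* The mean of correct is the share of observations classified correctly.
summarize correct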
* For comparison, naively assigning every observation to the majority class
* (working, status = 2 in Wooldridge's coding) would be correct for
* 1,286/1,717 = 74.9% of the sample; see the tabulation of status above.
* Thus the mlogit classifier provides a lift of 79.6/74.9 = 1.06.
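* Sketch of the lift calculation done from the data rather than by hand.
* The matrix name freq and the scalar names n_obs, n_major, accuracy and
* naive are hypothetical. The naive benchmark is the share of the largest
* category of status, so no particular coding of status is assumed.
quietly tabulate status, matcell(freq)
scalar n_obs = r(N)
mata: st_numscalar("n_major", max(st_matrix("freq")))
quietly count if pred_choice == status
scalar accuracy = r(N) / scalar(n_obs)
scalar naive = scalar(n_major) / scalar(n_obs)
display "accuracy = " %5.3f scalar(accuracy) "  naive = " %5.3f scalar(naive)
display "lift = " %4.2f scalar(accuracy)/scalar(naive)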