*** Table 5.1 in W&B book: MULTINOMIAL LOGIT EXAMPLE: CHOICE OF SECONDARY SCHOOL
* This program reproduces Table 5.1 in the W&B textbook. The problem is a
* multinomial choice problem whose regressors are case-specific only: they
* vary across individuals but not across alternatives. If the regressors
* were alternative-specific only, the model would be a conditional logit
* model. If the regressors include BOTH case-specific and
* alternative-specific information, the model is called a generalized (or
* hybrid) multinomial logit model. The data here are in WIDE FORM: each row
* holds one individual's observed choice (the categorical variable school)
* plus the characteristics of that individual. Conditional and generalized
* multinomial logit models instead require the data in LONG FORM: each case
* (individual) contributes L vertically "stacked" rows, one per alternative.
* Each row carries the information for its own alternative (including an
* indicator of whether it was chosen), plus the case-specific information
* when it is available, as in the generalized multinomial model.
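* Illustration of the wide-to-long conversion (a sketch on a hypothetical
* two-person toy dataset, not the school data). -expand- creates one row per
* alternative, and choice = 1 marks the row of the chosen alternative. The
* names id, alt, and choice are illustrative assumptions. A conditional
* logit on data in this form would then be fit with
* clogit choice <alternative-specific regressors>, group(id).
clear
input id school linc
1 1 10.2
2 3 10.8
end
expand 3
bysort id: generate alt = _n
generate choice = (school == alt)
list id alt choice linc, sepby(id)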
* Read in dataset and describe dependent variable and regressors.
use c:\data\school.dta, clear
describe
* Summarize dependent variable and regressors
summarize, separator(0)
* Tabulate the dependent variable
tabulate school
* Table of log of income by school
tabstat linc, by(school) statistics(n mean sd)
* Table of mother's education by school
tabstat motheduc, by(school) statistics(n mean sd)
********** Table 5.1 in W&B MULTINOMIAL LOGIT MODEL OF SCHOOL CHOICE
* Creation of year dummies
generate d95 = (year == 1995)
generate d96 = (year == 1996)
generate d97 = (year == 1997)
generate d98 = (year == 1998)
generate d99 = (year == 1999)
generate d00 = (year == 2000)
generate d01 = (year == 2001)
generate d02 = (year == 2002)
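* A more compact way to build the same dummies (sketch): loop over the years
* and confirm that the loop reproduces the d95-d02 variables created above.
* The chk* names are hypothetical temporaries, dropped at the end.
foreach y of numlist 1995/2002 {
    local yy = substr("`y'", 3, 2)
    generate byte chk`yy' = (year == `y')
    assert chk`yy' == d`yy'
}
drop chk*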
* Create a full-time dummy for mother's employment: 1 = employed full time,
* 0 = otherwise.
* Create a work dummy: 1 = employed (either full time or part time),
* 0 = not employed.
generate mothftime = (mothemp == 1)
generate mothwork = (mothemp < 3)
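* Coding check (sketch). This assumes mothemp is coded 1 = full time,
* 2 = part time, 3 or higher = not employed, as the comments above imply.
* Stata treats a missing value as larger than any number, so a missing
* mothemp ends up coded 0 in both dummies; the -missing- option makes any
* such cases visible in the cross-tabulation.
tabulate mothemp mothwork, missing
* Full-time employment must imply being employed
assert mothftime <= mothwork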
* Multinomial logit with alternative 1 as the base outcome
mlogit school motheduc mothwork linc lsize parity d95 d96 d97 d98 d99 d00 d01 d02, baseoutcome(1) nolog
* The relative-risk ratio option (rrr) reports exp(b) rather than b
mlogit school motheduc mothwork linc lsize parity d95 d96 d97 d98 d99 d00 d01 d02, rrr baseoutcome(1) nolog
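* Check (sketch): the relative-risk ratios are simply exp() of the
* coefficients stored in e(b). rrr_check is a hypothetical matrix name;
* st_replacematrix() keeps the row and column names of the copied matrix.
* Any base-outcome entries in e(b) are 0, so they show up as exp(0) = 1.
matrix rrr_check = e(b)
mata: st_replacematrix("rrr_check", exp(st_matrix("rrr_check")))
matrix list rrr_check, format(%9.4f)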
* The user-written listcoef command (part of Long and Freese's SPost package;
* locate it with -search spost-) reports the factor changes in the odds,
* exp(b), for contrasts between pairs of alternatives
listcoef motheduc mothwork linc lsize parity, help
* Store the estimates under the name MNL so they can be recalled later
estimates store MNL
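* Usage sketch: the stored results can be shown compactly (coefficients,
* standard errors, sample size, log likelihood) or made active again after
* another model has been fit.
estimates table MNL, b(%9.4f) se stats(N ll)
estimates restore MNL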
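* Marginal effects at the means of the explanatory variables for outcome 3.
* mfx is the pre-Stata 11 command (superseded by margins); for 0/1
* regressors it reports the discrete change from 0 to 1.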
mfx, predict(pr outcome(3))
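* With margins (Stata 11 or later), marginal effects at the means for
* outcome 3 can be sketched as below. Because the dummies enter the model as
* ordinary 0/1 variables rather than with i. factor-variable notation,
* margins differentiates them as if they were continuous, so their effects
* can differ from the discrete changes reported by mfx.
margins, dydx(*) predict(outcome(3)) atmeans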
* Average marginal effect of explanatory variables for outcome 3
margins, dydx(*) predict(outcome(3)) noatlegend
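* The average marginal effect is the sample average of each individual's
* derivative of Pr(school = 3) with respect to a regressor. A minimal
* numerical check for linc (sketch, using a central difference with a
* hypothetical step h = 1e-4; ame_linc_num is a hypothetical scalar name)
* should closely match the linc entry of the margins output above.
preserve
* Store linc in double precision so the small step is not lost to rounding
recast double linc
local h = 1e-4
replace linc = linc + `h'
predict double p3_up if e(sample), outcome(3)
replace linc = linc - 2*`h'
predict double p3_dn if e(sample), outcome(3)
generate double dp3 = (p3_up - p3_dn)/(2*`h')
summarize dp3
scalar ame_linc_num = r(mean)
restore
display "numerical AME of linc on Pr(school = 3): " scalar(ame_linc_num)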
* Predict the probability of choosing each school type and compare the
* average predictions with the actual frequencies
predict pmlogit1 pmlogit2 pmlogit3, pr
summarize pmlogit*, separator(3)
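* Because each non-base equation includes a constant, the maximum likelihood
* first-order conditions imply that the average predicted probability of
* each alternative equals its sample frequency. A sketch that checks this,
* assuming school is coded 1, 2, 3 (as the classification loop below also
* assumes):
forvalues i = 1/3 {
    quietly summarize pmlogit`i' if e(sample)
    local pbar = r(mean)
    quietly count if school == `i' & e(sample)
    display "outcome `i': mean predicted = " %6.4f `pbar' ///
        "   sample share = " %6.4f r(N)/e(N)
}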
* List the predicted probabilities of the three alternatives for the first 10 observations
list pmlogit* in 1/10
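* Create a classification table: assign each individual to the alternative
* with the highest predicted probability, then compare with the actual choice
egen pred_max = rowmax(pmlogit*)
generate pred_choice = .
forvalues i = 1/3 {
    replace pred_choice = `i' if (pred_max == pmlogit`i')
}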
local school_label: value label school
label values pred_choice `school_label'
tabulate pred_choice school
* Accuracy rate = (113 + 49 + 208)/675 = 370/675 = 0.548
* By comparison, naively assigning every individual to the majority class
* (Gymnasium) would be correct for 41.04% of the sample, the Gymnasium share
* shown in the earlier tabulation of the dependent variable school.
* Thus the mlogit classifier provides a LIFT of 54.8/41.04 = 1.335 over the
* majority-class rule.
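* The accuracy, majority-class baseline, and lift can also be computed
* directly from the data rather than by hand (a sketch; it assumes school is
* coded 1, 2, 3, and mnl_acc, mnl_base, mnl_lift are hypothetical scalar
* names).
quietly count if e(sample)
local n = r(N)
quietly count if pred_choice == school & e(sample)
scalar mnl_acc = r(N)/`n'
* Baseline = largest sample share among the three alternatives
scalar mnl_base = 0
forvalues i = 1/3 {
    quietly count if school == `i' & e(sample)
    scalar mnl_base = max(scalar(mnl_base), r(N)/`n')
}
scalar mnl_lift = scalar(mnl_acc)/scalar(mnl_base)
display "accuracy = " %6.4f scalar(mnl_acc) "   baseline = " %6.4f scalar(mnl_base) ///
    "   lift = " %6.4f scalar(mnl_lift)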